Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models
📰 Hugging Face Blog
Leverage pre-trained language model checkpoints for encoder-decoder models to improve performance
Action Steps
- Load pre-trained language models like BERT and GPT2
- Use the pre-trained models as encoders or decoders in an encoder-decoder architecture
- Fine-tune the model on a specific task or dataset
- Experiment with different weight sharing strategies between the encoder and decoder
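The action steps above can be illustrated with a small, library-free sketch. It assumes a toy representation: a checkpoint is a plain dictionary that maps hypothetical parameter names (for example `layer0.self_attn.q`) to lists of floats. Real checkpoints use different names and store tensors. The sketch shows the idea behind warm-starting a BERT-style encoder-decoder:

- The encoder weights, and the decoder's self-attention and feed-forward weights, are all copied from the same encoder-only checkpoint.
- The decoder's self-attention is restricted by a causal mask.
- The cross-attention weights have no pre-trained counterpart, so they are randomly initialized.

The `tie_encoder_decoder` flag mimics weight sharing by making the decoder reuse the encoder's weight objects. In 🤗 Transformers this initialization is handled by `EncoderDecoderModel.from_encoder_decoder_pretrained`. The functions below are hypothetical illustrations, not that API.

```python
import random

# Hypothetical, simplified parameter names for ONE layer of an encoder-only
# (BERT-like) checkpoint. Real checkpoints use other names and tensors.
ENCODER_LAYER_PARAMS = ["self_attn.q", "self_attn.k", "self_attn.v",
                        "self_attn.out", "ffn.in", "ffn.out"]
# Cross-attention exists only in the decoder of an encoder-decoder model.
CROSS_ATTN_PARAMS = ["cross_attn.q", "cross_attn.k", "cross_attn.v", "cross_attn.out"]


def make_fake_checkpoint(num_layers, seed=0):
    """Build a stand-in 'pre-trained' checkpoint: name -> small list of weights."""
    rng = random.Random(seed)
    return {
        f"layer{i}.{name}": [rng.gauss(0.0, 0.02) for _ in range(4)]
        for i in range(num_layers)
        for name in ENCODER_LAYER_PARAMS
    }


def causal_mask(seq_len):
    """Entry [i][j] is 1 if position i may attend to position j.

    An encoder attends bidirectionally; a warm-started decoder must only
    see positions j <= i, so its self-attention uses this mask instead.
    """
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def warm_start(checkpoint, num_layers, tie_encoder_decoder=False, seed=1):
    """Initialize encoder and decoder from one encoder-only checkpoint.

    Returns (params, newly_initialized): params maps names to weight lists,
    newly_initialized lists the weights that had no pre-trained counterpart.
    """
    rng = random.Random(seed)
    params, newly_initialized = {}, []
    for i in range(num_layers):
        for name in ENCODER_LAYER_PARAMS:
            key = f"layer{i}.{name}"
            # Encoder: copy the pre-trained weights.
            params[f"encoder.{key}"] = list(checkpoint[key])
            # Decoder self-attention and feed-forward also start from the
            # pre-trained weights. When tied, the decoder reuses the encoder's
            # list object, so both sides share (and jointly update) one weight.
            params[f"decoder.{key}"] = (
                params[f"encoder.{key}"] if tie_encoder_decoder
                else list(checkpoint[key])
            )
        for name in CROSS_ATTN_PARAMS:
            key = f"decoder.layer{i}.{name}"
            # No pre-trained counterpart: random init, learned during fine-tuning.
            params[key] = [rng.gauss(0.0, 0.02) for _ in range(4)]
            newly_initialized.append(key)
    return params, newly_initialized


if __name__ == "__main__":
    ckpt = make_fake_checkpoint(num_layers=2)
    untied, new = warm_start(ckpt, num_layers=2)
    tied, _ = warm_start(ckpt, num_layers=2, tie_encoder_decoder=True)
    print("entries:", len(untied), "newly initialized:", len(new))
    print("distinct weights untied:", len({id(v) for v in untied.values()}))
    print("distinct weights tied:", len({id(v) for v in tied.values()}))
    print("decoder mask:", causal_mask(3))
```

With two layers, the untied model holds 32 distinct weight lists, 8 of which are newly initialized cross-attention weights. Tying reduces the number of distinct lists to 20, because the decoder's self-attention and feed-forward entries point at the encoder's. If GPT2 were used as the decoder instead, its checkpoint would already be causal, so only the cross-attention weights would be new.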
Who Needs to Know This
NLP engineers and researchers can use this technique to improve the performance of sequence-to-sequence models, and software engineers can apply it in their own projects.
Key Insight
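💡 Initializing ("warm-starting") the encoder and decoder of an encoder-decoder model from pre-trained language model checkpoints, rather than from random weights, can substantially improve its performance on sequence-to-sequence tasks.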
Full Article
Published Time: 2020-11-09T00:00:00.007Z
# Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models
Published November 9, 2020
[Patrick von Platen](https://huggingface.co/patrickvonplaten)
[Open In Colab](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Leveraging_Pre_trained_Checkpoints_for_Encoder_Decoder_Models.ipynb)
* [**Introduction**](https://huggingface.co/blog/warm-starting-encoder-decoder#introduction "Introduction")
* [**BERT**](https://huggingface.co/blog/warm-starting-encoder-decoder#bert "BERT")
* [**GPT2**](https://huggingface.co/blog/warm-starting-encoder-decoder#gpt2 "GPT2")
* [**Encoder-Decoder**](https://huggingface.co/blog/warm-starting-encoder-decoder#encoder-decoder "Encoder-Decoder")
* [**Warm-starting encoder-decoder models (Theory)**](https://huggingface.co/blog/warm-starting-encoder-decoder#warm-starting-encoder-decoder-models-theory "Warm-starting encoder-decoder models (Theory)")
* [**Recap Encoder-Decoder Model**](https://huggingface.co/blog/warm-starting-encoder-decoder#recap-encoder-decoder-model "Recap Encoder-Decoder Model")
* [**Warm-starting Encoder-Decoder with BERT**](https://huggingface.co/blog/warm-starting-encoder-decoder#warm-starting-encoder-decoder-with-bert "Warm-starting Encoder-Decoder with BERT")
* [**Warm-starting Encoder-Decoder with BERT and GPT2**](https://huggingface.co/blog/warm-starting-encoder-decoder#warm-starting-encoder-decoder-with-bert-and-gpt2 "Warm-starting Encoder-Decoder with BERT and GPT2")
* [**Encoder-Decoder Weight Sharing**](https://huggingface.co/blog/warm-starting-encoder-decoder#encoder-decoder-weight-sharing "Encoder-Decoder Weight Sharing")
* [**Warm-starting encoder-decoder models (Analys