Deep double descent

📰 OpenAI News

The double descent phenomenon occurs in various neural network models: performance improves, worsens, and then improves again as model size, data size, or training time increases.

Advanced | Published 5 Dec 2019

Action Steps

Train the same architecture at several model sizes and plot test error against size to look for the double descent curve
Vary the training set size and check whether test error changes non-monotonically as data is added
Track test error over training time to see whether it falls, rises, and then falls again
Apply regularization, which often avoids the double descent peak

Who Needs to Know This

Machine learning researchers and engineers can use an understanding of this phenomenon to improve model performance, and data scientists and AI engineers can apply it when choosing model size, dataset size, and training time.

Key Insight

💡 Double descent appears to be fairly universal in neural networks, is not yet fully understood, and can often be avoided with careful regularization

Full Article

We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. This effect is often avoided through careful regularization. While this behavior appears to be fairly universal, we don’t yet fully understand why it happens, and view further study of this phenomenon as an important research direction.
