On the Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions
arXiv:2503.21708v4 Announce Type: replace-cross

Abstract: Layer normalization (LN) is an essential component of modern neural networks. While many alternative techniques have been proposed, none of them has succeeded in replacing LN so far. The latest suggestion in this line of research is a dynamic activation function called Dynamic Tanh (DyT). Although it is empirically well motivated and appealing from a practical point of view, it lacks a theoretical foundation. In this work, we shed light
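For context, DyT itself (introduced by Zhu et al., 2025, "Transformers without Normalization") is a simple element-wise transform, DyT(x) = γ ⊙ tanh(αx) + β, with a learnable scalar α and per-channel affine parameters γ, β. Below is a minimal PyTorch sketch of that transform, not code from this paper; the initialization α = 0.5 follows the DyT paper's default, and the tensor shapes in the usage line are illustrative:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: an element-wise alternative to LayerNorm.

    Computes gamma * tanh(alpha * x) + beta, where alpha is a learnable
    scalar and gamma/beta are per-channel affine parameters. Unlike
    LayerNorm, no per-token mean/variance statistics are computed.
    """
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))  # learnable scalar scale
        self.gamma = nn.Parameter(torch.ones(dim))               # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))               # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise squashing replaces LayerNorm's normalization step.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

# Usage: a drop-in replacement wherever nn.LayerNorm(dim) would appear.
x = torch.randn(2, 16, 768)   # (batch, tokens, channels) -- illustrative shape
print(DyT(768)(x).shape)      # torch.Size([2, 16, 768])
```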