$\lambda$-GELU: Learning Gating Hardness for Controlled ReLU-ization in Deep Networks
📰 ArXiv cs.AI
arXiv:2603.21991v2 Announce Type: replace-cross Abstract: The Gaussian Error Linear Unit (GELU) is a widely used smooth alternative to the Rectified Linear Unit (ReLU), yet many deployment, compression, and analysis toolchains are most naturally expressed for piecewise-linear (ReLU-type) networks. We study a hardness-parameterized formulation of GELU, $f(x; \lambda) = x\,\Phi(\lambda x)$, where $\Phi$ is the Gaussian CDF and $\lambda \in [1, \infty)$ controls gate sharpness, with the goal of turning smooth…
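The abstract gives the activation in closed form, so a minimal sketch is easy to state. The PyTorch module below is an illustrative implementation, not the paper's code: the class name `LambdaGELU`, the `init_lambda` argument, and the clamp-based way of keeping $\lambda \ge 1$ are assumptions, and the paper's actual parameterization of the learnable hardness may differ. It computes $\Phi$ via the error function, $\Phi(z) = \tfrac{1}{2}(1 + \mathrm{erf}(z/\sqrt{2}))$, so $\lambda = 1$ recovers standard GELU and the gate approaches a step function (i.e., ReLU) as $\lambda \to \infty$.

```python
import math

import torch
import torch.nn as nn

class LambdaGELU(nn.Module):
    """Hardness-parameterized GELU: f(x; lambda) = x * Phi(lambda * x).

    lambda = 1 recovers standard GELU; as lambda -> infinity the gate
    Phi(lambda * x) approaches a step function and f approaches ReLU.
    (Illustrative sketch; the paper's parameterization may differ.)
    """

    def __init__(self, init_lambda: float = 1.0):
        super().__init__()
        # Learnable hardness; clamping in forward() is one simple way to
        # enforce lambda >= 1 (an assumption, not necessarily the paper's).
        self.lam = nn.Parameter(torch.tensor(float(init_lambda)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lam = self.lam.clamp(min=1.0)
        # Gaussian CDF via the error function:
        # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
        phi = 0.5 * (1.0 + torch.erf(lam * x / math.sqrt(2.0)))
        return x * phi

# Quick check: lambda = 1 matches exact torch GELU; a very large lambda
# is numerically indistinguishable from ReLU on this grid.
if __name__ == "__main__":
    x = torch.linspace(-3.0, 3.0, steps=7)
    act = LambdaGELU(init_lambda=1.0)
    print(torch.allclose(act(x), nn.functional.gelu(x), atol=1e-6))  # True
    with torch.no_grad():
        act.lam.fill_(1e4)  # sharpen the gate
    print(torch.allclose(act(x), torch.relu(x), atol=1e-3))          # True
```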