Weierstrass Positional Encoding for Vision Transformers
📰 ArXiv cs.AI
arXiv:2605.23719v1 Announce Type: cross Abstract: Vision Transformers have achieved remarkable success in computer vision, but their common use of learnable one-dimensional positional encodings weakens the inherent two-dimensional spatial structure of images after patch flattening. Existing positional encodings often lack geometric constraints and do not preserve a monotonic relationship between Euclidean spatial distances and sequential index distances, limiting ViTs' ability to exploit spatial
DeepCamp AI