LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
📰 ArXiv cs.AI
arXiv:2504.14386v2 Announce Type: replace-cross Abstract: Positional embeddings (PE) play a crucial role in Vision Transformers (ViTs) by providing spatial information otherwise lost due to the permutation-invariant nature of self-attention. While absolute positional embeddings (APE) have shown theoretical advantages over relative positional embeddings (RPE), particularly due to the ability of sinusoidal functions to preserve spatial inductive biases like monotonicity and shift invariance, a fun
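For context, the sinusoidal absolute positional embeddings the abstract refers to are the standard sin/cos construction from the original Transformer, applied here per image patch. The sketch below is a minimal, generic implementation of that construction (not code from the paper); the patch count and embedding dimension are arbitrary illustrative choices.

```python
import numpy as np

def sinusoidal_pe(num_patches: int, dim: int) -> np.ndarray:
    """Standard sinusoidal absolute positional embeddings.

    pe[pos, 2i]   = sin(pos / 10000^(2i / dim))
    pe[pos, 2i+1] = cos(pos / 10000^(2i / dim))
    """
    pos = np.arange(num_patches)[:, None]       # (N, 1) patch indices
    i = np.arange(dim // 2)[None, :]            # (1, D/2) frequency indices
    freq = 1.0 / (10000.0 ** (2 * i / dim))     # geometric frequency schedule
    pe = np.zeros((num_patches, dim))
    pe[:, 0::2] = np.sin(pos * freq)            # even dims: sine
    pe[:, 1::2] = np.cos(pos * freq)            # odd dims: cosine
    return pe

# e.g. a 14x14 patch grid (196 patches) with 64-dim embeddings
pe = sinusoidal_pe(196, 64)
```

The geometric frequency schedule is what gives these embeddings the smooth, distance-dependent structure (monotonic decay of similarity with patch distance) that the abstract cites as an inductive-bias advantage of APE.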