A Triadic Suffix Tokenization Scheme for Numerical Reasoning
arXiv cs.AI
arXiv:2604.11582v1 Announce Type: cross Abstract: Standard subword tokenization methods fragment numbers inconsistently, causing large language models (LLMs) to lose positional and decimal structure, a primary driver of errors in arithmetic and scientific reasoning. We introduce Triadic Suffix Tokenization (TST), a deterministic scheme that partitions digits into three-digit triads and annotates each triad with an explicit magnitude marker. Critically, the scheme defines a fixed, one-to-one map
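The abstract's core idea, partitioning a digit string into three-digit triads tagged with magnitude markers, can be sketched as follows. This is a minimal illustration assuming a marker vocabulary of the form `<Ek>` (meaning "multiply by 1000^k"); the paper's actual token names and mapping are not given in the abstract.

```python
def tst_tokenize(number: str) -> list:
    """Hypothetical sketch of Triadic Suffix Tokenization (TST).

    Splits a digit string into three-digit triads, grouped from the
    least significant digit, and tags each triad with an illustrative
    magnitude marker <Ek> denoting a factor of 1000**k. The exact
    marker vocabulary in the paper may differ.
    """
    digits = number.lstrip("0") or "0"
    triads = []
    # Walk from the right, taking up to 3 digits per triad.
    while digits:
        triads.append(digits[-3:])
        digits = digits[:-3]
    # triads[k] now holds the coefficient of 1000**k.
    tokens = []
    for k in reversed(range(len(triads))):
        tokens.append(triads[k])
        tokens.append(f"<E{k}>")
    return tokens

print(tst_tokenize("1234567"))
# → ['1', '<E2>', '234', '<E1>', '567', '<E0>']
```

Because the grouping is anchored at the least significant digit, the same number always produces the same triads, which is what makes such a map deterministic and one-to-one, unlike frequency-driven subword vocabularies that can split "1234567" differently depending on context.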