A Triadic Suffix Tokenization Scheme for Numerical Reasoning

📰 ArXiv cs.AI

arXiv:2604.11582v1 (cross-listed). Abstract: Standard subword tokenization methods fragment numbers inconsistently, causing large language models (LLMs) to lose positional and decimal structure, a primary driver of errors in arithmetic and scientific reasoning. We introduce Triadic Suffix Tokenization (TST), a deterministic scheme that partitions digits into three-digit triads and annotates each triad with an explicit magnitude marker. Critically, the scheme defines a fixed, one-to-one map
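As a rough illustration of the triad-plus-marker idea described in the abstract, the following minimal Python sketch splits a decimal string into three-digit groups and tags each with its magnitude. The function name `triadic_tokens`, the `@k` marker syntax, and the handling of fractional digits (grouped left to right, without padding) are all assumptions made for this example; the abstract excerpt does not specify the paper's actual token format.

```python
def triadic_tokens(number: str) -> list[str]:
    """Split a non-negative decimal string into triads with magnitude markers.

    Hypothetical format: "<digits>@<k>", where the triad is scaled by 1000**k.
    This marker syntax is an assumption, not the paper's vocabulary.
    """
    integer_part, _, fraction_part = number.partition(".")
    if not integer_part.isdigit() or (fraction_part and not fraction_part.isdigit()):
        raise ValueError(f"expected a plain decimal string, got {number!r}")

    tokens = []

    # Integer part: the leading group takes the leftover 1-3 digits so that
    # every later boundary aligns with a power of one thousand.
    first = len(integer_part) % 3 or 3
    groups = [integer_part[:first]]
    groups += [integer_part[i:i + 3] for i in range(first, len(integer_part), 3)]
    for offset, triad in enumerate(groups):
        magnitude = len(groups) - 1 - offset
        tokens.append(f"{triad}@{magnitude}")

    # Fractional part: group left to right; markers count down from -1.
    # The final group may be shorter than three digits (assumption: no padding).
    for offset, i in enumerate(range(0, len(fraction_part), 3), start=1):
        tokens.append(f"{fraction_part[i:i + 3]}@-{offset}")

    return tokens


print(triadic_tokens("1234567"))
print(triadic_tokens("0.12345"))
```

Under these assumptions, the same digit string always yields the same token sequence, which is the determinism property the abstract emphasizes; how the paper itself encodes markers, signs, or exponents would need to be checked against the full text.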

Published 14 Apr 2026