TRINE: A Token-Aware, Runtime-Adaptive FPGA Inference Engine for Multimodal AI

📰 ArXiv cs.AI

arXiv:2603.22867v1 (Announce Type: cross)

Abstract: Multimodal stacks that mix ViTs, CNNs, GNNs, and transformer NLP strain embedded platforms because their compute/memory patterns diverge and hard real-time targets leave little slack. TRINE is a single-bitstream FPGA accelerator and compiler that executes end-to-end multimodal inference without reconfiguration. Layers are unified as DDMM/SDDMM/SpMM and mapped to a mode-switchable engine that toggles at runtime among weight/output-stationary systo
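For readers unfamiliar with the three kernel classes named in the abstract, the following is a minimal pure-Python sketch of their standard definitions: dense-dense matrix multiply (DDMM), sampled dense-dense matrix multiply (SDDMM), and sparse-dense matrix multiply (SpMM). The function names, the dict-of-coordinates sparse format, and the example matrices are illustrative assumptions, not TRINE's compiler interface or hardware dataflow, which the truncated abstract does not describe.

```python
# Illustrative reference definitions only; names and data layout are
# assumptions for this sketch, not taken from the TRINE paper.

def ddmm(a, b):
    """Dense x dense: C[i][j] = sum_t A[i][t] * B[t][j]."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]

def sddmm(mask, a, b):
    """Sampled dense x dense: evaluate (A @ B)[i][j] only at positions
    present in the sparse mask {(i, j): value}, scaled by the mask value."""
    inner = len(b)
    return {(i, j): v * sum(a[i][t] * b[t][j] for t in range(inner))
            for (i, j), v in mask.items()}

def spmm(sparse, n_rows, b):
    """Sparse x dense: sparse is {(i, k): value} with n_rows rows, b is dense."""
    cols = len(b[0])
    out = [[0.0] * cols for _ in range(n_rows)]
    for (i, k), v in sparse.items():
        for j in range(cols):
            out[i][j] += v * b[k][j]
    return out

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
dense_product = ddmm(A, B)                              # full 2x2 product
sampled = sddmm({(0, 0): 1, (1, 1): 1}, A, B)           # diagonal entries only
aggregated = spmm({(0, 0): 1.0, (1, 1): 1.0}, 2, B)     # sparse identity times B
```

The point of the sketch is that all three share one multiply-accumulate inner loop and differ only in which operand or output is sparse, which is presumably what makes a single mode-switchable engine plausible.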

Published 1 Jun 2026
