Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations

📰 ArXiv cs.AI

arXiv:2604.18146v2 Announce Type: cross

Abstract: Large language models (LLMs) have recently advanced recommendation systems (RSs), and recent work has begun to explore how to integrate LLMs into industrial RSs. While most approaches deploy LLMs offline to generate and pre-cache augmented representations for RSs, the high-dimensional representations from LLMs introduce substantial storage and computational costs. Thus, it is crucial to compress LLM representations effectively. However, we identif

Published 21 Apr 2026