FMplex: Model Virtualization for Serving Extensible Foundation Models
📰 ArXiv cs.AI
arXiv:2606.09643v1 Announce Type: cross Abstract: Foundation models (FMs) are increasingly used as backbones for downstream tasks across language, vision, time-series, and multimodal applications. Yet existing model-serving systems deploy each customized task as an independent model instance, thereby replicating heavyweight backbones, wasting accelerator memory, and losing opportunities to amortize batching and loading costs. This paper presents FMplex, a serving system that treats FM backbones
DeepCamp AI