Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving

Published:

Recommended citation: Wei Gao*, Xinyu Zhou*, Peng Sun, Yonggang Wen, Tianwei Zhang. "Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving." Annual Conference on Machine Learning and Systems (MLSys), May 2025. (*Co-first authors.) [Top conference in the MLSys area]