A curated list for Efficient Large Language Models
A model compression toolkit engineered for usability, comprehensiveness, and efficiency.
[ICML 2024] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs (a toy pruning-metric sketch follows this list)
D^2-MoE: Delta Decompression for MoE-based LLMs Compression
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple.
LLM Inference on AWS Lambda
Papers on LLM compression.
[CAAI AIR 2024] Minimize Quantization Output Error with Bias Compensation (see the bias-compensation sketch below)
Interpretation code for analyzing the effects of LLM compression, from the paper "When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models".
A standard PyTorch implementation of Google's paper "Language Modeling Is Compression", with no reliance on Haiku or JAX. Drawing on the original repository (https://github.com/google-deepmind/language_modeling_is_compression), this code can reproduce the key results from the paper (the code-length idea is sketched below).
Token Price Estimation for LLMs (a back-of-the-envelope estimate is sketched below)
NYCU Edge AI Final Project Using SGLang
Research code for LLM Compression using Functional Algorithms, exploring stratified manifold learning, clustering, and compression techniques. Experiments span synthetic datasets (Swiss Roll, Manifold Singularities) and real-world text embeddings (DBpedia-14). The goal is to preserve semantic structure while reducing model complexity.
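As a companion to the Pruner-Zero entry above, here is a minimal sketch of applying an element-wise pruning metric to a weight matrix. It uses a Wanda-style score |W| * ||x|| rather than an evolved metric, so it illustrates the general recipe only, not Pruner-Zero itself; the shapes and sparsity level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 256))        # hypothetical layer weight
X = rng.normal(size=(64, 256))         # hypothetical calibration activations
x_norm = np.linalg.norm(X, axis=0)     # per-input-channel activation norm

score = np.abs(W) * x_norm             # element-wise metric; Wanda-style, not an evolved one
sparsity = 0.5                         # prune the lowest-scoring 50% of weights
threshold = np.quantile(score, sparsity)
W_pruned = np.where(score > threshold, W, 0.0)

print("achieved sparsity:", (W_pruned == 0.0).mean())
```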
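For the bias-compensation entry, the sketch below shows the generic idea of folding the mean quantization output error on calibration data into the layer bias. It is an illustration under simple assumptions (symmetric per-tensor int8, random data), not the algorithm from the CAAI AIR 2024 paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512)).astype(np.float32)   # hypothetical layer weight
b = np.zeros(256, dtype=np.float32)                  # hypothetical layer bias
X = rng.normal(size=(1024, 512)).astype(np.float32)  # hypothetical calibration inputs

# Symmetric per-tensor int8 quantization of the weights.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127) * scale

# Output error introduced by quantization on the calibration set.
err = X @ W.T - X @ W_q.T

# Bias compensation: absorb the mean per-channel error into the bias term,
# so the quantized layer matches the float layer in expectation.
b_comp = b + err.mean(axis=0)

print("mean |error| before:", np.abs(err).mean())
print("mean |error| after :", np.abs(X @ W_q.T + b_comp - (X @ W.T + b)).mean())
```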
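The entry on "Language Modeling Is Compression" rests on the equivalence between prediction and compression: an arithmetic coder driven by a language model spends roughly -log2 p(token) bits per token, so the model's cross-entropy on a text is its achievable code length. The sketch below computes that code length with GPT-2 via Hugging Face transformers; the choice of model and text is an arbitrary assumption, and no actual arithmetic coder is implemented.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Compression and prediction are two sides of the same coin."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits                                  # (1, seq_len, vocab)
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)           # predictions for tokens 1..n-1
token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)

bits = -(token_lp.sum() / math.log(2)).item()                   # ideal code length in bits
raw_bits = 8 * len(text.encode("utf-8"))
print(f"model code length: {bits:.1f} bits vs raw size: {raw_bits} bits")
```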
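Token price estimation itself is plain arithmetic; the sketch below shows the back-of-the-envelope version. The per-1k-token prices and the 4-characters-per-token heuristic are made-up placeholders, not any provider's real rates.

```python
def estimate_cost(prompt_chars: int, completion_tokens: int,
                  price_in_per_1k: float = 0.0005,    # placeholder input price (USD / 1k tokens)
                  price_out_per_1k: float = 0.0015,   # placeholder output price (USD / 1k tokens)
                  chars_per_token: float = 4.0) -> float:
    """Rough USD cost of one request under the assumed rates."""
    prompt_tokens = prompt_chars / chars_per_token
    return (prompt_tokens / 1000) * price_in_per_1k + (completion_tokens / 1000) * price_out_per_1k

# Example: a 2,000-character prompt answered with 300 tokens.
print(f"estimated cost: ${estimate_cost(2000, 300):.6f}")
```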