SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.
Quantize LLMs using AWQ
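As an illustration of this kind of workflow, here is a minimal AWQ quantization sketch using the AutoAWQ library; the model path, output directory, and quant_config values are assumptions chosen for the example, not taken from the project:

```python
# Minimal AWQ quantization sketch using the AutoAWQ library.
# Model path, output directory, and quant_config values are example assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.1"  # example base model (assumption)
quant_path = "mistral-7b-instruct-awq"             # example output directory

# AWQ config: 4-bit weights, group size 128, zero-point quantization
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run activation-aware calibration and quantize the weights to 4 bits
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for later inference
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```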
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
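A 75% reduction is what simple bit-width arithmetic predicts for FP16 → 4-bit weights: 2 bytes per weight drops to 0.5 bytes, so a 7B-parameter model shrinks from roughly 14 GB to about 3.5 GB of weight storage, plus a small overhead for per-group scales and zero points.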
Artificial Personality is a text-to-text AI chatbot that can use character cards.
Production-grade vLLM serving with an OpenAI-compatible API, per-request LoRA routing, KEDA autoscaling on Prometheus metrics, Grafana/OTel observability, and a benchmark comparing AWQ vs GPTQ vs GGUF.
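For context on what an OpenAI-compatible vLLM endpoint looks like from the client side, here is a minimal sketch using the official openai Python client; the base URL, adapter name, and prompt below are illustrative assumptions:

```python
# Querying an OpenAI-compatible vLLM server with the official openai client.
# Base URL, adapter name, and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint (assumed)
    api_key="EMPTY",                      # vLLM accepts any key unless auth is configured
)

response = client.chat.completions.create(
    model="my-lora-adapter",  # per-request LoRA routing: pass the adapter name as the model
    messages=[{"role": "user", "content": "Summarize AWQ in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```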
This project takes the Flan-T5 LLM and applies the QLoRA and AWQ quantization techniques.
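As a rough sketch of the QLoRA half of that pipeline, assuming the standard bitsandbytes + PEFT recipe (the model name and LoRA hyperparameters below are illustrative, not taken from the project):

```python
# QLoRA sketch: load the base model in 4-bit NF4 and attach LoRA adapters.
# Model name and LoRA hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4: the 4-bit data type introduced by QLoRA
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-large", quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q", "v"],  # T5's attention projection module names
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```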
This repository contains notebooks and resources related to the Software Development Group Project (SDGP) machine learning component. Specifically, it includes two notebooks used for creating a dataset and fine-tuning a Mistral-7B-v0.1-Instruct model.