# inference-server

Here are 48 public repositories matching this topic...

⚡Local-first AI inference server with OpenAI API compatibility, auto-discovery, hot model swapping, and tool calling. Single-binary Rust solution for GGUF models with LoRA support. FREE now, FREE forever.

  • Updated Sep 10, 2025
  • Rust
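Because the server exposes an OpenAI-compatible API, any stock OpenAI client can talk to it. A minimal sketch using the official `openai` Python client, assuming the server listens on `http://localhost:8080/v1` and serves a model registered as `local-model` (both are placeholders; the real port and model name depend on the server's configuration):

```python
# Minimal sketch: querying a local OpenAI-compatible inference server.
# Assumptions (not from the repo description): the server listens on
# http://localhost:8080/v1 and exposes a model named "local-model".
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # point the client at the local server
    api_key="unused",  # local servers typically ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Summarize what an inference server does."}],
)
print(response.choices[0].message.content)
```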

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.

  • Updated Aug 18, 2021
  • Python
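The PyTorch -> ONNX -> TensorRT chain the description mentions typically begins with an ONNX export. A minimal sketch of that first step, with a generic torchvision model standing in for the CRAFT detector (the model, input shape, and axis names are placeholders, not taken from the repo):

```python
# Minimal sketch of the PyTorch -> ONNX step of the conversion chain.
# Assumption: a torchvision ResNet stands in for the CRAFT detector;
# the repo's actual model, input size, and tensor names will differ.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # NCHW placeholder input

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
)
# The resulting ONNX file can then be compiled to a TensorRT engine, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.plan
```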
