# inference-server

Here are 48 public repositories matching this topic...

⚡Local-first AI inference server with OpenAI API compatibility, auto-discovery, hot model swapping, and tool calling. Single-binary Rust solution for GGUF models with LoRA support. FREE now, FREE forever.

  • Updated Sep 10, 2025
  • Rust
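Because the server exposes an OpenAI-compatible API, any stock OpenAI client can talk to it. A minimal sketch using the official `openai` Python client, assuming the server listens on `http://localhost:8080/v1` and serves a model registered as `local-model` (both are placeholders; the real port and model name depend on the server's configuration):

```python
# Minimal sketch: querying a local OpenAI-compatible inference server.
# Assumptions (not from the repo description): the server listens on
# http://localhost:8080/v1 and exposes a model named "local-model".
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # point the client at the local server
    api_key="unused",  # local servers typically ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Summarize what an inference server does."}],
)
print(response.choices[0].message.content)
```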

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.

  • Updated Aug 18, 2021
  • Python
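The PyTorch -> ONNX -> TensorRT chain the description mentions typically begins with an ONNX export. A minimal sketch of that first step, with a generic torchvision model standing in for the CRAFT detector (the model, input shape, and axis names are placeholders, not taken from the repo):

```python
# Minimal sketch of the PyTorch -> ONNX step of the conversion chain.
# Assumption: a torchvision ResNet stands in for the CRAFT detector;
# the repo's actual model, input size, and tensor names will differ.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # NCHW placeholder input

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
)
# The resulting ONNX file can then be compiled to a TensorRT engine, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.plan
```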
