🤖 Local AI Chat Agent built with Microsoft Agent Framework and integrated with Docker Model Runner (DMR) for local model inference.
A lightweight example demonstrating how to run a local AI chat agent powered by Agent Framework and OpenAI’s Python SDK — fully integrated with Docker Model Runner (DMR) for offline LLM inference.
This project uses a clean, modular design:
- Configurations stored in a `.env` file
- Centralized logging via `logger_config.py`
- Async and streaming chat responses
- Local inference (no cloud API required)
This example shows how to:
- Connect to a locally hosted LLM (via Docker Model Runner)
- Build an agent with specific behavioral instructions
- Execute both non-streaming and streaming chat interactions
- Manage configuration and logs in a clean, reusable way
```
📦 local-ai-agent/
├── main.py             # Main script (agent setup + interactions)
├── logger_config.py    # Centralized logging configuration
├── .env                # Environment configuration (model, URLs, retries, etc.)
├── requirements.txt    # Python dependencies
└── logs/               # Directory where logs are saved
```
- Python 3.11
- Docker Desktop 4.47.0 (with Model Runner enabled)
- The following Python packages:
```bash
pip install -r requirements.txt
```

requirements.txt:

```
agent-framework
openai
python-dotenv
```
All app settings are stored in `.env` for easier customization.
```env
# Docker Model Runner Settings
DMR_BASE_URL=http://localhost:12434/engines/llama.cpp/v1
MODEL_ID=ai/smollm2:latest

# Agent Instructions
AGENT_INSTRUCTIONS=You are good at telling short, simple, and funny jokes.

# Retry Settings
MAX_RETRIES=3
RETRY_DELAY=3

# Logging Settings
LOG_DIR=logs
LOG_FILE=agent.log
```
You can change the model, behavior, or retry settings here without touching the code.
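The retry settings are presumably applied by main.py when a call to the local model fails (for example, while DMR is still starting up). The exact logic isn't reproduced here; a minimal sketch of such a retry loop, reading MAX_RETRIES and RETRY_DELAY from the environment, could look like this (the `run_with_retries` helper name is illustrative):

```python
import asyncio
import os

MAX_RETRIES = int(os.getenv("MAX_RETRIES", "3"))
RETRY_DELAY = float(os.getenv("RETRY_DELAY", "3"))


async def run_with_retries(agent, prompt: str):
    """Retry a failed agent call, waiting RETRY_DELAY seconds between attempts."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return await agent.run(prompt)
        except Exception:
            # Re-raise on the final attempt; otherwise wait and try again.
            if attempt == MAX_RETRIES:
                raise
            await asyncio.sleep(RETRY_DELAY)
```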
Before running the agent, ensure your Docker Model Runner (DMR) is active and a model is available.
```bash
# 1️⃣ Enable Model Runner service
docker desktop enable model-runner --tcp 12434

# 2️⃣ Pull a local model (example: smollm2)
docker model pull ai/smollm2:latest
```
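Optionally, you can confirm the OpenAI-compatible endpoint is answering before launching the agent. This quick sanity check (not part of main.py) reuses `DMR_BASE_URL` from `.env`; a dummy API key is enough for local inference:

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

# List the models served by Docker Model Runner through its OpenAI-compatible API.
client = OpenAI(api_key="dummy_key", base_url=os.getenv("DMR_BASE_URL"))
for model in client.models.list():
    print(model.id)
```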
Start the app:

```bash
python main.py
```
Expected output example:

```
Agent created and running with a local model via DMR.

Non-Streaming Invocation:
Why don’t pirates take baths? Because they just wash up on shore!

Streaming Invocation:
Why don’t pirates take baths? Because they just wash up on shore!
```

Logs will also be saved in `logs/agent.log`.
Inside main.py, the agent is set up like this:

```python
import asyncio
import os

from dotenv import load_dotenv
from openai import AsyncOpenAI
from agent_framework import ChatAgent
from agent_framework.openai import OpenAIChatClient  # import paths may vary by agent-framework version

load_dotenv()


async def main() -> None:
    # Point the OpenAI SDK at the local DMR endpoint (the API key is unused for local inference).
    local_client = AsyncOpenAI(
        api_key="dummy_key",
        base_url=os.getenv("DMR_BASE_URL"),
    )

    # Wrap the local client so Agent Framework can talk to the DMR-hosted model.
    chat_client = OpenAIChatClient(
        async_client=local_client,
        model_id=os.getenv("MODEL_ID"),
    )

    # Run a single (non-streaming) prompt with the configured instructions.
    async with ChatAgent(
        chat_client=chat_client,
        instructions=os.getenv("AGENT_INSTRUCTIONS"),
    ) as agent:
        result = await agent.run("Tell me a joke about a pirate.")
        print(result)


asyncio.run(main())
```
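The streaming invocation follows the same pattern but consumes incremental updates. A sketch, assuming the agent exposes a `run_stream()` async iterator whose chunks carry a `text` attribute (method naming may differ between agent-framework versions):

```python
# Streaming variant (sketch): print tokens as they arrive instead of awaiting one result.
async with ChatAgent(
    chat_client=chat_client,
    instructions=os.getenv("AGENT_INSTRUCTIONS"),
) as agent:
    async for chunk in agent.run_stream("Tell me a joke about a pirate."):
        if chunk.text:
            print(chunk.text, end="", flush=True)
    print()
```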
All events are logged to both the console and `logs/agent.log` via `logger_config.py`.
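The contents of `logger_config.py` are not shown above; a minimal sketch of such a module, assuming it reads LOG_DIR and LOG_FILE from the environment and attaches timestamped console and file handlers (the `get_logger` name is illustrative, not necessarily what the project uses):

```python
import logging
import os
from pathlib import Path


def get_logger(name: str = "local-ai-agent") -> logging.Logger:
    """Return a logger that writes timestamped records to the console and to LOG_DIR/LOG_FILE."""
    log_dir = Path(os.getenv("LOG_DIR", "logs"))
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    if logger.handlers:  # avoid adding duplicate handlers on repeated imports
        return logger

    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")

    for handler in (
        logging.StreamHandler(),
        logging.FileHandler(log_dir / os.getenv("LOG_FILE", "agent.log")),
    ):
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    return logger
```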
| Feature | Description |
|---|---|
| 🧠 Agent Framework | Easy behavioral control with instructions |
| 🐳 Local Inference | Runs models locally through Docker |
| ⚡ Async + Streaming | Real-time response streaming support |
| 🧩 Modular Design | `.env` for config, `logger_config.py` for logging |
| 🧾 Structured Logging | Console + file logging with timestamps |
- Replace `ai/smollm2:latest` with your preferred model
- Extend the agent with memory or external tool integration
- Add more `.env` configs (e.g., multiple model endpoints)
- Use different agent “roles” for creative applications (see the sketch below)
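Different “roles” are just different instruction strings pointed at the same local model. A sketch, reusing the `chat_client` from main.py; `AGENT_INSTRUCTIONS_POET` is a hypothetical extra `.env` entry, not part of this project:

```python
# Two "roles" backed by the same local model, differing only in their instructions.
# Use each one exactly as in main.py (async with ... as agent: await agent.run(...)).
joker = ChatAgent(
    chat_client=chat_client,
    instructions=os.getenv("AGENT_INSTRUCTIONS"),
)
poet = ChatAgent(
    chat_client=chat_client,
    # Hypothetical extra .env entry; falls back to an inline instruction string.
    instructions=os.getenv("AGENT_INSTRUCTIONS_POET", "You answer everything as a short rhyming poem."),
)
```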
Tarak Chandra Sarkar — GitHub
🧠 “Run your AI locally, stay private, and keep it funny!”