Minimal agent runtime built with DSPy modules and a thin Python loop.
- Plan/Act/Finalize expressed as DSPy `Signature`s, with OpenAI-native tool-calling when available.
- Thin runtime (`agent.py`) handles looping, tool routing, and trace persistence.
- CLI and FastAPI server, plus a tiny eval harness.
- Python 3.10+
- Create a virtualenv and install (using `uv`, or see the pip alternative below):
uv venv && source .venv/bin/activate
uv pip install -e .
cp .env.example .env # set OPENAI_API_KEY or configure Ollama
# Ask a question (append --utc to nudge UTC use when time is relevant)
micro-agent ask --question "What's 2*(3+5)?" --utc
# Run the API server
uvicorn micro_agent.server:app --reload --port 8000
# Run quick evals (repeat small dataset)
python evals/run_evals.py --n 50

Pip alternative:
python -m venv .venv && source .venv/bin/activate
pip install -e .

`.env` is loaded automatically (via `python-dotenv`).

- Set one of the following provider configs:
  - OpenAI (default): `OPENAI_API_KEY`, `OPENAI_MODEL` (default `gpt-4o-mini`)
  - Ollama: `LLM_PROVIDER=ollama`, `OLLAMA_MODEL` (e.g. `llama3.2:1b`), `OLLAMA_HOST` (default `http://localhost:11434`)
- Optional tuning: `TEMPERATURE` (default `0.2`), `MAX_TOKENS` (default `1024`)
- Tool plugins: `TOOLS_MODULES="your_pkg.tools,other_pkg.tools"` to load extra tools (see Tools below)
- Traces location: `TRACES_DIR` (default `traces/`)
- Compiled demos (OpenAI planner): `COMPILED_DEMOS_PATH` (default `opt/plan_demos.json`)
Examples:
# OpenAI
export OPENAI_API_KEY=...
export OPENAI_MODEL=gpt-4o-mini
# Ollama
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=llama3.2:1b
export OLLAMA_HOST=http://localhost:11434

micro-agent ask --question <text> [--utc] [--max-steps N]

- `--utc` appends a hint to prefer UTC when time is used.
- Saves a JSONL trace under `traces/<id>.jsonl` and prints the path.
micro-agent replay --path traces/<id>.jsonl [--index -1]

- Pretty-prints a saved record from the JSONL file.
Examples:
micro-agent ask --question "Add 12345 and 67890, then show the current date (UTC)." --utc
micro-agent ask --question "Compute (7**2 + 14)/5 and explain briefly." --max-steps 4
micro-agent replay --path traces/<id>.jsonl --index -1

- Start: `uvicorn micro_agent.server:app --reload --port 8000`
- Endpoint: `POST /ask`
  - Request JSON: `{ "question": "...", "max_steps": 6, "use_tool_calls": bool? }`
  - Response JSON: `{ "answer": str, "trace_id": str, "trace_path": str, "steps": [...], "usage": {...}, "cost_usd": number }`
- Health: `GET /healthz` (ok), `GET /health` (provider/model), `GET /version` (package version)
Example:
curl -s http://localhost:8000/ask \
  -H 'content-type: application/json' \
  -d '{"question":"What'\''s 2*(3+5)?","max_steps":6}' | jq .

OpenAPI:

- FastAPI publishes `/openapi.json` and interactive docs at `/docs`.
- Schemas reflect the `AskRequest` and `AskResponse` models in `micro_agent/server.py`.
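For programmatic use, a minimal client sketch in Python (assumes the server is running locally on port 8000 and that the `requests` package is available; it is not a dependency of this project):

```python
# Minimal client sketch against POST /ask; field names follow the request/response JSON above.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What's 2*(3+5)?", "max_steps": 6},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["answer"], data["trace_id"], data["cost_usd"])
```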
- Ask, capture `trace_id`, then fetch the full trace by id:
RESP=$(curl -s http://localhost:8000/ask \
-H 'content-type: application/json' \
-d '{"question":"Add 12345 and 67890, then UTC time.","max_steps":6}')
echo "$RESP" | jq .
TID=$(echo "$RESP" | jq -r .trace_id)
curl -s http://localhost:8000/trace/$TID | jq .

- Replay the saved JSONL locally using the CLI (last record by default, index -1):

micro-agent replay --path traces/$TID.jsonl --index -1

- Controlled via `MICRO_AGENT_LOG` (`debug|info|warning|error`). Default: `INFO`.
- Applies to both CLI and server.
- Built-ins live in `micro_agent/tools.py`:
  - `calculator`: safe expression evaluator. Supports `+ - * / ** % // ( )` and `!` via rewrite to `fact(n)`.
  - `now`: current timestamp; `{timezone: "utc"|"local"}` (default local).
- Each tool is defined as:

Tool(
    "name",
    "description",
    {"type": "object", "properties": {...}, "required": [...]},
    handler_function,
)
- Plugins: set `TOOLS_MODULES` to a comma-separated list of importable modules. Each module should expose either a `TOOLS: dict[str, Tool]` or a `get_tools() -> dict[str, Tool]`.
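A plugin module could look like the following sketch (illustrative only: it assumes `Tool` is importable from `micro_agent.tools` with the four positional fields shown above, and that handlers take an args dict and return a JSON-serializable observation; the `word_count` tool is hypothetical):

```python
# your_pkg/tools.py -- hypothetical plugin module exposing get_tools()
from micro_agent.tools import Tool

def _word_count(args: dict) -> dict:
    # Handler receives the (validated) args and returns the observation dict.
    text = args.get("text", "")
    return {"words": len(text.split())}

def get_tools() -> dict:
    return {
        "word_count": Tool(
            "word_count",
            "Count the words in a piece of text.",
            {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
            _word_count,
        )
    }
```

Then load it with `TOOLS_MODULES="your_pkg.tools"`.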
Runtime validation
- Tool args are validated against the JSON Schema before execution; invalid args add a `⛔️ validation_error` step and the agent requests a correction in the next loop. See `micro_agent/tools.py` (`run_tool`) and `micro_agent/agent.py` (validation error handling).
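The idea, as a standalone sketch (not the project's `run_tool`; this assumes the `jsonschema` package, which may differ from what the runtime actually uses):

```python
# Sketch: collect schema violations before dispatching a tool call.
from jsonschema import Draft7Validator

def validation_errors(schema: dict, args: dict) -> list[str]:
    return [err.message for err in Draft7Validator(schema).iter_errors(args)]

errs = validation_errors(
    {"type": "object", "properties": {"timezone": {"type": "string"}}},
    {"timezone": 123},
)
# errs == ["123 is not of type 'string'"]; instead of running the tool, the agent records
# a validation_error step and asks the planner for corrected args on the next loop.
```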
Calculator limits
- Factorial capped at 12; exponent size bounded; AST node count limited; large magnitudes rejected to prevent runaway compute. Only a small set of arithmetic nodes is allowed.
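For intuition, a much-simplified sketch of the AST-whitelisting approach (the real evaluator in `micro_agent/tools.py` additionally enforces the caps above and supports `fact(n)`):

```python
# Sketch: evaluate arithmetic by walking the parsed AST and allowing only whitelisted nodes.
import ast
import operator

_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow,
    ast.Mod: operator.mod, ast.FloorDiv: operator.floordiv,
}

def safe_eval(expr: str):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"disallowed node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

safe_eval("2*(3+5)")  # -> 16
```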
- OpenAI: uses DSPy `PlanWithTools` with `JSONAdapter` to enable native function calls. The model may return `tool_calls` or a `final` answer; tool calls are executed via our registry.
- Others (e.g., Ollama): uses a robust prompt with few-shot JSON decision demos. Decisions are parsed as strict JSON; on failure we try `json_repair` (if installed) and Python-literal parsing (see the sketch after this list).
- Policy enforcement: if the question implies math, the agent requires a `calculator` step before finalizing; likewise for time/date with the `now` tool. Violations are recorded in the trace as `⛔️ policy_violation` steps and planning continues.
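The fallback chain for decision parsing looks roughly like this sketch (the real implementation is `parse_decision_text` in `micro_agent/runtime.py` and may differ in details):

```python
# Sketch: parse a planner decision, loosening the format only when strict JSON fails.
import ast
import json

def parse_decision(text: str) -> dict:
    try:
        return json.loads(text)                  # 1) strict JSON
    except json.JSONDecodeError:
        pass
    try:
        import json_repair                       # 2) optional: salvage slightly malformed JSON
        return json.loads(json_repair.repair_json(text))
    except Exception:
        pass
    return ast.literal_eval(text)                # 3) last resort: Python-literal parsing
```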
Code references (discoverability)
- Replay subcommand: `micro_agent/cli.py` (subparser `replay`, printing JSONL)
- Policy enforcement markers: `micro_agent/agent.py` (look for `⛔️ policy_violation` and `⛔️ validation_error`)
- Provider fallback and configuration: `micro_agent/config.py` (`configure_lm` tries Ollama → OpenAI → registry fallbacks)
- JSON repair in decision parsing: `micro_agent/runtime.py` (`parse_decision_text` uses `json_repair` if available)
- Each run appends a record to `traces/<id>.jsonl` with fields: `id`, `ts`, `question`, `steps`, `answer`.
- Steps are `{tool, args, observation}` in order of execution.
- Replay: `micro-agent replay --path traces/<id>.jsonl --index -1`.
- Fetch by id (HTTP): `GET /trace/{id}` (CORS enabled).
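A saved trace can also be inspected directly in Python (fill in a real trace id for `<id>`; the field names follow the list above):

```python
# Sketch: load a JSONL trace and walk the recorded steps.
import json
from pathlib import Path

lines = Path("traces/<id>.jsonl").read_text().splitlines()
records = [json.loads(line) for line in lines if line.strip()]
last = records[-1]  # the same record `micro-agent replay --index -1` would show
print(last["id"], last["ts"], last["question"])
for step in last["steps"]:
    print(step["tool"], step["args"], "->", step["observation"])
print("answer:", last["answer"])
```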
- Dataset: `evals/tasks.yaml` (small, mixed math/time tasks). Rubric: `evals/rubrics.yaml`.
- Run: `python evals/run_evals.py --n 50`.
- Metrics printed: `success_rate`, `avg_latency_sec`, `avg_lm_calls`, `avg_tool_calls`, `avg_steps`, `avg_cost_usd`, `n`.
- Scoring supports both `expect_contains` (answer substring) and `expect_key` (key present in any tool observation). Weights come from `rubrics.yaml` (`contains_weight`, `key_weight`).
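A sketch of that scoring rule (illustrative; the actual logic and weight handling live in `evals/run_evals.py`):

```python
# Sketch: combine the substring check and the observation-key check with the rubric weights.
def score_task(answer: str, steps: list, task: dict, contains_weight: float, key_weight: float) -> float:
    score = 0.0
    if "expect_contains" in task:
        score += contains_weight * float(task["expect_contains"].lower() in answer.lower())
    if "expect_key" in task:
        hit = any(task["expect_key"] in (step.get("observation") or {}) for step in steps)
        score += key_weight * float(hit)
    return score
```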
- Model: `gpt-4o-mini`, N=30
- Before (no demos): success_rate 1.00; avg_latency_sec ~0.188; avg_lm_calls 3.33; avg_tool_calls 1.17; avg_steps 3.17
- After (compiled demos loaded): success_rate 1.00; avg_latency_sec ~0.188; avg_lm_calls 3.33; avg_tool_calls 1.17; avg_steps 3.17

Notes: for this small dataset, demos neither help nor hurt. For larger flows, compile demos from your real tasks.
- The agent aggregates token counts and cost. If provider usage isn't exposed, it estimates tokens from prompts/outputs and computes cost using the configured per-1K-token prices below.
- Set env prices for OpenAI models (USD per 1K tokens):
export OPENAI_INPUT_PRICE_PER_1K=0.005 # example
export OPENAI_OUTPUT_PRICE_PER_1K=0.015 # example

Defaults: for OpenAI models, built-in prices are used if env isn't set (best-effort):
- gpt-4o-mini: $0.00015 in / $0.0006 out per 1K tokens
- gpt-4o (and 4.1): $0.005 in / $0.015 out per 1K tokens
You can override via the env vars above. Evals print `avg_cost_usd`.
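The arithmetic behind `cost_usd` is roughly the following sketch (the agent's own accounting may differ; the defaults shown are the gpt-4o-mini prices above):

```python
# Sketch: per-1K-token pricing, overridable via the env vars described above.
import os

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    in_price = float(os.getenv("OPENAI_INPUT_PRICE_PER_1K", "0.00015"))   # USD per 1K input tokens
    out_price = float(os.getenv("OPENAI_OUTPUT_PRICE_PER_1K", "0.0006"))  # USD per 1K output tokens
    return (input_tokens / 1000.0) * in_price + (output_tokens / 1000.0) * out_price

# e.g. 2,000 input + 500 output tokens at the defaults -> 0.0003 + 0.0003 = $0.0006
```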
- Compile optimized few-shot demos for the OpenAI `PlanWithTools` planner and save them to JSON:

micro-agent optimize --n 12 --tasks evals/tasks.yaml --save opt/plan_demos.json

- Apply compiled demos automatically by placing them at the default path or setting:

export COMPILED_DEMOS_PATH=opt/plan_demos.json

- Optional: print a DSPy teleprompting template (for notebooks):

micro-agent optimize --n 12 --template

The agent loads these demos on OpenAI providers and attaches them to the `PlanWithTools` predictor to improve tool selection and output consistency.
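Conceptually, the attach step amounts to something like the sketch below (hypothetical: the demo file format and the planner signature are assumptions; the agent does this automatically when `COMPILED_DEMOS_PATH` resolves to a valid file):

```python
# Sketch: load compiled demos and attach them to a DSPy predictor as few-shot examples.
import json
import dspy

with open("opt/plan_demos.json") as f:
    demos = [dspy.Example(**d) for d in json.load(f)]   # assumes a JSON list of example dicts

planner = dspy.Predict("question, tools -> decision")   # placeholder signature, not the real PlanWithTools
planner.demos = demos
```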
- `micro_agent/config.py`: configures the DSPy LM. Tries Ollama first if requested, else OpenAI; supports `dspy.Ollama`, `dspy.OpenAI`, and registry fallbacks like `dspy.LM("openai/<model>")`.
- `micro_agent/signatures.py`: DSPy `Signature`s for plan/act/finalize and OpenAI tool-calls.
- `micro_agent/agent.py`: the runtime loop (~100+ LOC). Builds a JSON decision prompt, executes tools, enforces policy, and finalizes.
- `micro_agent/runtime.py`: trace format, persistence, and robust JSON decision parsing utilities.
- `micro_agent/cli.py`: CLI entry (`micro-agent`).
- `micro_agent/server.py`: FastAPI app exposing `POST /ask`.
- `evals/`: tiny harness to sample tasks, capture metrics, and save traces.
- Make targets: `make init`, `make run`, `make serve`, `make evals`, `make test`.
- Tests: `pytest -q` (note: tests are minimal and do not cover all paths).
- Build: `make docker-build`
- Run (OpenAI): `OPENAI_API_KEY=... make docker-run` (maps `:8000`)
- Run (Ollama on host): `make docker-run-ollama` (uses `host.docker.internal:11434`)
- Env (OpenAI): `OPENAI_API_KEY`, `OPENAI_MODEL=gpt-4o-mini`
- Env (Ollama): `LLM_PROVIDER=ollama`, `OLLAMA_HOST=http://host.docker.internal:11434`, `OLLAMA_MODEL=llama3.1:8b`
- Service: `POST http://localhost:8000/ask` and `GET /trace/{id}`
- DSPy is required at `dspy-ai>=2.5.0`. Some adapters (e.g., `JSONAdapter`, `dspy.Ollama`) may vary across versions; the code tries multiple backends and falls back to generic registry forms when needed.
- If `json_repair` is installed, it is used opportunistically to salvage slightly malformed JSON decisions.
  - Optional install: `pip install -e .[repair]`
- Usage/cost capture is best-effort: exact numbers depend on provider support; otherwise the agent estimates from text.
- The finalization step often composes the answer from tool results for reliability; you can swap in a DSPy `Finalize` predictor if preferred.
- Add persistence to a DB instead of JSONL by replacing `dump_trace`.
- Add human-in-the-loop, budgets, retries, or branching per your needs.
Goal: prove that an “agent” can be expressed as DSPy modules plus a thin runtime loop.