2a8a3e8e22
Minimal shell wrapper around llama.cpp router's OpenAI-compatible API
(/v1/chat/completions), gated by the same mTLS cert as the pi extension.
Single-file, runtime deps: bash + curl + jq. Useful for scripts and agents
(Claude Code, etc.) that want to delegate generation without pulling in
a full SDK.
Features:
- --list / --status / --load <model>
- --stream <model> "..." for SSE token-stream output
- --raw <model> '...' for full OpenAI-format JSON bodies (also @file)
- --prompt-file <path> reads prompt from disk via jq --rawfile, bypassing
  Linux's MAX_ARG_STRLEN (~128KB per argv) so prompts up to the model's
  context window work
- --temperature / --top-p / --max-tokens / --system sampling overrides
- Auto-retry with exponential backoff on transient empty/non-JSON
  responses (model-loading window). Short-circuits on structured 4xx
  errors (e.g. exceed_context_size).
- AI_CERT_DIR / AI_ENDPOINT / AI_RETRIES env overrides.
Includes scripts/AI-COMPLETE.md with install + usage docs and a row in
the top-level README's scripts table.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ai-complete — minimal CLI for the home AI server
Tiny shell wrapper around llama-server's OpenAI-compatible API at https://ai.shahondin1624.de, gated by mTLS. One file, no runtime dependencies beyond bash, curl, jq.
Prerequisites
- mTLS certs already installed at `~/.pi/agent/certs/{client.pem,client-key.pem,root-ca.pem}` (if not, run `install-client.sh` from this repo, or `install-certs.sh` from the Tobias_Devices distribution kit).
- `jq` and `curl` on the system. Fedora: `sudo dnf install jq curl`. Debian/Ubuntu: `sudo apt install jq curl`.
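A quick pre-flight check for the prerequisites above could look like the sketch below. The function name `check_prereqs` is hypothetical and not part of `ai-complete`. It only tests that the three cert files exist and that `jq` and `curl` are on `PATH`; it does not verify that the certs are valid or trusted.

```shell
# Hypothetical helper (not part of ai-complete): print one line per missing
# prerequisite and return non-zero if anything is missing.
check_prereqs() {
  local cert_dir=${AI_CERT_DIR:-$HOME/.pi/agent/certs}
  local missing=0 f tool
  # The three cert files named in the prerequisites.
  for f in client.pem client-key.pem root-ca.pem; do
    if [ ! -f "$cert_dir/$f" ]; then
      echo "missing $cert_dir/$f"
      missing=1
    fi
  done
  # The two external tools the script depends on.
  for tool in jq curl; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_prereqs && ai-complete --list` would then skip the network call when something is obviously absent.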
Install
Copy the script into any directory on $PATH:
mkdir -p ~/bin && cp ai-complete ~/bin/ && chmod +x ~/bin/ai-complete
case ":$PATH:" in *":$HOME/bin:"*) ;; *) echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc; export PATH="$HOME/bin:$PATH" ;; esac
ai-complete --list
The last command lists all 14 registered models; if it succeeds, the certs and endpoint are working.
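The `case ":$PATH:"` line in the install commands only appends to `~/.bashrc` when `~/bin` is not already on `PATH`. The same idiom, pulled out into a standalone function for illustration (the name `path_has_dir` is hypothetical):

```shell
# Return 0 if directory $1 is an exact entry in PATH, 1 otherwise.
# Wrapping both sides in colons lets the first and last entries match
# the same ":dir:" pattern as entries in the middle.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `path_has_dir "$HOME/bin" || export PATH="$HOME/bin:$PATH"` adds the directory for the current shell only when it is missing.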
Use
ai-complete --list # show available models
ai-complete --load <model> # warm up (cold load: 30s for 30B, 1-3min for 70B+)
ai-complete <model> "<prompt>" # one-shot completion (prints just assistant text)
ai-complete --stream <model> "<prompt>" # token-stream to stdout
echo "<prompt>" | ai-complete <model> # stdin prompt
ai-complete --system "<sys>" <model> "<prompt>" # set a system prompt
ai-complete --temperature 0.2 --max-tokens 500 <model> "<prompt>"
ai-complete --raw <model> '<full openai-format json>' # full control: messages array, tools, etc
ai-complete --status # show currently loaded model
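For the `--raw` form in the list above, the JSON body can be assembled with `jq` rather than written by hand. The sketch below reads the prompt with `jq --rawfile` (available in jq 1.6 and later), so the prompt text never has to be quoted or escaped manually. The function name `build_chat_body` and the file names in the usage line are hypothetical, and the body contains only the minimal `model` and `messages` fields.

```shell
# Hypothetical helper: write a minimal OpenAI-format chat body to stdout.
# $1 = model name, $2 = path to a file holding the prompt text.
build_chat_body() {
  # --arg passes the model as a JSON string; --rawfile loads the whole
  # file as one JSON string, with quotes and newlines escaped by jq.
  jq -n --arg model "$1" --rawfile prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}
```

A possible use, assuming the `@file` form of `--raw` mentioned in the commit message: `build_chat_body <model> prompt.txt > body.json`, then `ai-complete --raw <model> @body.json`. Whether the script overrides the `model` field inside the body is not stated here.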
Models are switched on demand. The router has --models-max 1, so loading model B auto-unloads model A. For latency-sensitive use, call --load once and reuse.
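Because only one model is resident at a time, grouping prompts by model avoids repeated cold loads. A minimal sketch of the "load once and reuse" pattern, assuming `--load` returns once the model is ready; the function name `run_batch` and the `AI_COMPLETE` variable are hypothetical, the latter only so the command can be swapped out:

```shell
# Hypothetical helper: warm a model once, then send every prompt to it.
# $1 = model name, remaining arguments = prompts.
run_batch() {
  local model=$1
  shift
  # AI_COMPLETE is an assumed override, not a variable the script reads.
  local ai=${AI_COMPLETE:-ai-complete}
  local prompt
  # Pay the cold-load cost once, up front.
  "$ai" --load "$model" || return 1
  for prompt in "$@"; do
    "$ai" "$model" "$prompt"
  done
}
```

Calling `run_batch <model> "first prompt" "second prompt"` keeps all requests on one model, so the router never has to swap in between.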
Environment overrides
- `AI_CERT_DIR`: alternative cert location (default `~/.pi/agent/certs`)
- `AI_ENDPOINT`: alternative URL (default `https://ai.shahondin1624.de`)
Use from other tools
- Claude Code or any agent with shell access: delegate generation by asking it to run `ai-complete <model> "..."` via Bash and use the captured stdout.
- Python scripts: prefer the OpenAI SDK with mTLS: `OpenAI(base_url=..., api_key="not-needed", http_client=httpx.Client(cert=("client.pem","client-key.pem")))`.
- Anything OpenAI-compatible: point base URL at `https://ai.shahondin1624.de/v1`, supply cert+key for mTLS, leave bearer token empty.
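For a tool that can only shell out to `curl`, the same "base URL plus cert and key" recipe can be sketched as below. This is not how `ai-complete` itself builds its request; the function name `ai_curl_args`, the array `AI_CURL_ARGS`, and the `body.json` file name are hypothetical. If the server certificate is signed by the private root rather than a public CA, `--cacert "$cert_dir/root-ca.pem"` would also be needed; that is an assumption, since the setup described here does not say.

```shell
# Hypothetical helper: fill the global array AI_CURL_ARGS with curl
# arguments for an mTLS chat-completions request.
# $1 = path to a JSON request body.
ai_curl_args() {
  local body_file=$1
  # Defaults mirror the documented AI_CERT_DIR / AI_ENDPOINT overrides.
  local cert_dir=${AI_CERT_DIR:-$HOME/.pi/agent/certs}
  local endpoint=${AI_ENDPOINT:-https://ai.shahondin1624.de}
  AI_CURL_ARGS=(
    --silent --show-error
    --cert "$cert_dir/client.pem"
    --key "$cert_dir/client-key.pem"
    --header "Content-Type: application/json"
    --data-binary "@$body_file"
    "$endpoint/v1/chat/completions"
  )
}
```

Usage would be `ai_curl_args body.json && curl "${AI_CURL_ARGS[@]}" | jq -r '.choices[0].message.content'`, where the `jq` filter picks the assistant text out of a standard OpenAI-format response.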
Troubleshooting
- `missing $CERT_DIR/client.pem` → run `install-client.sh` first (or the bundle's `install-certs.sh`).
- `curl: (35) error:0A000412:SSL routines::sslv3 alert bad certificate` → cert isn't trusted by Caddy (wrong root CA, or the cert was revoked). Re-run the installer.
- Empty response or `[unloaded]` → model not loaded yet, run `ai-complete --load <model>` and retry.
- Long prompt timing out → use `--stream` to keep the connection alive while the model generates.
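The empty-response case above is the one the script's auto-retry is meant to absorb while a model is still loading. For callers that talk to the endpoint some other way, the idea can be sketched as follows. This is an illustration, not the script's actual code: the function name `retry_until_json` is hypothetical, and "looks like JSON" is reduced to a cheap check that the output starts with `{`. A structured error body is itself a JSON object, so it is returned immediately rather than retried.

```shell
# Hypothetical helper: run a command until its stdout looks like a JSON
# object, doubling the wait between attempts.
# $1 = max attempts, $2 = initial delay in seconds, rest = command to run.
retry_until_json() {
  local max_attempts=$1 delay=$2 attempt=1 out
  shift 2
  while :; do
    # A failed command counts the same as an empty response.
    out=$("$@" 2>/dev/null) || out=""
    case $out in
      '{'*)
        printf '%s\n' "$out"
        return 0
        ;;
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

With `retry_until_json 4 1 <command...>`, the command is tried up to four times with waits of 1, 2 and 4 seconds in between.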