Files

T

shahondin1624 2a8a3e8e22 feat(scripts): add ai-complete CLI for direct llama-server router access

Minimal shell wrapper around llama.cpp router's OpenAI-compatible API
(/v1/chat/completions), gated by the same mTLS cert as the pi extension.
Single-file, runtime deps: bash + curl + jq. Useful for scripts and agents
(Claude Code, etc.) that want to delegate generation without pulling in
a full SDK.

Features:
  --list / --status / --load <model>
  --stream <model> "..." for SSE token-stream output
  --raw <model> '...'    for full openai-format json bodies (also @file)
  --prompt-file <path>   reads prompt from disk via jq --rawfile, bypassing
                         Linux's MAX_ARG_STRLEN (~128KB per argv) so prompts
                         up to the model's context window work
  --temperature / --top-p / --max-tokens / --system  sampling overrides
  Auto-retry with exponential backoff on transient empty/non-JSON
  responses (model-loading window). Short-circuits on structured 4xx
  errors (e.g. exceed_context_size).

AI_CERT_DIR / AI_ENDPOINT / AI_RETRIES env overrides.

Includes scripts/AI-COMPLETE.md with install + usage docs and a row in
the top-level README's scripts table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 14:43:34 +02:00

3.0 KiB

Raw Blame History

ai-complete — minimal CLI for the home AI server

Tiny shell wrapper around llama-server's OpenAI-compatible API at https://ai.shahondin1624.de, gated by mTLS. One file, no runtime dependencies beyond bash, curl, jq.

Prerequisites

mTLS certs already installed at ~/.pi/agent/certs/{client.pem,client-key.pem,root-ca.pem} (run install-client.sh from this repo, or install-certs.sh from the Tobias_Devices distribution kit, if not).
jq and curl on the system. Fedora: sudo dnf install jq curl. Debian/Ubuntu: sudo apt install jq curl.

Install

Copy the script into any directory on $PATH:

mkdir -p ~/bin && cp ai-complete ~/bin/ && chmod +x ~/bin/ai-complete
case ":$PATH:" in *":$HOME/bin:"*) ;; *) echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc; export PATH="$HOME/bin:$PATH" ;; esac
ai-complete --list

That last line lists all 14 registered models — confirms cert + endpoint work.

Use

ai-complete --list                                    # show available models
ai-complete --load <model>                            # warm up (cold load: 30s for 30B, 1-3min for 70B+)
ai-complete <model> "<prompt>"                        # one-shot completion (prints just assistant text)
ai-complete --stream <model> "<prompt>"               # token-stream to stdout
echo "<prompt>" | ai-complete <model>                 # stdin prompt
ai-complete --system "<sys>" <model> "<prompt>"       # set a system prompt
ai-complete --temperature 0.2 --max-tokens 500 <model> "<prompt>"
ai-complete --raw <model> '<full openai-format json>' # full control: messages array, tools, etc
ai-complete --status                                  # show currently loaded model

Models are switched on demand. The router has --models-max 1, so loading model B auto-unloads model A. For latency-sensitive use, call --load once and reuse.

Environment overrides

AI_CERT_DIR — alternative cert location (default ~/.pi/agent/certs)
AI_ENDPOINT — alternative URL (default https://ai.shahondin1624.de)

Use from other tools

Claude Code or any agent with shell access: delegate generation by asking it to run ai-complete <model> "..." via Bash and use the captured stdout.
Python scripts: prefer the OpenAI SDK with mTLS — OpenAI(base_url=..., api_key="not-needed", http_client=httpx.Client(cert=("client.pem","client-key.pem"))).
Anything OpenAI-compatible: point base URL at https://ai.shahondin1624.de/v1, supply cert+key for mTLS, leave bearer token empty.

Troubleshooting

missing $CERT_DIR/client.pem → run install-client.sh first (or the bundle's install-certs.sh).
curl: (35) error:0A000412:SSL routines::sslv3 alert bad certificate → cert isn't trusted by Caddy (wrong root CA, or the cert was revoked). Re-run the installer.
Empty response or [unloaded] → model not loaded yet, run ai-complete --load <model> and retry.
Long prompt timing out → use --stream to keep the connection alive while the model generates.

3.0 KiB Raw Blame History