feat(scripts): add ai-complete CLI for direct llama-server router access

Minimal shell wrapper around llama.cpp router's OpenAI-compatible API (/v1/chat/completions), gated by the same mTLS cert as the pi extension. Single-file, runtime deps: bash + curl + jq. Useful for scripts and agents (Claude Code, etc.) that want to delegate generation without pulling in a full SDK. Features: --list / --status / --load <model> --stream <model> "..." for SSE token-stream output --raw <model> '...' for full openai-format json bodies (also @file) --prompt-file <path> reads prompt from disk via jq --rawfile, bypassing Linux's MAX_ARG_STRLEN (~128KB per argv) so prompts up to the model's context window work --temperature / --top-p / --max-tokens / --system sampling overrides Auto-retry with exponential backoff on transient empty/non-JSON responses (model-loading window). Short-circuits on structured 4xx errors (e.g. exceed_context_size). AI_CERT_DIR / AI_ENDPOINT / AI_RETRIES env overrides. Includes scripts/AI-COMPLETE.md with install + usage docs and a row in the top-level README's scripts table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 14:43:34 +02:00
parent cdae16562a
commit 2a8a3e8e22
3 changed files with 222 additions and 0 deletions
@@ -38,6 +38,7 @@ Activate with `/settings` inside pi (or set `"theme": "dark-mechanicus"` in
 | [`scripts/install-client.sh`](scripts/install-client.sh) | New pi client | Bootstraps a client: fetches certs from the Caddy host, rsyncs extensions + themes into `~/.pi/agent/`, verifies the mTLS /health endpoint. |
 | [`scripts/issue-client-cert.sh`](scripts/issue-client-cert.sh) | Caddy host | Generates a new client identity (key + cert + modern + legacy p12) for a named device. Use for each new machine. |
 | [`scripts/install-browser-certs.sh`](scripts/install-browser-certs.sh) | New client | Imports client cert + root CA into Brave Flatpak NSS, system NSS, Firefox profiles, and optionally the system trust store. Mostly obsolete since the Caddy switch to Let's Encrypt but still useful for the mTLS client cert side. |
+| [`scripts/ai-complete`](scripts/ai-complete) ([docs](scripts/AI-COMPLETE.md)) | Any pi client | Minimal shell CLI for direct llama-server router access. Wraps `https://ai.shahondin1624.de/v1/chat/completions` with mTLS, with retries, prompt-from-file, streaming, and model load/status helpers. Useful for scripts and agents (Claude Code, etc.) that want to delegate generation. |

 ## Tests

@@ -0,0 +1,54 @@
+# ai-complete — minimal CLI for the home AI server
+
+Tiny shell wrapper around llama-server's OpenAI-compatible API at `https://ai.shahondin1624.de`, gated by mTLS. One file, no runtime dependencies beyond `bash`, `curl`, `jq`.
+
+## Prerequisites
+
+- mTLS certs already installed at `~/.pi/agent/certs/{client.pem,client-key.pem,root-ca.pem}` (run [`install-client.sh`](install-client.sh) from this repo, or `install-certs.sh` from the Tobias_Devices distribution kit, if not).
+- `jq` and `curl` on the system. Fedora: `sudo dnf install jq curl`. Debian/Ubuntu: `sudo apt install jq curl`.
+
+## Install
+
+Copy the script into any directory on `$PATH`:
+
+```bash
+mkdir -p ~/bin && cp ai-complete ~/bin/ && chmod +x ~/bin/ai-complete
+case ":$PATH:" in *":$HOME/bin:"*) ;; *) echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc; export PATH="$HOME/bin:$PATH" ;; esac
+ai-complete --list
+```
+
+That last line lists all 14 registered models — confirms cert + endpoint work.
+
+## Use
+
+```bash
+ai-complete --list                                    # show available models
+ai-complete --load <model>                            # warm up (cold load: 30s for 30B, 1-3min for 70B+)
+ai-complete <model> "<prompt>"                        # one-shot completion (prints just assistant text)
+ai-complete --stream <model> "<prompt>"               # token-stream to stdout
+echo "<prompt>" | ai-complete <model>                 # stdin prompt
+ai-complete --system "<sys>" <model> "<prompt>"       # set a system prompt
+ai-complete --temperature 0.2 --max-tokens 500 <model> "<prompt>"
+ai-complete --raw <model> '<full openai-format json>' # full control: messages array, tools, etc
+ai-complete --status                                  # show currently loaded model
+```
+
+Models are switched on demand. The router has `--models-max 1`, so loading model B auto-unloads model A. For latency-sensitive use, call `--load` once and reuse.
+
+## Environment overrides
+
+- `AI_CERT_DIR` — alternative cert location (default `~/.pi/agent/certs`)
+- `AI_ENDPOINT` — alternative URL (default `https://ai.shahondin1624.de`)
+
+## Use from other tools
+
+- **Claude Code or any agent with shell access:** delegate generation by asking it to run `ai-complete <model> "..."` via Bash and use the captured stdout.
+- **Python scripts:** prefer the OpenAI SDK with mTLS — `OpenAI(base_url=..., api_key="not-needed", http_client=httpx.Client(cert=("client.pem","client-key.pem")))`.
+- **Anything OpenAI-compatible:** point base URL at `https://ai.shahondin1624.de/v1`, supply cert+key for mTLS, leave bearer token empty.
+
+## Troubleshooting
+
+- `missing $CERT_DIR/client.pem` → run [`install-client.sh`](install-client.sh) first (or the bundle's `install-certs.sh`).
+- `curl: (35) error:0A000412:SSL routines::sslv3 alert bad certificate` → cert isn't trusted by Caddy (wrong root CA, or the cert was revoked). Re-run the installer.
+- Empty response or `[unloaded]` → model not loaded yet, run `ai-complete --load <model>` and retry.
+- Long prompt timing out → use `--stream` to keep the connection alive while the model generates.
@@ -0,0 +1,167 @@
+#!/usr/bin/env bash
+# ai-complete — minimal CLI for the home AI server.
+#
+# Usage:
+#   ai-complete <model> "<prompt>"                       # one-shot completion
+#   echo "prompt" | ai-complete <model>                  # stdin prompt
+#   ai-complete --list                                   # list available models
+#   ai-complete --load <model>                           # warm up a model (cold load can take 1-3 min)
+#   ai-complete --status                                 # show currently loaded model
+#   ai-complete --raw <model> '<full-json-body>'         # send full /v1/chat/completions body
+#   ai-complete --stream <model> "<prompt>"              # streaming SSE output
+#   ai-complete --temperature N --top-p N <model> "..."  # sampling overrides
+#
+# Auth: uses the client cert at ~/.pi/agent/certs/{client.pem,client-key.pem}
+# Override with $AI_CERT_DIR.
+#
+# Output: prints just the assistant text to stdout (no JSON wrapping).
+# Errors go to stderr and exit non-zero.
+
+set -euo pipefail
+
+CERT_DIR="${AI_CERT_DIR:-$HOME/.pi/agent/certs}"
+ENDPOINT="${AI_ENDPOINT:-https://ai.shahondin1624.de}"
+CRT="$CERT_DIR/client.pem"
+KEY="$CERT_DIR/client-key.pem"
+
+[[ -f "$CRT" && -f "$KEY" ]] || { echo "missing $CRT or $KEY (set AI_CERT_DIR)" >&2; exit 1; }
+command -v jq >/dev/null || { echo "needs jq" >&2; exit 1; }
+
+curl_args=(-sS --cert "$CRT" --key "$KEY")
+
+case "${1:-}" in
+    --list)
+        curl "${curl_args[@]}" "$ENDPOINT/v1/models" | jq -r '.data[].id'
+        ;;
+    --status)
+        # Router exposes load state at /models (not /v1/models) under .data[].status.value
+        out=$(curl "${curl_args[@]}" "$ENDPOINT/models" | jq -r '.data[] | select(.status.value=="loaded") | .id')
+        if [[ -z "$out" ]]; then echo "none"; else echo "$out"; fi
+        ;;
+    --load)
+        # No dedicated load endpoint; the router auto-loads on first request.
+        # Send a 1-token completion to trigger the load and wait until it's done.
+        model="$2"
+        body=$(jq -n --arg m "$model" '{model:$m, messages:[{role:"user",content:"."}], max_tokens:1, stream:false}')
+        curl "${curl_args[@]}" -X POST --max-time 600 "$ENDPOINT/v1/chat/completions" -H 'Content-Type: application/json' -d "$body" >/dev/null
+        echo "loaded: $model"
+        ;;
+    --raw)
+        # body source: $3 (argv-capped) or @path syntax to read body from file
+        model="$2"; body_arg="$3"
+        body_path=$(mktemp); trap 'rm -f "$body_path"' EXIT
+        if [[ "${body_arg:0:1}" == "@" ]]; then
+            cp "${body_arg:1}" "$body_path"
+        else
+            printf '%s' "$body_arg" > "$body_path"
+        fi
+        curl "${curl_args[@]}" -X POST "$ENDPOINT/v1/chat/completions" -H 'Content-Type: application/json' --data-binary @"$body_path"
+        ;;
+    --stream)
+        shift
+        # Reuse the same prompt-source logic as the default path
+        prompt_path=""
+        cleanup_paths=()
+        trap 'rm -f "${cleanup_paths[@]}"' EXIT
+        if [[ "${1:-}" == "--prompt-file" ]]; then
+            [[ -f "$2" ]] || { echo "prompt-file not found: $2" >&2; exit 1; }
+            prompt_path="$2"; shift 2
+        fi
+        model="$1"; shift
+        if [[ -z "$prompt_path" ]]; then
+            prompt_path=$(mktemp); cleanup_paths+=("$prompt_path")
+            if [[ -n "${1:-}" ]]; then printf '%s' "$1" > "$prompt_path"
+            else cat > "$prompt_path"; fi
+        fi
+        body_path=$(mktemp); cleanup_paths+=("$body_path")
+        jq -n --arg m "$model" --rawfile p "$prompt_path" \
+            '{model:$m,messages:[{role:"user",content:$p}],stream:true}' > "$body_path"
+        curl "${curl_args[@]}" -N -X POST "$ENDPOINT/v1/chat/completions" \
+            -H 'Content-Type: application/json' --data-binary @"$body_path" \
+            | sed -u 's/^data: //' \
+            | grep -v '^$' \
+            | grep -v '^\[DONE\]' \
+            | jq -j --unbuffered '.choices[0].delta.content // empty'
+        echo
+        ;;
+    --help|-h)
+        sed -n '2,/^$/p' "$0" | sed 's/^# \{0,1\}//'
+        ;;
+    *)
+        # Default: one-shot non-streaming completion
+        # Optional flags: --temperature N --top-p N --max-tokens N
+        opts='{}'
+        while [[ "${1:-}" == --* ]]; do
+            case "$1" in
+                --temperature)  opts=$(jq --argjson v "$2" '.temperature=$v'  <<<"$opts"); shift 2 ;;
+                --top-p)        opts=$(jq --argjson v "$2" '."top_p"=$v'      <<<"$opts"); shift 2 ;;
+                --max-tokens)   opts=$(jq --argjson v "$2" '."max_tokens"=$v' <<<"$opts"); shift 2 ;;
+                --system)       opts=$(jq --arg v "$2"   '.system=$v'         <<<"$opts"); shift 2 ;;
+                --prompt-file)  opts=$(jq --arg v "$2"   '."_prompt_file"=$v' <<<"$opts"); shift 2 ;;
+                *) echo "unknown flag $1" >&2; exit 1 ;;
+            esac
+        done
+        model="${1:-}"
+        [[ -n "$model" ]] || { echo "usage: ai-complete <model> \"prompt\"" >&2; exit 1; }
+        # Materialize the prompt to a temp file so it never travels through any argv
+        # (argv is capped at MAX_ARG_STRLEN = 128KB per element on Linux).
+        # Source priority: --prompt-file (zero-copy) > $2 (capped at 128KB) > stdin (uncapped).
+        prompt_file=$(jq -r '."_prompt_file" // empty' <<<"$opts")
+        opts=$(jq 'del(."_prompt_file")' <<<"$opts")
+        prompt_path=""
+        cleanup_paths=()
+        trap 'rm -f "${cleanup_paths[@]}"' EXIT
+        if [[ -n "$prompt_file" ]]; then
+            [[ -f "$prompt_file" ]] || { echo "prompt-file not found: $prompt_file" >&2; exit 1; }
+            prompt_path="$prompt_file"
+        elif [[ -n "${2:-}" ]]; then
+            prompt_path=$(mktemp); cleanup_paths+=("$prompt_path")
+            printf '%s' "$2" > "$prompt_path"
+        else
+            prompt_path=$(mktemp); cleanup_paths+=("$prompt_path")
+            cat > "$prompt_path"
+        fi
+
+        # Build the full request body in a single jq call — uses --rawfile for the prompt
+        # so it's read from disk, not argv. Write the body to a temp file too so curl -d @file
+        # avoids the same argv cap.
+        sys=$(jq -r '.system // empty' <<<"$opts")
+        body_path=$(mktemp); cleanup_paths+=("$body_path")
+        jq -n --arg m "$model" --arg s "$sys" --rawfile p "$prompt_path" --argjson o "$opts" \
+            'def msgs: if $s == "" then [{role:"user",content:$p}] else [{role:"system",content:$s},{role:"user",content:$p}] end;
+             {model:$m, messages:msgs, stream:false} + ($o | del(.system))' > "$body_path"
+
+        # Retry loop: model auto-load can take 1-3 min for big models. On the first call
+        # after eviction, the response can be empty / non-JSON / a 503 — wait and retry.
+        attempts="${AI_RETRIES:-4}"
+        delay=2
+        text=""
+        last_resp=""
+        resp_path=$(mktemp); cleanup_paths+=("$resp_path")
+        for ((i=1; i<=attempts; i++)); do
+            curl "${curl_args[@]}" -X POST --max-time 600 \
+                "$ENDPOINT/v1/chat/completions" -H 'Content-Type: application/json' \
+                --data-binary @"$body_path" -o "$resp_path" 2>/dev/null || true
+            last_resp=$(cat "$resp_path" 2>/dev/null || echo "")
+            text=$(jq -r '.choices[0].message.content // empty' "$resp_path" 2>/dev/null || echo "")
+            if [[ -n "$text" ]]; then break; fi
+            # If the response is structured JSON with an "error" key (eg 400 exceed_context_size),
+            # don't retry — it's a real client-side error, not a transient load timeout.
+            err=$(jq -r '.error.message // empty' "$resp_path" 2>/dev/null || echo "")
+            if [[ -n "$err" ]]; then
+                echo "[ai-complete] API error (no retry): $err" >&2
+                break
+            fi
+            echo "[ai-complete] attempt $i/$attempts produced no text (likely model loading), retrying in ${delay}s…" >&2
+            sleep "$delay"
+            delay=$(( delay * 2 ))
+        done
+        if [[ -n "$text" ]]; then
+            printf '%s\n' "$text"
+        else
+            echo "[ai-complete] gave up after $attempts attempts. Last response:" >&2
+            echo "$last_resp" >&2
+            exit 1
+        fi
+        ;;
+esac