1 Commits

Author SHA1 Message Date
shahondin1624 2a8a3e8e22 feat(scripts): add ai-complete CLI for direct llama-server router access
Minimal shell wrapper around llama.cpp router's OpenAI-compatible API
(/v1/chat/completions), gated by the same mTLS cert as the pi extension.
Single-file, runtime deps: bash + curl + jq. Useful for scripts and agents
(Claude Code, etc.) that want to delegate generation without pulling in
a full SDK.
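
As a rough illustration of what such a wrapper does, a request against the
router's /v1/chat/completions endpoint with a client certificate might look
like the sketch below. The endpoint default, the certificate file names
(client.crt, client.key), and the helper names build_chat_body and
chat_complete are assumptions made for this example, not details taken from
the script itself.

```shell
#!/usr/bin/env bash
# Sketch only: the defaults and file names below are placeholders,
# not values taken from the ai-complete script.
AI_ENDPOINT="${AI_ENDPOINT:-https://llama.example.internal}"
AI_CERT_DIR="${AI_CERT_DIR:-$HOME/.config/ai-certs}"

# Build an OpenAI-style chat request body; jq handles the JSON escaping.
build_chat_body() {
  jq -n --arg model "$1" --arg prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

# POST the body with the mTLS client cert and print the reply text.
# Hypothetical helper; it is defined here but not called.
chat_complete() {
  build_chat_body "$1" "$2" |
    curl --silent --show-error \
      --cert "$AI_CERT_DIR/client.crt" \
      --key "$AI_CERT_DIR/client.key" \
      --header 'Content-Type: application/json' \
      --data @- \
      "$AI_ENDPOINT/v1/chat/completions" |
    jq -r '.choices[0].message.content'
}
```

Sending the body on stdin (--data @-) keeps the prompt out of curl's argument
list, which matters for the large-prompt case described under Features.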

Features:
  --list / --status / --load <model>
  --stream <model> "..." for SSE token-stream output
  --raw <model> '...'    for full OpenAI-format JSON bodies (also @file)
  --prompt-file <path>   reads prompt from disk via jq --rawfile, bypassing
                         Linux's MAX_ARG_STRLEN (~128 KiB per single argument)
                         so prompts up to the model's context window work
  --temperature / --top-p / --max-tokens / --system  sampling and system-prompt overrides
  Auto-retry with exponential backoff on transient empty/non-JSON
  responses (model-loading window). Short-circuits on structured 4xx
  errors (e.g. exceed_context_size).
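
The --prompt-file approach can be sketched as follows: jq reads the file
itself via --rawfile, so the prompt text never appears as a command-line
argument and the per-argument size limit does not apply. The function name
build_body_from_file and the request shape are assumptions for illustration;
the actual script may assemble its body differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a chat request body from a prompt file.
# --rawfile binds the file's full contents to $prompt as one JSON string,
# so only the file *path* is passed on the command line.
build_body_from_file() {
  local model=$1 prompt_file=$2
  jq -n --arg model "$model" --rawfile prompt "$prompt_file" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}
```

The resulting JSON would then be piped to curl on stdin (for example with
--data @-) rather than passed as an argument, for the same reason.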

Environment overrides: AI_CERT_DIR / AI_ENDPOINT / AI_RETRIES.
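
One possible shape for the retry behaviour, with the attempt count taken from
AI_RETRIES, is sketched below. The function name retry_with_backoff, the
exit-status convention (2 meaning a permanent, structured error), and the
default of 3 attempts are all assumptions of this sketch rather than the
script's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# Assumed exit-status convention for the wrapped command:
#   0 = success, 2 = permanent error (e.g. a structured 4xx), other = transient.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1 status
  shift 2
  while true; do
    status=0
    "$@" || status=$?
    # Stop on success, or short-circuit on a permanent error.
    if [ "$status" -eq 0 ] || [ "$status" -eq 2 ]; then
      return "$status"
    fi
    # Give up once the attempt budget is spent.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"
    fi
    sleep "$delay"
    delay=$((delay * 2))        # double the wait before the next attempt
    attempt=$((attempt + 1))
  done
}

# Example call (send_request is a placeholder for the real request function):
#   retry_with_backoff "${AI_RETRIES:-3}" 1 send_request
```

Doubling the delay gives a model that is still loading progressively more
time, while the early return on status 2 avoids retrying a request that
cannot succeed.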

Includes scripts/AI-COMPLETE.md with install + usage docs and a row in
the top-level README's scripts table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 14:43:34 +02:00