pi-extensions

shahondin1624/pi-extensions


Commit Graph


Author: shahondin1624
SHA1: 2a8a3e8e22

feat(scripts): add ai-complete CLI for direct llama-server router access

Minimal shell wrapper around llama.cpp router's OpenAI-compatible API
(/v1/chat/completions), gated by the same mTLS cert as the pi extension.
Single-file, runtime deps: bash + curl + jq. Useful for scripts and agents
(Claude Code, etc.) that want to delegate generation without pulling in
a full SDK.
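A minimal sketch of how such a wrapper might assemble an mTLS-protected request with curl. The certificate file names (`client.crt`, `client.key`, `ca.crt`), the fallback values, and the helper names `build_curl_args` and `post_chat` are assumptions for illustration only; the commit message names just `AI_CERT_DIR`, `AI_ENDPOINT`, and the `/v1/chat/completions` route.

```shell
#!/usr/bin/env bash
# Sketch only. The certificate file names and the fallback values below are
# assumptions for illustration; the real script may use different ones.
AI_CERT_DIR="${AI_CERT_DIR:-$HOME/.config/ai-certs}"
AI_ENDPOINT="${AI_ENDPOINT:-https://router.example.internal}"

# Fill the global array CURL_ARGS with the options for one mTLS-protected POST.
# $1 is the API path, e.g. /v1/chat/completions.
build_curl_args() {
  local path="$1"
  CURL_ARGS=(
    --silent --show-error
    --cert   "$AI_CERT_DIR/client.crt"   # client certificate (assumed name)
    --key    "$AI_CERT_DIR/client.key"   # client private key (assumed name)
    --cacert "$AI_CERT_DIR/ca.crt"       # CA for the server cert (assumed name)
    -H 'Content-Type: application/json'
    --data-binary @-                     # request body is read from stdin
    "${AI_ENDPOINT}${path}"
  )
}

# Send a JSON body supplied on stdin to the chat-completions route.
post_chat() {
  build_curl_args /v1/chat/completions
  curl "${CURL_ARGS[@]}"
}
```

Keeping the options in an array avoids quoting problems when the certificate directory contains spaces, and lets other routes reuse the same TLS settings.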

Features:
  --list / --status / --load <model>
  --stream <model> "..." for SSE token-stream output
  --raw <model> '...'    for full OpenAI-format JSON request bodies (also @file)
  --prompt-file <path>   reads the prompt from disk via jq --rawfile, bypassing
                         Linux's MAX_ARG_STRLEN (~128 KiB per argument) so prompts
                         up to the model's context window work
  --temperature / --top-p / --max-tokens / --system  sampling overrides
  Auto-retry with exponential backoff on transient empty/non-JSON
  responses (model-loading window). Short-circuits on structured 4xx
  errors (e.g. exceed_context_size).
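The `--prompt-file` idea can be sketched as follows: jq reads the file itself, so the prompt text never appears on a command line. The helper name `build_payload_from_file` and the single-message body shape are assumptions for illustration; `--rawfile` requires jq 1.6 or newer.

```shell
#!/usr/bin/env bash
# Sketch only: build_payload_from_file is a hypothetical helper name.

# Emit a compact chat-completions request body whose user message is the
# full contents of a file. jq opens the file itself (--rawfile), so the
# prompt never travels through argv and is not limited by MAX_ARG_STRLEN.
build_payload_from_file() {
  local model="$1" prompt_file="$2"
  jq -n -c --arg model "$model" --rawfile prompt "$prompt_file" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}
```

The resulting JSON can then be piped to curl with `--data-binary @-`, which keeps the request body off the command line as well.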

AI_CERT_DIR / AI_ENDPOINT / AI_RETRIES env overrides.
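The retry behavior and the `AI_RETRIES` override might combine roughly as below. This is a sketch under stated assumptions: `classify`, `with_retry`, `AI_BACKOFF_BASE`, the default of 5 attempts, and the return codes are hypothetical, and a structured error is assumed to be a JSON object with a top-level `error` key.

```shell
#!/usr/bin/env bash
# Sketch only: classify, with_retry, AI_BACKOFF_BASE and the default of 5
# attempts are hypothetical; only AI_RETRIES is named in the commit message.
AI_RETRIES="${AI_RETRIES:-5}"

# Print "retry" for empty or non-JSON bodies, "error" for a JSON object with
# a top-level "error" key (assumed error shape), and "ok" otherwise.
classify() {
  local body="$1"
  if [ -z "$body" ] || ! jq -e . >/dev/null 2>&1 <<<"$body"; then
    echo retry
  elif jq -e 'type == "object" and has("error")' >/dev/null 2>&1 <<<"$body"; then
    echo error
  else
    echo ok
  fi
}

# Run "$@" until it yields a usable body, doubling the delay each attempt.
# Returns 0 on success, 2 on a structured error, 1 when attempts run out.
with_retry() {
  local attempt=1 delay="${AI_BACKOFF_BASE:-1}" body
  while :; do
    body="$("$@" || true)"
    case "$(classify "$body")" in
      ok)    printf '%s\n' "$body"; return 0 ;;
      error) printf '%s\n' "$body" >&2; return 2 ;;   # do not retry
    esac
    [ "$attempt" -ge "$AI_RETRIES" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Returning immediately on a structured error avoids repeating a request that cannot succeed, such as a prompt that exceeds the context size, while an empty body during model loading is retried.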

Includes scripts/AI-COMPLETE.md with install + usage docs and a row in
the top-level README's scripts table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Date: 2026-04-28 14:43:34 +02:00
