shahondin1624 f7af660727 migrate ai-server extension from llama.cpp router to llama-swap
Endpoint rewrites:
  - GET /v1/models + /running → merged listModels() with running flag
  - POST /models/load → GET /upstream/<id>/health (warm load)
  - POST /models/unload → POST /api/models/unload/<id> (no body)
  - Added POST /api/models/unload for unloadAll()

Config migration:
  - Preset path: ~/.llama-models.ini → ~/.config/llama-swap/config.yaml
  - Service unit: llama-server.service → llama-swap.service
  - setPresetKey() rewritten from INI awk to YAML-aware awk for
    editing --ctx-size/--temp/--n-gpu-layers in cmd: blocks

Per-model ctx-size (fixes 0/33k bug):
  - parseCtxMapFromYaml(): walks config.yaml, extracts --ctx-size N per
    model block → Map<id, ctxSize>
  - extractCtxFromRunningCmd(): parses --ctx-size from /running cmd string
  - discoverModels(): Promise.all(listModels, listRunning, readPreset),
    ctx priority: running cmd → yaml → 32768 fallback
  - Removed broken extractCtxSize stub and dangling imports

Tests: 14 passing (parseCtxMapFromYaml ×5, extractCtxFromRunningCmd ×3,
isShardArtefact ×3, isReasoningModel ×3)

README: full rewrite covering llama-swap architecture, YAML config format,
new endpoints, troubleshooting table updated.
2026-05-27 10:42:19 +02:00

# ai-server: pi extension for a self-hosted llama-swap server behind mTLS
A multi-file pi extension that exposes a remote llama-swap instance as a
provider to pi, with dynamic model discovery and admin slash commands. Chat
streams use client-certificate TLS so the endpoint can be exposed over the
public internet without a bearer token.
---
## 1. Architecture
```
┌────────────┐  mTLS (HTTPS)  ┌──────────────┐  HTTP   ┌──────────────────┐
│ pi client  │───────────────►│ Caddy        │────────►│ llama-swap       │
│ (this ext) │                │ 192.168.2.2  │         │ 192.168.2.3:8080 │
└────────────┘  client cert   │ ai.…         │         │ swap mode        │
                              └──────────────┘         │ globalTTL: 1800  │
                                                       │ scheduler: one   │
                                                       └──────────────────┘
                                                 ~/.config/llama-swap/config.yaml
                                                       (YAML model config)
```
- **Caddy** terminates TLS and enforces `require_and_verify` client-cert auth
on `ai.shahondin1624.de`. Plaintext HTTP is forwarded to llama-swap.
- **llama-swap** runs in swap mode, managing model lifecycle (load/unload/swap)
with a YAML config at `~/.config/llama-swap/config.yaml`.
- **This extension** performs OpenAI-compatible chat streaming over mTLS and
surfaces admin endpoints as pi slash commands.
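
As a rough illustration of the client side of this chain, the sketch below builds the options object for a client-certificate HTTPS request. The field names (`ca`, `cert`, `key`, `rejectUnauthorized`, and so on) are the ones Node's `https.request` accepts; the `MtlsMaterial` type and the `buildMtlsRequestOptions` helper are hypothetical and not taken from the extension's source.

```typescript
// Hypothetical shape: PEM contents already read from the certs directory.
interface MtlsMaterial {
  ca: string;   // contents of root-ca.pem
  cert: string; // contents of client.pem
  key: string;  // contents of client-key.pem
}

interface MtlsRequestOptions {
  protocol: string;
  hostname: string;
  port: number;
  path: string;
  method: string;
  ca: string;
  cert: string;
  key: string;
  rejectUnauthorized: boolean;
}

// Build request options that present the client cert and verify the server
// against the private root CA instead of the system trust store.
function buildMtlsRequestOptions(
  baseUrl: string,
  path: string,
  pem: MtlsMaterial,
  method: string = "GET",
): MtlsRequestOptions {
  const url = new URL(path, baseUrl);
  return {
    protocol: url.protocol,
    hostname: url.hostname,
    port: url.port ? Number(url.port) : 443, // empty string means default HTTPS port
    path: url.pathname + url.search,
    method,
    ca: pem.ca,
    cert: pem.cert,
    key: pem.key,
    rejectUnauthorized: true, // never skip server verification
  };
}
```

The returned object could be passed to `https.request(options, callback)` after reading the three PEM files from disk.
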
## 2. Extension layout
```
~/.pi/agent/extensions/ai-server/
├── index.ts      entry: async discovery + registerProvider + commands
├── config.ts     URLs, SSH host, cert paths, MODELS[] fallback
├── messages.ts   Context → OpenAI chat/completions messages
├── stream.ts     custom streamSimple: SSE parse, mTLS HTTPS, pi-ai events
├── admin.ts      router HTTP client + SSH helpers (YAML edit, systemctl)
└── README.md     this file
```
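
To give a feel for the SSE parsing that `stream.ts` is responsible for, here is a minimal sketch of splitting an OpenAI-style `chat/completions` stream into text deltas. It assumes the usual `data: {...}` event lines terminated by `data: [DONE]`; the `parseSseChunk` name and the `SseResult` shape are illustrative and not the extension's actual API.

```typescript
interface SseResult {
  deltas: string[]; // text fragments from choices[0].delta.content
  done: boolean;    // true once "data: [DONE]" was seen
  rest: string;     // incomplete trailing line, to prepend to the next chunk
}

function parseSseChunk(buffer: string): SseResult {
  const lines = buffer.split("\n");
  // The last element is whatever followed the final newline: either "" or
  // a partial line that has not been fully received yet.
  const rest = lines.pop() ?? "";
  const deltas: string[] = [];
  let done = false;
  for (const raw of lines) {
    const line = raw.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    try {
      const json = JSON.parse(payload);
      const text = json?.choices?.[0]?.delta?.content;
      if (typeof text === "string" && text.length > 0) deltas.push(text);
    } catch {
      // Malformed event: ignore it rather than abort the whole stream.
    }
  }
  return { deltas, done, rest };
}
```

A caller would keep `rest` between network chunks and call `parseSseChunk(rest + nextChunk)` until `done` is true.
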
## 3. Environment variables
All are optional — the defaults match the current host.
| Env var | Default | Purpose |
|---|---|---|
| `AI_SERVER_URL` | `https://ai.shahondin1624.de` | Base URL of the Caddy endpoint |
| `AI_SERVER_CERTS_DIR` | `~/.pi/agent/certs` | Dir holding client cert + key + CA |
| `AI_SERVER_CA` | `<certs>/root-ca.pem` | CA file |
| `AI_SERVER_CLIENT_CERT` | `<certs>/client.pem` | Client cert |
| `AI_SERVER_CLIENT_KEY` | `<certs>/client-key.pem` | Client private key |
| `AI_SERVER_TIMEOUT_MS` | `300000` | Per-request stream timeout |
| `AI_SERVER_SSH_HOST` | `ai-server@192.168.2.3` | SSH target for admin commands |
| `AI_SERVER_PRESET_PATH` | `~/.config/llama-swap/config.yaml` | YAML config on the SSH target |
| `AI_SERVER_SERVICE_UNIT` | `llama-swap.service` | systemd unit name |
| `AI_SERVER_MODELS_PATH` | `/v1/models` | Models list endpoint |
| `AI_SERVER_RUNNING_PATH` | `/running` | Currently running models endpoint |
| `AI_SERVER_UNLOAD_PATH` | `/api/models/unload/<id>` | Unload single model |
| `AI_SERVER_UNLOAD_ALL_PATH` | `/api/models/unload` | Unload all models |
| `AI_SERVER_UPSTREAM_HEALTH_PATH` | `/upstream/<id>/health` | Warm-load / health endpoint |
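
The cert-related defaults in the table depend on each other (`<certs>` is whatever `AI_SERVER_CERTS_DIR` resolves to). The sketch below shows one way that resolution could work. `resolveConfig` and `AiServerConfig` are hypothetical names, and only a subset of the variables is covered; the real `config.ts` may be structured differently.

```typescript
type Env = Record<string, string | undefined>;

interface AiServerConfig {
  baseUrl: string;
  caPath: string;
  certPath: string;
  keyPath: string;
  timeoutMs: number;
}

// Resolve settings from environment variables, falling back to the
// documented defaults. `home` stands in for the user's home directory.
function resolveConfig(env: Env, home: string): AiServerConfig {
  const certsDir = env["AI_SERVER_CERTS_DIR"] ?? `${home}/.pi/agent/certs`;
  const timeout = Number(env["AI_SERVER_TIMEOUT_MS"]);
  return {
    baseUrl: env["AI_SERVER_URL"] ?? "https://ai.shahondin1624.de",
    caPath: env["AI_SERVER_CA"] ?? `${certsDir}/root-ca.pem`,
    certPath: env["AI_SERVER_CLIENT_CERT"] ?? `${certsDir}/client.pem`,
    keyPath: env["AI_SERVER_CLIENT_KEY"] ?? `${certsDir}/client-key.pem`,
    // An unset or non-numeric value falls back to the 300000 ms default.
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 300000,
  };
}
```

In Node this would typically be called as `resolveConfig(process.env, os.homedir())`.
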
## 4. Server-side setup (192.168.2.3)
### 4.1 llama-swap install
```bash
npm install -g llama-swap
# or use the binary release from the llama-swap GitHub repo
```
### 4.2 Model storage
```
~/models/<model-name>.gguf
```
### 4.3 Config file — `~/.config/llama-swap/config.yaml`
llama-swap uses a YAML config file. Each model is defined under `models:` with
a `cmd:` block containing the llama-server invocation.
```yaml
globalTTL: 1800
models:
  Qwen_Qwen3.6-35B-A3B-Q8_0:
    cmd: |
      /home/ai-server/llama.cpp/build/bin/llama-server
      --model /home/ai-server/models/Qwen_Qwen3.6-35B-A3B-Q8_0.gguf
      --ctx-size 262144
      --temp 0.7
      --cache-type-k q8_0
      --cache-type-v q8_0
      --n-gpu-layers 99
  MiniMax-M2.7-IQ3_XXS:
    cmd: |
      /home/ai-server/llama.cpp/build/bin/llama-server
      --model /home/ai-server/models/MiniMax-M2.7-UD-IQ3_XXS.gguf
      --ctx-size 131072
      --temp 1.0
      --cache-type-k q8_0
      --cache-type-v q8_0
      --n-gpu-layers 99
```
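
The extension reads the per-model `--ctx-size` from this file. The following is a simplified, line-based sketch of that idea, not the actual `parseCtxMapFromYaml()` implementation. It assumes model ids sit one indentation level below a top-level `models:` key and that each `cmd:` block contains at most one `--ctx-size N` flag.

```typescript
// Walk the YAML text line by line and collect --ctx-size per model id.
function parseCtxMapFromYaml(yaml: string): Map<string, number> {
  const ctxByModel = new Map<string, number>();
  let inModels = false;
  let modelIndent = -1; // indentation of model ids, learned from the first one
  let current: string | null = null;
  for (const line of yaml.split("\n")) {
    const indent = line.search(/\S/); // -1 for blank lines
    if (indent === -1) continue;
    if (indent === 0) {
      // Top-level key: only "models:" opens the section we care about.
      inModels = line.trim() === "models:";
      current = null;
      modelIndent = -1;
      continue;
    }
    if (!inModels) continue;
    if (modelIndent === -1) modelIndent = indent;
    const key = line.trim().match(/^["']?([^"':]+)["']?:\s*$/);
    if (indent === modelIndent && key) {
      current = key[1] ?? null; // start of a new model block
      continue;
    }
    const ctx = line.match(/--ctx-size[ =](\d+)/);
    if (current !== null && ctx) ctxByModel.set(current, Number(ctx[1]));
  }
  return ctxByModel;
}
```

Models without a `--ctx-size` flag are simply absent from the map, which lets the caller apply its own fallback.
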
### 4.4 Systemd user service — `~/.config/systemd/user/llama-swap.service`
```ini
[Unit]
Description=LLaMA-swap AI Server (Swap Mode)
After=network.target
Wants=network.target

[Service]
# No User=/Group= here: a systemd --user unit already runs as the owning user
# and cannot switch to another one.
Type=simple
WorkingDirectory=/home/ai-server
ExecStart=/home/ai-server/node_modules/.bin/llama-swap \
    --host 0.0.0.0 \
    --port 8080 \
    --config /home/ai-server/.config/llama-swap/config.yaml
LimitNOFILE=65536
LimitMEMLOCK=infinity
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target
```
Enable and start:
```bash
systemctl --user daemon-reload && systemctl --user enable --now llama-swap.service
loginctl enable-linger $(whoami) # keep user services running across logouts
```
### 4.5 Router HTTP API (reference)
| Method | Path | Body | Notes |
|---|---|---|---|
| `GET` | `/v1/models` | — | List models; `{"data":[{id,object,created,owned_by}]}` |
| `GET` | `/running` | — | Currently loaded models; `{"running":[{id,...}]}` |
| `POST` | `/api/models/unload` | — | Unload all models; returns `{"msg":"ok"}` |
| `POST` | `/api/models/unload/<id>` | — | Unload specific model; plain text `OK` |
| `GET` | `/upstream/<id>/health` | — | Warm-load model (forces spawn without inference) |
| `GET` | `/health` | — | Plain text `OK` (not JSON) |
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions payload | What pi and the web UI use |
> **Note:** Response bodies are mixed JSON and plain text. The extension's
> `routerRequest()` falls back to `{raw: buf}` for non-JSON responses, so
> unload calls won't crash — they'll return `{raw: "OK"}`.
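
The two behaviours described in this section, the JSON-or-plain-text fallback and the merge of `/v1/models` with `/running`, can be sketched as follows. The response shapes follow the table above; `parseRouterBody`, `mergeRunning`, and the two interfaces are hypothetical names rather than the extension's real exports.

```typescript
// Parse a response body as JSON, falling back to { raw } for plain text
// such as the "OK" returned by the unload and health endpoints.
function parseRouterBody(buf: string): unknown {
  try {
    return JSON.parse(buf);
  } catch {
    return { raw: buf };
  }
}

interface ModelEntry {
  id: string;
}

interface ModelStatus {
  id: string;
  running: boolean;
}

// Combine the full model list with the currently loaded models into one
// list that carries a `running` flag per model.
function mergeRunning(models: ModelEntry[], running: ModelEntry[]): ModelStatus[] {
  const runningIds = new Set(running.map((m) => m.id));
  return models.map((m) => ({ id: m.id, running: runningIds.has(m.id) }));
}
```

One caveat of the fallback approach: a plain-text body that happens to be valid JSON (for example `123`) is returned as the parsed value, not wrapped in `{ raw }`.
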
## 5. Caddy + mTLS setup (192.168.2.2)
Caddy config lives at `/mnt/ssdpool/@docker/caddy/` (Caddyfile, docker-compose,
certs). The domain `ai.shahondin1624.de` is configured with strict mTLS:
```caddy
ai.shahondin1624.de {
    tls /etc/caddy/certs/caddy.pem /etc/caddy/certs/caddy-key.pem {
        client_auth {
            mode require_and_verify
            trusted_ca_cert_file /etc/caddy/certs/root-ca.pem
        }
    }
    reverse_proxy 192.168.2.3:8080
}
```
The volume mount in docker-compose must expose `./certs` into the container at
`/etc/caddy/certs:ro` — Caddy cannot read cert files that aren't inside its
filesystem namespace.
### 5.1 Certificate generation
Run on the Caddy host (192.168.2.2):
```bash
cd /mnt/ssdpool/@docker/caddy/certs
openssl genrsa -out root-ca.key 4096
openssl req -new -x509 -days 3650 -key root-ca.key -out root-ca.pem -subj "/CN=ShahODin Root CA/O=ShahODin/C=DE"
openssl genrsa -out client.key 4096
openssl req -new -key client.key -out client.csr -subj "/CN=ShahODin Client/O=ShahODin/C=DE"
openssl x509 -req -in client.csr -CA root-ca.pem -CAkey root-ca.key -CAcreateserial -out client.crt -days 3650
```
Bundle into a PKCS#12 for browser import. **Use `-legacy`** so NSS-based stores
(Firefox, Chromium on Linux, Brave Flatpak) can read it — OpenSSL 3 defaults to
PBES2/AES-256 which older parsers reject:
```bash
openssl pkcs12 -legacy -export -out client-legacy.p12 -inkey client.key -in client.crt -certfile root-ca.pem -passout pass:
```
Files needed on each client: `client.crt` (as `client.pem`), `client.key` (as
`client-key.pem`), `root-ca.pem`. For CLI usage copy them to `~/.pi/agent/certs/`
on the client machine; the extension reads them from there.
## 6. Client-side — installing the extension
```bash
# 1) Copy certs to the canonical client location
mkdir -p ~/.pi/agent/certs
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/client.crt ~/.pi/agent/certs/client.pem
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/client.key ~/.pi/agent/certs/client-key.pem
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/root-ca.pem ~/.pi/agent/certs/
# 2) Copy the extension directory
scp -r user@source:~/.pi/agent/extensions/ai-server ~/.pi/agent/extensions/
# 3) Optionally configure SSH key auth to the AI server (for admin commands)
ssh-copy-id ai-server@192.168.2.3
```
Run `/reload` in pi — the extension loads, discovers models from the router,
registers the `ai-server` provider, and installs the admin slash commands.
## 7. Slash commands
| Command | Purpose | Transport |
|---|---|---|
| `/ai-server-status` | Tabular view of models, status, ctx size | HTTPS mTLS |
| `/ai-server-refresh` | Re-discover models and re-register the provider | HTTPS mTLS |
| `/ai-server-load <id>` | Warm-load a model via `/upstream/<id>/health` | HTTPS mTLS |
| `/ai-server-unload <id>` | Unload a model via `/api/models/unload/<id>` | HTTPS mTLS |
| `/ai-server-ctx <id> <size>` | Edit YAML config ctx-size, reload the model | SSH + HTTPS |
| `/ai-server-preset` | Print the server's llama-swap config (YAML) | SSH |
| `/ai-server-restart` | `systemctl --user restart llama-swap.service` | SSH |
`<id>` arguments tab-complete against the live router model list.
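
For `/ai-server-ctx`, the edit boils down to replacing the number after `--ctx-size` inside one model's `cmd:` block. The extension does this remotely over SSH; the TypeScript sketch below only illustrates the text transformation itself. It assumes model ids are indented by exactly two spaces under `models:`, and `setCtxSize` is a hypothetical helper name.

```typescript
// Return a copy of the YAML text with --ctx-size changed for one model only.
function setCtxSize(yaml: string, modelId: string, ctxSize: number): string {
  if (!Number.isInteger(ctxSize) || ctxSize <= 0) {
    throw new Error(`invalid ctx size: ${ctxSize}`);
  }
  let current: string | null = null; // model block the current line belongs to
  return yaml
    .split("\n")
    .map((line) => {
      const key = line.match(/^ {2}["']?([^"':\s]+)["']?:\s*$/);
      if (key) {
        current = key[1] ?? null; // entering a new model block
        return line;
      }
      if (/^\S/.test(line)) {
        current = null; // a new top-level key ends the previous block
        return line;
      }
      if (current !== modelId) return line;
      // A replacer function avoids "$1" being misread next to the digits.
      return line.replace(
        /(--ctx-size[ =])\d+/,
        (_match: string, prefix: string) => `${prefix}${ctxSize}`,
      );
    })
    .join("\n");
}
```

Other models' blocks are returned unchanged, so the edit stays local to the requested id.
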
## 8. Adding a new model
```bash
# On the AI server
ssh ai-server@192.168.2.3
cd ~/models && hf download <author>/<repo> --include '*<quant>*' --local-dir .
# Add a config block to ~/.config/llama-swap/config.yaml (see example in §4.3)
```
Then from pi:
```
/ai-server-refresh # discovers the new model
/ai-server-load <id> # first load may take a minute for a cold GGUF
```
No extension-side config changes are needed — discovery picks it up.
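
During discovery, each model's context size is resolved with a priority order: the `--ctx-size` of the running command first, then the value from the YAML config, then a 32768 fallback. A minimal sketch of that rule follows; `resolveCtx` and the map-based inputs are assumptions about how the data might be passed around, not the extension's actual signatures.

```typescript
const FALLBACK_CTX = 32768;

// Pull --ctx-size out of a command string as reported for a running model.
function extractCtxFromRunningCmd(cmd: string): number | undefined {
  const m = cmd.match(/--ctx-size[ =](\d+)/);
  return m ? Number(m[1]) : undefined;
}

// Priority: running command, then YAML config, then the fixed fallback.
function resolveCtx(
  id: string,
  runningCmds: Map<string, string>,
  yamlCtx: Map<string, number>,
): number {
  const cmd = runningCmds.get(id);
  const fromRunning = cmd !== undefined ? extractCtxFromRunningCmd(cmd) : undefined;
  return fromRunning ?? yamlCtx.get(id) ?? FALLBACK_CTX;
}
```

Preferring the running command means the reported size reflects what the loaded process was actually started with, even if the YAML has been edited since.
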
## 9. Browser access to the built-in web UI
Navigate to `https://ai.shahondin1624.de/` in any browser that has the client
cert and trusts the root CA.
### 9.1 Firefox (simplest path, always works)
Firefox uses its own NSS trust exclusively. Import `client-legacy.p12` under
*Preferences → Privacy & Security → Certificates → Your Certificates*, and
`root-ca.pem` under *Authorities* with "trust to identify websites" checked.
### 9.2 Chromium / Brave
Chromium on Linux now uses the bundled **Chrome Root Store** (CRS) for server
cert validation. In recent Brave/Chrome builds, neither the system trust store
(`/etc/pki/ca-trust/source/anchors/`) nor the user's `~/.pki/nssdb` is consulted
on its own for server cert chain verification. Two workarounds:
1. **`brave://certificate-manager/` → Custom** (Chromium ≥137) — import
`root-ca.pem` here and flag it as trusted for websites. This is the modern
replacement for the removed `ChromeRootStoreEnabled` policy.
2. **Fallback: Firefox** — if the Custom tab isn't available or the feature is
still buggy in a given build, use Firefox for the web UI. The mTLS client
cert import path is straightforward there.
Client-cert auth (the mTLS handshake itself) still works via NSS even when
server cert validation goes through the Chrome Root Store, so installing the
client `.p12` into NSS is enough for the handshake. Only the padlock/trust UI is
affected by the Chrome Root Store issue.
### 9.3 Brave Flatpak specifics
The Brave Flatpak has its own isolated NSS database at
`~/.var/app/com.brave.Browser/.pki/nssdb/`. Import directly into it:
```bash
pk12util -d sql:$HOME/.var/app/com.brave.Browser/.pki/nssdb -i ~/client-legacy.p12 -W ''
certutil -d sql:$HOME/.var/app/com.brave.Browser/.pki/nssdb -A -t "CT,C,C" -n "ShahODin Root CA" -i ~/root-ca.pem
```
To stop the "select a certificate" prompt on each page load, write a Brave
enterprise policy:
```bash
sudo flatpak override com.brave.Browser --filesystem=/etc/brave:ro
sudo install -D -m 644 /path/to/policy.json /etc/brave/policies/managed/shahondin1624.json
flatpak kill com.brave.Browser
```
Where `policy.json` contains:
```json
{
  "AutoSelectCertificateForUrls": [
    "{\"pattern\":\"https://ai.shahondin1624.de\",\"filter\":{\"ISSUER\":{\"CN\":\"ShahODin Root CA\"}}}"
  ]
}
```
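
Each entry in `AutoSelectCertificateForUrls` is itself a JSON document encoded as a string, which is why the inner quotes are escaped. Hand-writing that escaping is error-prone, so a small generator can help. The sketch below is one way to do it; `buildAutoSelectPolicy` is a hypothetical helper, not part of the extension.

```typescript
// Produce the contents of policy.json for one URL pattern and issuer CN.
function buildAutoSelectPolicy(pattern: string, issuerCn: string): string {
  // The inner object must be serialized first, because the policy expects
  // a list of JSON strings rather than a list of objects.
  const entry = JSON.stringify({
    pattern,
    filter: { ISSUER: { CN: issuerCn } },
  });
  return JSON.stringify({ AutoSelectCertificateForUrls: [entry] }, null, 2);
}
```

Calling `buildAutoSelectPolicy("https://ai.shahondin1624.de", "ShahODin Root CA")` yields a document equivalent to the one shown above.
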
Verify under `brave://policy`. The policy must show status **OK**, not
**Error** (an Error usually means the key has been renamed or removed upstream).
## 10. Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| pi: `HTTP 400: request exceeds available context size` | Model config has a small `--ctx-size` | Increase `--ctx-size` in the YAML config |
| pi: `HTTP 400: File Not Found` on load | Wrong model id — check `/v1/models` | Use the exact id from the models list |
| Model shows as `[unloaded]` in `/ai-server-status` | Model isn't currently loaded in llama-swap | Run `/ai-server-load <id>` to warm it |
| First request is slow | Cold model load — no preload configured | Add `hooks.on_startup.preload: [<id>]` to config |
| `certutil: unable to open …root-ca.pem` | CA file not yet scp'd locally | Copy `root-ca.pem` from the Caddy host |
| Brave: p12 import "Invalid or corrupt file" | OpenSSL 3 default PBES2/AES-256 encryption | Regenerate with `openssl pkcs12 -legacy -export …` |
| Brave: site loads but padlock is red | Chrome Root Store issue | Use `brave://certificate-manager/` → Custom |
| Cert selection prompt appears on every page load | `AutoSelectCertificateForUrls` policy missing or malformed | See §9.3 |
| `update-ca-trust` (system trust) has no effect on Brave | Brave is a Flatpak; sandbox doesn't see host `/etc/pki/ca-trust` | Import directly into the sandbox's NSS DB (§9.3) |
| Chat first-token latency seems long | Cold model load | First chat turn may wait 10 to 60 s while the GGUF is mmapped in |
| `/ai-server-restart` fails | Wrong service unit name | Check `AI_SERVER_SERVICE_UNIT` / create the proper unit |
| `/ai-server-ctx` fails | YAML format changed | Edit `~/.config/llama-swap/config.yaml` manually first |
## 11. Security notes
- The client private key (`client.key` / `client-key.pem` / `client-legacy.p12`)
is the sole credential for API access. Treat it like an SSH key — do not
share, do not commit, do not email.
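- To revoke a client, rotate the root CA: issue a new CA, re-sign the client
certs that should keep access, and point `trusted_ca_cert_file` at the new CA.
Removing or renaming a client cert file on the Caddy host is not enough on its
own, because Caddy accepts any cert signed by the trusted CA. (Proper CRL/OCSP
is not set up; this is a single-user deployment.)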
- The `apiKey: "ai-server-mtls"` string in `index.ts` is a placeholder required
by the pi model registry; no bearer token is sent over the wire. All auth is
cert-based.
- Every admin slash command with a mutating side-effect (`ctx`, `restart`) is
gated behind a `ctx.ui.confirm` dialog.
## 12. Paths reference
### On the AI server (192.168.2.3)
| Path | Purpose |
|---|---|
| `~/llama.cpp/` | llama.cpp source + build tree |
| `~/llama.cpp/build/bin/llama-server` | Binary (invoked by llama-swap) |
| `~/models/*.gguf` | Model weights |
| `~/.config/llama-swap/config.yaml` | llama-swap YAML config |
| `~/.config/systemd/user/llama-swap.service` | Service unit |
| `~/vram-monitor.sh` | Optional idle-unload cron helper |
### On the Caddy host (192.168.2.2)
| Path | Purpose |
|---|---|
| `/mnt/ssdpool/@docker/caddy/Caddyfile` | Caddy config |
| `/mnt/ssdpool/@docker/caddy/docker-compose.yml` | Caddy container definition |
| `/mnt/ssdpool/@docker/caddy/certs/root-ca.pem` | Root CA (public) |
| `/mnt/ssdpool/@docker/caddy/certs/root-ca.key` | Root CA private key (keep offline-ish) |
| `/mnt/ssdpool/@docker/caddy/certs/caddy.pem` + `caddy-key.pem` | Server cert for `ai.shahondin1624.de` |
| `/mnt/ssdpool/@docker/caddy/certs/client.crt` + `client.key` | Client cert/key |
| `/mnt/ssdpool/@docker/caddy/certs/client-legacy.p12` | Browser-import bundle (legacy-encoded) |
### On each pi client
| Path | Purpose |
|---|---|
| `~/.pi/agent/certs/client.pem` | Client cert |
| `~/.pi/agent/certs/client-key.pem` | Client private key |
| `~/.pi/agent/certs/root-ca.pem` | Root CA |
| `~/.pi/agent/extensions/ai-server/` | This extension |