# ai-server: PI extension for a self-hosted llama-swap server behind mTLS

A multi-file pi extension that exposes a remote llama-swap instance as a provider to pi, with dynamic model discovery and admin slash commands. Chat streams use client-certificate TLS so the endpoint can be exposed over the public internet without a bearer token.

---

## 1. Architecture

```
┌────────────┐  mTLS (HTTPS)  ┌──────────────┐   HTTP  ┌──────────────────┐
│ pi client  │───────────────►│ Caddy        │────────►│ llama-swap       │
│ (this ext) │                │ 192.168.2.2  │         │ 192.168.2.3:8080 │
└────────────┘  client cert   │ ai.…         │         │ swap mode        │
                              └──────────────┘         │ globalTTL: 1800  │
                                                       │ scheduler: one   │
                                                       └──────────────────┘
                                                                 │
                                                 ~/.config/llama-swap/config.yaml
                                                        (YAML model config)
```

- **Caddy** terminates TLS and enforces `require_and_verify` client-cert auth on `ai.shahondin1624.de`. Plaintext HTTP is forwarded to llama-swap.
- **llama-swap** runs in swap mode, managing model lifecycle (load/unload/swap) with a YAML config at `~/.config/llama-swap/config.yaml`.
- **This extension** performs OpenAI-compatible chat streaming over mTLS and surfaces admin endpoints as pi slash commands.

## 2. Extension layout

```
~/.pi/agent/extensions/ai-server/
├── index.ts      entry: async discovery + registerProvider + commands
├── config.ts     URLs, SSH host, cert paths, MODELS[] fallback
├── messages.ts   Context → OpenAI chat/completions messages
├── stream.ts     custom streamSimple: SSE parse, mTLS HTTPS, pi-ai events
├── admin.ts      router HTTP client + SSH helpers (YAML edit, systemctl)
└── README.md     this file
```

## 3. Environment variables

All are optional; the defaults match the current host.

| Env var | Default | Purpose |
|---|---|---|
| `AI_SERVER_URL` | `https://ai.shahondin1624.de` | Base URL of the Caddy endpoint |
| `AI_SERVER_CERTS_DIR` | `~/.pi/agent/certs` | Dir holding client cert + key + CA |
| `AI_SERVER_CA` | `/root-ca.pem` | CA file |
| `AI_SERVER_CLIENT_CERT` | `/client.pem` | Client cert |
| `AI_SERVER_CLIENT_KEY` | `/client-key.pem` | Client private key |
| `AI_SERVER_TIMEOUT_MS` | `300000` | Per-request stream timeout |
| `AI_SERVER_SSH_HOST` | `ai-server@192.168.2.3` | SSH target for admin commands |
| `AI_SERVER_PRESET_PATH` | `~/.config/llama-swap/config.yaml` | YAML config on the SSH target |
| `AI_SERVER_SERVICE_UNIT` | `llama-swap.service` | systemd unit name |
| `AI_SERVER_MODELS_PATH` | `/v1/models` | Models list endpoint |
| `AI_SERVER_RUNNING_PATH` | `/running` | Currently running models endpoint |
| `AI_SERVER_UNLOAD_PATH` | `/api/models/unload/` | Unload single model |
| `AI_SERVER_UNLOAD_ALL_PATH` | `/api/models/unload` | Unload all models |
| `AI_SERVER_UPSTREAM_HEALTH_PATH` | `/upstream//health` | Warm-load / health endpoint |

## 4. Server-side setup (192.168.2.3)

### 4.1 llama-swap install

```bash
npm install -g llama-swap
# or use the binary release from the llama-swap GitHub repo
```

### 4.2 Model storage

```
~/models/.gguf
```

### 4.3 Config file: `~/.config/llama-swap/config.yaml`

llama-swap uses a YAML config file. Each model is defined under `models:` with a `cmd:` block containing the llama-server invocation.

```yaml
globalTTL: 1800

models:
  Qwen_Qwen3.6-35B-A3B-Q8_0:
    cmd: |
      /home/ai-server/llama.cpp/build/bin/llama-server
      --model /home/ai-server/models/Qwen_Qwen3.6-35B-A3B-Q8_0.gguf
      --ctx-size 262144
      --temp 0.7
      --cache-type-k q8_0
      --cache-type-v q8_0
      --n-gpu-layers 99

  MiniMax-M2.7-IQ3_XXS:
    cmd: |
      /home/ai-server/llama.cpp/build/bin/llama-server
      --model /home/ai-server/models/MiniMax-M2.7-UD-IQ3_XXS.gguf
      --ctx-size 131072
      --temp 1.0
      --cache-type-k q8_0
      --cache-type-v q8_0
      --n-gpu-layers 99
```

### 4.4 Systemd user service: `~/.config/systemd/user/llama-swap.service`

```ini
[Unit]
Description=LLaMA-swap AI Server (Swap Mode)
After=network.target
Wants=network.target

[Service]
Type=simple
# Note: User=/Group= are normally unnecessary in a user unit,
# which already runs as the owning user.
User=ai-server
Group=ai-server
WorkingDirectory=/home/ai-server
ExecStart=/home/ai-server/node_modules/.bin/llama-swap \
    --host 0.0.0.0 \
    --port 8080 \
    --config /home/ai-server/.config/llama-swap/config.yaml
LimitNOFILE=65536
LimitMEMLOCK=unlimited
# Note: LimitMEMLOCK_BYTES is not a standard systemd directive;
# LimitMEMLOCK=unlimited above is the effective setting.
LimitMEMLOCK_BYTES=107374182400
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target
```

Enable and start:

```bash
systemctl --user daemon-reload && systemctl --user enable --now llama-swap.service
loginctl enable-linger $(whoami)   # keep user services running across logouts
```

### 4.5 Router HTTP API (reference)

| Method | Path | Body | Notes |
|---|---|---|---|
| `GET` | `/v1/models` | - | List models; `{"data":[{id,object,created,owned_by}]}` |
| `GET` | `/running` | - | Currently loaded models; `{"running":[{id,...}]}` |
| `POST` | `/api/models/unload` | - | Unload all models; returns `{"msg":"ok"}` |
| `POST` | `/api/models/unload/` | - | Unload specific model; plain text `OK` |
| `GET` | `/upstream//health` | - | Warm-load model (forces spawn without inference) |
| `GET` | `/health` | - | Plain text `OK` (not JSON) |
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions payload | What pi and the web UI use |

> **Note:** Response bodies are mixed JSON and plain text.
> The extension's `routerRequest()` falls back to `{raw: buf}` for non-JSON responses, so
> unload calls won't crash; they'll return `{raw: "OK"}`.

## 5. Caddy + mTLS setup (192.168.2.2)

Caddy config lives at `/mnt/ssdpool/@docker/caddy/` (Caddyfile, docker-compose, certs). The domain `ai.shahondin1624.de` is configured with strict mTLS:

```caddy
ai.shahondin1624.de {
    tls /etc/caddy/certs/caddy.pem /etc/caddy/certs/caddy-key.pem {
        client_auth {
            mode require_and_verify
            trusted_ca_cert_file /etc/caddy/certs/root-ca.pem
        }
    }
    reverse_proxy 192.168.2.3:8080
}
```

The volume mount in docker-compose must expose `./certs` into the container at `/etc/caddy/certs:ro`; Caddy cannot read cert files that aren't inside its filesystem namespace.

### 5.1 Certificate generation

Run on the Caddy host (192.168.2.2):

```bash
cd /mnt/ssdpool/@docker/caddy/certs
openssl genrsa -out root-ca.key 4096
openssl req -new -x509 -days 3650 -key root-ca.key -out root-ca.pem -subj "/CN=ShahODin Root CA/O=ShahODin/C=DE"
openssl genrsa -out client.key 4096
openssl req -new -key client.key -out client.csr -subj "/CN=ShahODin Client/O=ShahODin/C=DE"
openssl x509 -req -in client.csr -CA root-ca.pem -CAkey root-ca.key -CAcreateserial -out client.crt -days 3650
```

Bundle into a PKCS#12 for browser import. **Use `-legacy`** so NSS-based stores (Firefox, Chromium on Linux, Brave Flatpak) can read it. OpenSSL 3 defaults to PBES2/AES-256, which older parsers reject:

```bash
openssl pkcs12 -legacy -export -out client-legacy.p12 -inkey client.key -in client.crt -certfile root-ca.pem -passout pass:
```

Files needed on each client: `client.crt` (as `client.pem`), `client.key` (as `client-key.pem`), `root-ca.pem`. For CLI usage copy them to `~/.pi/agent/certs/` on the client machine; the extension reads them from there.

## 6. Client-side: installing the extension

```bash
# 1) Copy certs to the canonical client location
mkdir -p ~/.pi/agent/certs
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/client.crt ~/.pi/agent/certs/client.pem
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/client.key ~/.pi/agent/certs/client-key.pem
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/root-ca.pem ~/.pi/agent/certs/

# 2) Copy the extension directory
scp -r user@source:~/.pi/agent/extensions/ai-server ~/.pi/agent/extensions/

# 3) Optionally configure SSH key auth to the AI server (for admin commands)
ssh-copy-id ai-server@192.168.2.3
```

Run `/reload` in pi: the extension loads, discovers models from the router, registers the `ai-server` provider, and installs the admin slash commands.

## 7. Slash commands

| Command | Purpose | Transport |
|---|---|---|
| `/ai-server-status` | Tabular view of models, status, ctx size | HTTPS mTLS |
| `/ai-server-refresh` | Re-discover models and re-register the provider | HTTPS mTLS |
| `/ai-server-load ` | Warm-load a model via `/upstream//health` | HTTPS mTLS |
| `/ai-server-unload ` | Unload a model via `/api/models/unload/` | HTTPS mTLS |
| `/ai-server-ctx ` | Edit YAML config ctx-size, reload the model | SSH + HTTPS |
| `/ai-server-preset` | Print the server's llama-swap config (YAML) | SSH |
| `/ai-server-restart` | `systemctl --user restart llama-swap.service` | SSH |

Model-name arguments tab-complete against the live router model list.

## 8. Adding a new model

```bash
# On the AI server
ssh ai-server@192.168.2.3
cd ~/models && hf download / --include '**' --local-dir .
# Add a config block to ~/.config/llama-swap/config.yaml (see example in §4.3)
```

Then from pi:

```
/ai-server-refresh   # discovers the new model
/ai-server-load      # first load may take a minute for a cold GGUF
```

No extension-side config changes are needed; discovery picks it up.

## 9. Browser access to the built-in web UI

Navigate to `https://ai.shahondin1624.de/` in any browser that has the client cert and trusts the root CA.

### 9.1 Firefox (simplest path, always works)

Firefox uses its own NSS trust exclusively. Import `client-legacy.p12` under *Preferences → Privacy & Security → Certificates → Your Certificates*, and `root-ca.pem` under *Authorities* with "trust to identify websites" checked.

### 9.2 Chromium / Brave

Chromium on Linux now uses the bundled **Chrome Root Store** for server cert validation. Neither `/etc/pki/ca-trust/source/anchors/` (system trust) nor the user's `~/.pki/nssdb` alone is consulted for server cert chain verification in recent Brave/Chrome builds. Two workarounds:

1. **`brave://certificate-manager/` → Custom** (Chromium ≥137): import `root-ca.pem` here and flag it as trusted for websites. This is the modern replacement for the removed `ChromeRootStoreEnabled` policy.
2. **Fallback: Firefox**: if the Custom tab isn't available or the feature is still buggy in a given build, use Firefox for the web UI. The mTLS client cert import path is straightforward there.

Client-cert auth (the mTLS handshake itself) still works via NSS even when server cert validation goes through CRS, so installing the client `.p12` into NSS is enough for the handshake. Only the padlock/trust UI is affected by the CRS issue.

### 9.3 Brave Flatpak specifics

The Brave Flatpak has its own isolated NSS database at `~/.var/app/com.brave.Browser/.pki/nssdb/`.
Import directly into it:

```bash
pk12util -d sql:$HOME/.var/app/com.brave.Browser/.pki/nssdb -i ~/client-legacy.p12 -W ''
certutil -d sql:$HOME/.var/app/com.brave.Browser/.pki/nssdb -A -t "CT,C,C" -n "ShahODin Root CA" -i ~/root-ca.pem
```

To stop the "select a certificate" prompt on each page load, write a Brave enterprise policy:

```bash
sudo flatpak override com.brave.Browser --filesystem=/etc/brave:ro
sudo install -D -m 644 /path/to/policy.json /etc/brave/policies/managed/shahondin1624.json
flatpak kill com.brave.Browser
```

Where `policy.json` contains:

```json
{
  "AutoSelectCertificateForUrls": [
    "{\"pattern\":\"https://ai.shahondin1624.de\",\"filter\":{\"ISSUER\":{\"CN\":\"ShahODin Root CA\"}}}"
  ]
}
```

Verify under `brave://policy`. The policy must show status **OK**, not **Error** (an Error usually means the key has been renamed or removed upstream).

## 10. Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| pi: `HTTP 400: request exceeds available context size` | Model config has a small `--ctx-size` | Increase `--ctx-size` in the YAML config |
| pi: `HTTP 400: File Not Found` on load | Wrong model id; check `/v1/models` | Use the exact id from the models list |
| Model shows as `[unloaded]` in `/ai-server-status` | Model isn't currently loaded in llama-swap | Run `/ai-server-load ` to warm it |
| First request is slow | Cold model load; no preload configured | Add `hooks.on_startup.preload: []` to config |
| `certutil: unable to open …root-ca.pem` | CA file not yet scp'd locally | Copy `root-ca.pem` from the Caddy host |
| Brave: p12 import "Invalid or corrupt file" | OpenSSL 3 default PBES2/AES-256 encryption | Regenerate with `openssl pkcs12 -legacy -export …` |
| Brave: site loads but padlock is red | Chrome Root Store issue | Use `brave://certificate-manager/` → Custom |
| Cert selection prompt appears on every page load | `AutoSelectCertificateForUrls` policy missing or malformed | See §9.3 |
| System-trust `update-ca-trust` has no effect on Brave | Brave is a Flatpak; sandbox doesn't see host `/etc/pki/ca-trust` | Import directly into the sandbox's NSS DB (§9.3) |
| Chat first-token latency seems long | Cold model load | First chat turn may wait 10–60s while the GGUF is mmap'd in |
| `/ai-server-restart` fails | Wrong service unit name | Check `AI_SERVER_SERVICE_UNIT` / create the proper unit |
| `/ai-server-ctx` fails | YAML format changed | Edit `~/.config/llama-swap/config.yaml` manually first |

## 11. Security notes

- The client private key (`client.key` / `client-key.pem` / `client-legacy.p12`) is the sole credential for API access. Treat it like an SSH key: do not share, do not commit, do not email.
- To revoke a client, regenerate the root CA and re-issue the remaining client certs, then remove the offending client cert and key files from the Caddy host. Because Caddy accepts any cert signed by the trusted CA, deleting the file alone does not revoke it. (Proper CRL/OCSP is not set up because this is a single-user deployment.)
- The `apiKey: "ai-server-mtls"` string in `index.ts` is a placeholder required by the pi model registry; no bearer token is sent over the wire. All auth is cert-based.
- Every admin slash command with a mutating side-effect (`ctx`, `restart`) is gated behind a `ctx.ui.confirm` dialog.

## 12. Paths reference

### On the AI server (192.168.2.3)

| Path | Purpose |
|---|---|
| `~/llama.cpp/` | llama.cpp source + build tree |
| `~/llama.cpp/build/bin/llama-server` | Binary (invoked by llama-swap) |
| `~/models/*.gguf` | Model weights |
| `~/.config/llama-swap/config.yaml` | llama-swap YAML config |
| `~/.config/systemd/user/llama-swap.service` | Service unit |
| `~/vram-monitor.sh` | Optional idle-unload cron helper |

### On the Caddy host (192.168.2.2)

| Path | Purpose |
|---|---|
| `/mnt/ssdpool/@docker/caddy/Caddyfile` | Caddy config |
| `/mnt/ssdpool/@docker/caddy/docker-compose.yml` | Caddy container definition |
| `/mnt/ssdpool/@docker/caddy/certs/root-ca.pem` | Root CA (public) |
| `/mnt/ssdpool/@docker/caddy/certs/root-ca.key` | Root CA private key (keep offline-ish) |
| `/mnt/ssdpool/@docker/caddy/certs/caddy.pem` + `caddy-key.pem` | Server cert for `ai.shahondin1624.de` |
| `/mnt/ssdpool/@docker/caddy/certs/client.crt` + `client.key` | Client cert/key |
| `/mnt/ssdpool/@docker/caddy/certs/client-legacy.p12` | Browser-import bundle (legacy-encoded) |

### On each pi client

| Path | Purpose |
|---|---|
| `~/.pi/agent/certs/client.pem` | Client cert |
| `~/.pi/agent/certs/client-key.pem` | Client private key |
| `~/.pi/agent/certs/root-ca.pem` | Root CA |
| `~/.pi/agent/extensions/ai-server/` | This extension |
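
The three certificate paths on the pi client are what a Node-based client would typically hand to its TLS layer. The following is a minimal sketch, not the extension's actual code: `CertPaths` and `certPaths()` are hypothetical names, and the file names are taken from the client paths listed in this section. How the real `config.ts` resolves the `AI_SERVER_*` overrides is not shown here.

```typescript
// Hypothetical helper, not taken from the extension's config.ts.
interface CertPaths {
  ca: string;   // root CA (assumed to verify Caddy's server certificate)
  cert: string; // client certificate presented to Caddy
  key: string;  // client private key
}

// Build the three file paths from a certs directory, using the file
// names listed for the pi client. Trailing slashes are tolerated.
function certPaths(certsDir: string): CertPaths {
  const dir = certsDir.replace(/\/+$/, "");
  return {
    ca: `${dir}/root-ca.pem`,
    cert: `${dir}/client.pem`,
    key: `${dir}/client-key.pem`,
  };
}

// Assumed usage with Node's built-in modules (not executed here):
//   const p = certPaths(`${os.homedir()}/.pi/agent/certs`);
//   const agent = new https.Agent({
//     ca: fs.readFileSync(p.ca),
//     cert: fs.readFileSync(p.cert),
//     key: fs.readFileSync(p.key),
//   });
```

Keeping path construction separate from file reading makes it easy to check the resolved locations before any TLS handshake is attempted.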