# ai-server: pi extension for a self-hosted llama-swap server behind mTLS

A multi-file pi extension that exposes a remote llama-swap instance as a
provider to pi, with dynamic model discovery and admin slash commands. Chat
streams use client-certificate TLS so the endpoint can be exposed over the
public internet without a bearer token.

---

## 1. Architecture

```
┌────────────┐  mTLS (HTTPS)   ┌──────────────┐  HTTP   ┌──────────────────┐
│ pi client  │────────────────►│ Caddy        │────────►│ llama-swap       │
│ (this ext) │   client cert   │ 192.168.2.2  │         │ 192.168.2.3:8080 │
└────────────┘                 │ ai.…         │         │ swap mode        │
                               └──────────────┘         │ globalTTL: 1800  │
                                                        │ scheduler: one   │
                                                        └────────┬─────────┘
                                                                 │
                                                 ~/.config/llama-swap/config.yaml
                                                        (YAML model config)
```

- **Caddy** terminates TLS and enforces `require_and_verify` client-cert auth
  on `ai.shahondin1624.de`. Plaintext HTTP is forwarded to llama-swap.
- **llama-swap** runs in swap mode, managing model lifecycle (load/unload/swap)
  with a YAML config at `~/.config/llama-swap/config.yaml`.
- **This extension** performs OpenAI-compatible chat streaming over mTLS and
  surfaces admin endpoints as pi slash commands.

## 2. Extension layout

```
~/.pi/agent/extensions/ai-server/
├── index.ts       entry: async discovery + registerProvider + commands
├── config.ts      URLs, SSH host, cert paths, MODELS[] fallback
├── messages.ts    Context → OpenAI chat/completions messages
├── stream.ts      custom streamSimple: SSE parse, mTLS HTTPS, pi-ai events
├── admin.ts       router HTTP client + SSH helpers (YAML edit, systemctl)
└── README.md      this file
```
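
The SSE parsing that `stream.ts` is described as doing can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `parseSseBuffer` and `SseResult` are hypothetical names, and the chunk shape (`choices[0].delta.content`) is assumed from the OpenAI-compatible streaming format.

```typescript
// Hypothetical helper, not the real stream.ts implementation.
interface SseResult {
  deltas: string[]; // text fragments found in complete events
  done: boolean;    // true once the "[DONE]" sentinel was seen
  rest: string;     // trailing partial event to prepend to the next chunk
}

function parseSseBuffer(buffer: string): SseResult {
  const deltas: string[] = [];
  let done = false;
  // Events are separated by a blank line; the last piece may be incomplete.
  const events = buffer.split(/\r?\n\r?\n/);
  const rest = events.pop() ?? "";
  for (const event of events) {
    for (const line of event.split(/\r?\n/)) {
      if (!line.startsWith("data:")) continue; // skip comments and other fields
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") {
        done = true;
        continue;
      }
      try {
        const chunk = JSON.parse(payload);
        const text = chunk?.choices?.[0]?.delta?.content;
        if (typeof text === "string") deltas.push(text);
      } catch {
        // Malformed payloads are ignored in this sketch.
      }
    }
  }
  return { deltas, done, rest };
}
```

A caller would append each network chunk to `rest` from the previous call and feed the result back in, so events split across TCP packets are reassembled before parsing.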

## 3. Environment variables

All are optional — the defaults match the current host.

| Env var | Default | Purpose |
|---|---|---|
| `AI_SERVER_URL` | `https://ai.shahondin1624.de` | Base URL of the Caddy endpoint |
| `AI_SERVER_CERTS_DIR` | `~/.pi/agent/certs` | Dir holding client cert + key + CA |
| `AI_SERVER_CA` | `<certs>/root-ca.pem` | CA file |
| `AI_SERVER_CLIENT_CERT` | `<certs>/client.pem` | Client cert |
| `AI_SERVER_CLIENT_KEY` | `<certs>/client-key.pem` | Client private key |
| `AI_SERVER_TIMEOUT_MS` | `300000` | Per-request stream timeout |
| `AI_SERVER_SSH_HOST` | `ai-server@192.168.2.3` | SSH target for admin commands |
| `AI_SERVER_PRESET_PATH` | `~/.config/llama-swap/config.yaml` | YAML config on the SSH target |
| `AI_SERVER_SERVICE_UNIT` | `llama-swap.service` | systemd unit name |
| `AI_SERVER_MODELS_PATH` | `/v1/models` | Models list endpoint |
| `AI_SERVER_RUNNING_PATH` | `/running` | Currently running models endpoint |
| `AI_SERVER_UNLOAD_PATH` | `/api/models/unload/<id>` | Unload single model |
| `AI_SERVER_UNLOAD_ALL_PATH` | `/api/models/unload` | Unload all models |
| `AI_SERVER_UPSTREAM_HEALTH_PATH` | `/upstream/<id>/health` | Warm-load / health endpoint |
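
As an illustration of how these defaults could be resolved, here is a small sketch. It covers only a subset of the variables, and `resolveConfig`, `expandPath`, and the `AiServerConfig` shape are hypothetical names rather than the contents of `config.ts`. URL-encoding the `<id>` placeholder is likewise an assumption.

```typescript
// Hypothetical sketch of env-var resolution; field names are illustrative.
interface AiServerConfig {
  baseUrl: string;
  certsDir: string;
  ca: string;
  clientCert: string;
  clientKey: string;
  timeoutMs: number;
  unloadPath: string;
}

type Env = Record<string, string | undefined>;

function resolveConfig(env: Env, home: string): AiServerConfig {
  // Cert file defaults are derived from the (possibly overridden) certs dir.
  const certsDir = env["AI_SERVER_CERTS_DIR"] ?? `${home}/.pi/agent/certs`;
  const timeout = Number(env["AI_SERVER_TIMEOUT_MS"]);
  return {
    baseUrl: env["AI_SERVER_URL"] ?? "https://ai.shahondin1624.de",
    certsDir,
    ca: env["AI_SERVER_CA"] ?? `${certsDir}/root-ca.pem`,
    clientCert: env["AI_SERVER_CLIENT_CERT"] ?? `${certsDir}/client.pem`,
    clientKey: env["AI_SERVER_CLIENT_KEY"] ?? `${certsDir}/client-key.pem`,
    // Fall back to the documented default when the value is unset or invalid.
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 300000,
    unloadPath: env["AI_SERVER_UNLOAD_PATH"] ?? "/api/models/unload/<id>",
  };
}

// Substitute the <id> placeholder used by the path templates in the table.
function expandPath(template: string, id: string): string {
  return template.replace("<id>", encodeURIComponent(id));
}
```

In Node this would typically be called as `resolveConfig(process.env, os.homedir())`; passing both in as arguments keeps the function easy to test.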

## 4. Server-side setup (192.168.2.3)

### 4.1 llama-swap install

```bash
npm install -g llama-swap
# or use the binary release from the llama-swap GitHub repo
```

### 4.2 Model storage

```
~/models/<model-name>.gguf
```

### 4.3 Config file — `~/.config/llama-swap/config.yaml`

llama-swap uses a YAML config file. Each model is defined under `models:` with
a `cmd:` block containing the llama-server invocation.

```yaml
globalTTL: 1800
models:
  # ${PORT} is filled in by llama-swap with the port it proxies requests to.
  Qwen_Qwen3.6-35B-A3B-Q8_0:
    cmd: |
      /home/ai-server/llama.cpp/build/bin/llama-server
      --port ${PORT}
      --model /home/ai-server/models/Qwen_Qwen3.6-35B-A3B-Q8_0.gguf
      --ctx-size 262144
      --temp 0.7
      --cache-type-k q8_0
      --cache-type-v q8_0
      --n-gpu-layers 99

  MiniMax-M2.7-IQ3_XXS:
    cmd: |
      /home/ai-server/llama.cpp/build/bin/llama-server
      --port ${PORT}
      --model /home/ai-server/models/MiniMax-M2.7-UD-IQ3_XXS.gguf
      --ctx-size 131072
      --temp 1.0
      --cache-type-k q8_0
      --cache-type-v q8_0
      --n-gpu-layers 99
```

### 4.4 Systemd user service — `~/.config/systemd/user/llama-swap.service`

```ini
[Unit]
Description=LLaMA-swap AI Server (Swap Mode)
After=network.target
Wants=network.target

[Service]
Type=simple
# No User=/Group= here: a user unit already runs as the owning user, and
# setting them typically makes the unit fail to start.
WorkingDirectory=/home/ai-server
ExecStart=/home/ai-server/node_modules/.bin/llama-swap \
    --listen 0.0.0.0:8080 \
    --config /home/ai-server/.config/llama-swap/config.yaml

LimitNOFILE=65536
LimitMEMLOCK=unlimited

Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target
```

Enable and start:

```bash
systemctl --user daemon-reload && systemctl --user enable --now llama-swap.service
loginctl enable-linger $(whoami)   # keep user services running across logouts
```

### 4.5 Router HTTP API (reference)

| Method | Path | Body | Notes |
|---|---|---|---|
| `GET`  | `/v1/models` | — | List models; `{"data":[{id,object,created,owned_by}]}` |
| `GET`  | `/running` | — | Currently loaded models; `{"running":[{id,...}]}` |
| `POST` | `/api/models/unload` | — | Unload all models; returns `{"msg":"ok"}` |
| `POST` | `/api/models/unload/<id>` | — | Unload specific model; plain text `OK` |
| `GET`  | `/upstream/<id>/health` | — | Warm-load model (forces spawn without inference) |
| `GET`  | `/health` | — | Plain text `OK` (not JSON) |
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions payload | What pi and the web UI use |

> **Note:** Response bodies are mixed JSON and plain text. The extension's
> `routerRequest()` falls back to `{raw: buf}` for non-JSON responses, so
> unload calls won't crash — they'll return `{raw: "OK"}`.
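
The fallback described in the note can be sketched as below. This is an assumption-laden illustration rather than the actual `routerRequest()` body: `parseRouterBody` and `unloadSucceeded` are hypothetical names, and the success check simply mirrors the two response shapes listed in the table.

```typescript
// Hypothetical sketch: try JSON first, otherwise wrap the raw text.
function parseRouterBody(buf: string): unknown {
  try {
    return JSON.parse(buf);
  } catch {
    return { raw: buf }; // e.g. plain-text "OK" from /health or single-model unload
  }
}

// Treat both {"msg":"ok"} and {raw: "OK"} as a successful unload.
function unloadSucceeded(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const b = body as { msg?: unknown; raw?: unknown };
  if (typeof b.msg === "string") return b.msg.toLowerCase() === "ok";
  if (typeof b.raw === "string") return b.raw.trim().toUpperCase() === "OK";
  return false;
}
```

Normalising both shapes in one place means the slash-command handlers do not need to know which endpoint returns JSON and which returns plain text.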

## 5. Caddy + mTLS setup (192.168.2.2)

Caddy config lives at `/mnt/ssdpool/@docker/caddy/` (Caddyfile, docker-compose,
certs). The domain `ai.shahondin1624.de` is configured with strict mTLS:

```caddy
ai.shahondin1624.de {
    tls /etc/caddy/certs/caddy.pem /etc/caddy/certs/caddy-key.pem {
        client_auth {
            mode require_and_verify
            trusted_ca_cert_file /etc/caddy/certs/root-ca.pem
        }
    }
    reverse_proxy 192.168.2.3:8080
}
```

The docker-compose volume mount must expose `./certs` inside the container at
`/etc/caddy/certs`, read-only (`./certs:/etc/caddy/certs:ro`), because Caddy
cannot read cert files that aren't inside its filesystem namespace.

### 5.1 Certificate generation

Run on the Caddy host (192.168.2.2):

```bash
cd /mnt/ssdpool/@docker/caddy/certs
openssl genrsa -out root-ca.key 4096
openssl req -new -x509 -days 3650 -key root-ca.key -out root-ca.pem -subj "/CN=ShahODin Root CA/O=ShahODin/C=DE"
openssl genrsa -out client.key 4096
openssl req -new -key client.key -out client.csr -subj "/CN=ShahODin Client/O=ShahODin/C=DE"
openssl x509 -req -in client.csr -CA root-ca.pem -CAkey root-ca.key -CAcreateserial -out client.crt -days 3650
```

Bundle into a PKCS#12 for browser import. **Use `-legacy`** so NSS-based stores
(Firefox, Chromium on Linux, Brave Flatpak) can read it — OpenSSL 3 defaults to
PBES2/AES-256 which older parsers reject:

```bash
openssl pkcs12 -legacy -export -out client-legacy.p12 -inkey client.key -in client.crt -certfile root-ca.pem -passout pass:
```

Files needed on each client: `client.crt` (as `client.pem`), `client.key` (as
`client-key.pem`), `root-ca.pem`. For CLI usage copy them to `~/.pi/agent/certs/`
on the client machine; the extension reads them from there.
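
How the three files might be turned into TLS options on the client is sketched below. `loadTlsMaterial` and its types are hypothetical, and the file reader is passed in as a parameter so the example stays self-contained. The returned keys (`ca`, `cert`, `key`, `rejectUnauthorized`) match the option names accepted by Node's `https.request`.

```typescript
// Hypothetical sketch of loading mTLS material; not the extension's own code.
interface CertPaths {
  ca: string;
  clientCert: string;
  clientKey: string;
}

interface TlsMaterial {
  ca: string;
  cert: string;
  key: string;
  rejectUnauthorized: boolean;
}

function loadTlsMaterial(paths: CertPaths, read: (path: string) => string): TlsMaterial {
  const material = {
    ca: read(paths.ca),
    cert: read(paths.clientCert),
    key: read(paths.clientKey),
  };
  // Cheap sanity check so a wrong path fails early with a readable message.
  for (const [name, pem] of Object.entries(material)) {
    if (!pem.includes("-----BEGIN")) {
      throw new Error(`${name} does not look like a PEM file`);
    }
  }
  // Always verify the server cert against the private root CA.
  return { ...material, rejectUnauthorized: true };
}
```

With Node, the reader would usually be `(p) => fs.readFileSync(p, "utf8")`, and the result can be spread into the options object of an `https.request` call.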

## 6. Client-side — installing the extension

```bash
# 1) Copy certs to the canonical client location
mkdir -p ~/.pi/agent/certs
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/client.crt ~/.pi/agent/certs/client.pem
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/client.key ~/.pi/agent/certs/client-key.pem
scp user@caddy-host:/mnt/ssdpool/@docker/caddy/certs/root-ca.pem ~/.pi/agent/certs/
chmod 600 ~/.pi/agent/certs/client-key.pem   # private key: owner-only access

# 2) Copy the extension directory
scp -r user@source:~/.pi/agent/extensions/ai-server ~/.pi/agent/extensions/

# 3) Optionally configure SSH key auth to the AI server (for admin commands)
ssh-copy-id ai-server@192.168.2.3
```

Run `/reload` in pi — the extension loads, discovers models from the router,
registers the `ai-server` provider, and installs the admin slash commands.

## 7. Slash commands

| Command | Purpose | Transport |
|---|---|---|
| `/ai-server-status` | Tabular view of models, status, ctx size | HTTPS mTLS |
| `/ai-server-refresh` | Re-discover models and re-register the provider | HTTPS mTLS |
| `/ai-server-load <id>` | Warm-load a model via `/upstream/<id>/health` | HTTPS mTLS |
| `/ai-server-unload <id>` | Unload a model via `/api/models/unload/<id>` | HTTPS mTLS |
| `/ai-server-ctx <id> <size>` | Edit YAML config ctx-size, reload the model | SSH + HTTPS |
| `/ai-server-preset` | Print the server's llama-swap config (YAML) | SSH |
| `/ai-server-restart` | `systemctl --user restart llama-swap.service` | SSH |

`<id>` arguments tab-complete against the live router model list.
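
The YAML edit behind `/ai-server-ctx` could look something like the following sketch. It is a line-based rewrite that assumes the layout shown in §4.3 (model keys indented by two spaces under `models:`, one `--ctx-size N` flag per model); `setCtxSize` is a hypothetical name and the real `admin.ts` may work differently.

```typescript
// Hypothetical sketch: rewrite --ctx-size for one model in the config text.
function setCtxSize(yaml: string, modelId: string, size: number): string {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("ctx size must be a positive integer");
  }
  let inModel = false;
  let changed = false;
  const out = yaml.split("\n").map((line) => {
    // A non-indented key (e.g. "models:", "hooks:") ends any model block.
    if (/^\S/.test(line)) {
      inModel = false;
      return line;
    }
    // Model keys sit at exactly two spaces of indentation.
    const header = /^ {2}(\S[^:]*):\s*$/.exec(line);
    if (header) {
      inModel = header[1] === modelId;
      return line;
    }
    if (inModel && /--ctx-size\s+\d+/.test(line)) {
      changed = true;
      return line.replace(/--ctx-size\s+\d+/, `--ctx-size ${size}`);
    }
    return line;
  });
  if (!changed) throw new Error(`no --ctx-size flag found for model ${modelId}`);
  return out.join("\n");
}
```

Throwing when nothing matched is deliberate: a silent no-op would leave the user believing the context size had changed.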

## 8. Adding a new model

```bash
# On the AI server
ssh ai-server@192.168.2.3
cd ~/models && hf download <author>/<repo> --include '*<quant>*' --local-dir .

# Add a config block to ~/.config/llama-swap/config.yaml (see example in §4.3)
```

Then from pi:

```
/ai-server-refresh      # discovers the new model
/ai-server-load <id>    # first load may take a minute for a cold GGUF
```

No extension-side config changes are needed — discovery picks it up.
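
A rough sketch of that discovery step, assuming the `/v1/models` response shape from §4.5 and a static fallback list like the `MODELS[]` array mentioned in §2. `discoverModelIds` is a hypothetical name, not a function from the extension.

```typescript
// Hypothetical sketch: pull model ids out of a /v1/models response body,
// falling back to a static list when the response is unusable.
function discoverModelIds(body: unknown, fallback: string[]): string[] {
  const data = (body as { data?: unknown } | null)?.data;
  if (!Array.isArray(data)) return fallback;
  const ids = data
    .map((m) => (m as { id?: unknown } | null)?.id)
    .filter((id): id is string => typeof id === "string" && id.length > 0);
  // De-duplicate while preserving the router's ordering.
  return ids.length > 0 ? Array.from(new Set(ids)) : fallback;
}
```

Because the ids come from the router at refresh time, a model added to the YAML config shows up without touching any client-side list.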

## 9. Browser access to the built-in web UI

Navigate to `https://ai.shahondin1624.de/` in any browser that has the client
cert and trusts the root CA.

### 9.1 Firefox (simplest path, always works)

Firefox uses its own NSS certificate store. Open *Settings → Privacy &
Security → Certificates → View Certificates*, import `client-legacy.p12` under
*Your Certificates*, and import `root-ca.pem` under *Authorities* with "Trust
this CA to identify websites" checked.

### 9.2 Chromium / Brave

Chromium on Linux now uses the bundled **Chrome Root Store** (CRS) for server
cert validation. In recent Brave/Chrome builds, neither the system trust store
(`/etc/pki/ca-trust/source/anchors/`) nor the user's `~/.pki/nssdb` on its own
is consulted for server cert chain verification. Two workarounds:

1. **`brave://certificate-manager/` → Custom** (Chromium ≥137) — import
   `root-ca.pem` here and flag it as trusted for websites. This is the modern
   replacement for the removed `ChromeRootStoreEnabled` policy.
2. **Fallback: Firefox** — if the Custom tab isn't available or the feature is
   still buggy in a given build, use Firefox for the web UI. The mTLS client
   cert import path is straightforward there.

Client-cert auth (the mTLS handshake itself) still works via NSS even when
server cert validation goes through the Chrome Root Store (CRS), so installing
the client `.p12` into NSS is enough for the handshake. Only the padlock/trust
UI is affected by the CRS issue.

### 9.3 Brave Flatpak specifics

The Brave Flatpak has its own isolated NSS database at
`~/.var/app/com.brave.Browser/.pki/nssdb/`. Import directly into it:

```bash
pk12util -d sql:$HOME/.var/app/com.brave.Browser/.pki/nssdb -i ~/client-legacy.p12 -W ''
certutil -d sql:$HOME/.var/app/com.brave.Browser/.pki/nssdb -A -t "CT,C,C" -n "ShahODin Root CA" -i ~/root-ca.pem
```

To stop the "select a certificate" prompt on each page load, write a Brave
enterprise policy:

```bash
sudo flatpak override com.brave.Browser --filesystem=/etc/brave:ro
sudo install -D -m 644 /path/to/policy.json /etc/brave/policies/managed/shahondin1624.json
flatpak kill com.brave.Browser
```

Where `policy.json` contains:

```json
{
  "AutoSelectCertificateForUrls": [
    "{\"pattern\":\"https://ai.shahondin1624.de\",\"filter\":{\"ISSUER\":{\"CN\":\"ShahODin Root CA\"}}}"
  ]
}
```

Verify under `brave://policy`. The policy must show status **OK**, not
**Error** (an Error usually means the key has been renamed or removed upstream).
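
Each entry in `AutoSelectCertificateForUrls` is itself a JSON document encoded as a string, which is why the quotes are escaped in the file above. A small sketch of generating the file with both levels of encoding handled by `JSON.stringify` (the `autoSelectPolicy` helper is a hypothetical name):

```typescript
// Hypothetical helper: build the policy file with correct nested escaping.
function autoSelectPolicy(origin: string, issuerCn: string): string {
  // Inner level: the rule object becomes a JSON string.
  const rule = JSON.stringify({
    pattern: origin,
    filter: { ISSUER: { CN: issuerCn } },
  });
  // Outer level: the policy object embeds that string in an array.
  return JSON.stringify({ AutoSelectCertificateForUrls: [rule] }, null, 2);
}
```

Calling `autoSelectPolicy("https://ai.shahondin1624.de", "ShahODin Root CA")` yields a document equivalent to the hand-written `policy.json`, without the risk of a missed backslash.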

## 10. Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| pi: `HTTP 400: request exceeds available context size` | Model config has a small `--ctx-size` | Increase `--ctx-size` in the YAML config |
| pi: `HTTP 400: File Not Found` on load | Wrong model id — check `/v1/models` | Use the exact id from the models list |
| Model shows as `[unloaded]` in `/ai-server-status` | Model isn't currently loaded in llama-swap | Run `/ai-server-load <id>` to warm it |
| First request is slow | Cold model load — no preload configured | Add `hooks.on_startup.preload: [<id>]` to config |
| `certutil: unable to open …root-ca.pem` | CA file not yet scp'd locally | Copy `root-ca.pem` from the Caddy host |
| Brave: p12 import "Invalid or corrupt file" | OpenSSL 3 default PBES2/AES-256 encryption | Regenerate with `openssl pkcs12 -legacy -export …` |
| Brave: site loads but padlock is red | Chrome Root Store issue | Use `brave://certificate-manager/` → Custom |
| Cert selection prompt appears on every page load | `AutoSelectCertificateForUrls` policy missing or malformed | See §9.3 |
| Running `update-ca-trust` on the host has no effect on Brave | Brave is a Flatpak; the sandbox doesn't see the host's `/etc/pki/ca-trust` | Import directly into the sandbox's NSS DB (§9.3) |
| Chat first-token latency seems long | Cold model load | Expect the first chat turn to wait 10–60 s while the GGUF is mapped into memory |
| `/ai-server-restart` fails | Wrong service unit name | Check `AI_SERVER_SERVICE_UNIT` / create the proper unit |
| `/ai-server-ctx` fails | YAML format changed | Edit `~/.config/llama-swap/config.yaml` manually first |

## 11. Security notes

- The client private key (`client.key` / `client-key.pem` / `client-legacy.p12`)
  is the sole credential for API access. Treat it like an SSH key — do not
  share, do not commit, do not email.
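- Caddy accepts any client cert signed by `root-ca.pem`, so deleting or
  renaming a client cert file on the Caddy host does not revoke it. Proper
  CRL/OCSP is not set up (this is a single-user deployment), so revoking a
  client means generating a new root CA, re-issuing the remaining client
  certs, and replacing `root-ca.pem` in Caddy.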
- The `apiKey: "ai-server-mtls"` string in `index.ts` is a placeholder required
  by the pi model registry; no bearer token is sent over the wire. All auth is
  cert-based.
- Every admin slash command with a mutating side-effect (`ctx`, `restart`) is
  gated behind a `ctx.ui.confirm` dialog.

## 12. Paths reference

### On the AI server (192.168.2.3)

| Path | Purpose |
|---|---|
| `~/llama.cpp/` | llama.cpp source + build tree |
| `~/llama.cpp/build/bin/llama-server` | Binary (invoked by llama-swap) |
| `~/models/*.gguf` | Model weights |
| `~/.config/llama-swap/config.yaml` | llama-swap YAML config |
| `~/.config/systemd/user/llama-swap.service` | Service unit |
| `~/vram-monitor.sh` | Optional idle-unload cron helper |

### On the Caddy host (192.168.2.2)

| Path | Purpose |
|---|---|
| `/mnt/ssdpool/@docker/caddy/Caddyfile` | Caddy config |
| `/mnt/ssdpool/@docker/caddy/docker-compose.yml` | Caddy container definition |
| `/mnt/ssdpool/@docker/caddy/certs/root-ca.pem` | Root CA (public) |
| `/mnt/ssdpool/@docker/caddy/certs/root-ca.key` | Root CA private key (keep offline-ish) |
| `/mnt/ssdpool/@docker/caddy/certs/caddy.pem` + `caddy-key.pem` | Server cert for `ai.shahondin1624.de` |
| `/mnt/ssdpool/@docker/caddy/certs/client.crt` + `client.key` | Client cert/key |
| `/mnt/ssdpool/@docker/caddy/certs/client-legacy.p12` | Browser-import bundle (legacy-encoded) |

### On each pi client

| Path | Purpose |
|---|---|
| `~/.pi/agent/certs/client.pem` | Client cert |
| `~/.pi/agent/certs/client-key.pem` | Client private key |
| `~/.pi/agent/certs/root-ca.pem` | Root CA |
| `~/.pi/agent/extensions/ai-server/` | This extension |