Merge pull request 'docs: multi-machine deployment guide (#99)' (#209) from feature/issue-99-deployment-guide into main

This commit was merged in pull request #209.
2026-03-11 11:01:17 +01:00
3 changed files with 400 additions and 0 deletions

docker/DEPLOYMENT.md Normal file

@@ -0,0 +1,361 @@
# Multi-Machine Deployment Guide
This guide covers deploying llm-multiverse across multiple machines using
Docker Swarm with encrypted overlay networking.
## Prerequisites
### Hardware
| Node | Role | Specs | Purpose |
|---|---|---|---|
| GPU machine | Manager | AMD Ryzen 7 2700x, 64GB DDR4, AMD RX 9070 XT (16GB VRAM) | Model inference, Ollama, D-Bus/keyring |
| Server | Worker | 32GB DDR3, CPU only | Orchestration, memory, audit, search |
### Software
Both machines need:
- Docker Engine 24.0+ with Swarm support
- Docker Compose v2 (for local testing)
- Open ports between nodes: TCP 2377 (management), TCP/UDP 7946 (node communication), UDP 4789 (overlay traffic)
GPU machine additionally needs:
- Ollama installed and running on the host (not in Docker)
- D-Bus session bus accessible (for GNOME Keyring / KeePassXC)
### Network
- Both machines must be on the same network (or have routable IPs)
- Firewall rules must allow Docker Swarm ports (see above)
- DNS or static IPs for node addressing
## Deployment Steps
### 1. Install Docker on Both Machines
```bash
# Debian/Ubuntu
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in
```
### 2. Install Ollama on GPU Machine
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral # or your preferred model
ollama pull nomic-embed-text # for embeddings
```
Verify Ollama is running:
```bash
curl http://localhost:11434/api/tags
```
### 3. Build and Push Images
On the build machine (or CI):
```bash
# Build all images
docker compose -f docker/docker-compose.yml build
# Tag for your registry
REGISTRY=your-registry.example.com/
for svc in audit secrets memory model-gateway tool-broker search orchestrator; do
docker tag "docker-${svc}" "${REGISTRY}llm-multiverse/${svc}:latest"
docker push "${REGISTRY}llm-multiverse/${svc}:latest"
done
```
Alternatively, build on each node or use `docker save`/`docker load` for
air-gapped environments.
### 4. Initialize Docker Swarm
On the GPU machine (manager):
```bash
bash docker/scripts/swarm-init.sh init
```
This initializes the swarm and creates the encrypted overlay network
(`llm-internal`).
Note the join command printed in the output.
### 5. Join Worker Node
On the server machine:
```bash
bash docker/scripts/swarm-init.sh join <manager-ip> <join-token>
```
To retrieve the join token later (run on manager):
```bash
bash docker/scripts/swarm-init.sh token
```
### 6. Label Nodes
On the manager:
```bash
# Label the GPU machine
bash docker/scripts/label-nodes.sh gpu $(hostname)
# Label the server (use its hostname as shown in `docker node ls`)
bash docker/scripts/label-nodes.sh server <server-hostname>
# Verify labels
bash docker/scripts/label-nodes.sh show
```
### 7. Deploy the Stack
```bash
# Set your registry prefix (if using a registry)
export REGISTRY=your-registry.example.com/
export IMAGE_TAG=latest
# Deploy
docker stack deploy -c docker/docker-stack.yml llm
```
### 8. Verify Deployment
```bash
# Check all services are running
docker stack services llm
# Check service placement
docker stack ps llm
# Verify swarm and network
bash docker/scripts/swarm-init.sh verify
```
Wait for all services to reach `Running` state, then test connectivity:
```bash
# From the manager node
bash docker/scripts/verify-connectivity.sh
```
### 9. Validate Zero Code Changes
Confirm that no service source code was modified for multi-machine deployment:
```bash
bash docker/scripts/validate-zero-changes.sh
```
## Service Placement
Services are placed on nodes based on hardware requirements:
| Service | Node | Reason |
|---|---|---|
| Model Gateway | GPU machine | Needs Ollama via `host.docker.internal` |
| Secrets | GPU machine | Needs D-Bus socket for host keyring |
| Audit | Server | Persistent storage for append-only logs |
| Memory | Server | Persistent storage for DuckDB |
| Orchestrator | Any | CPU-only coordination |
| Tool Broker | Any | CPU-only enforcement |
| Search | Any | CPU-only, I/O-bound |
| SearXNG | Any | CPU-only |
| Caddy | Any | Edge proxy |
See [SERVICE_TOPOLOGY.md](SERVICE_TOPOLOGY.md) for the full connection matrix.
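The rows above translate into Swarm placement rules in the stack file. A minimal sketch of the pattern, assuming `label-nodes.sh` sets boolean-style `gpu` and `server` node labels (the exact keys in `docker-stack.yml` may differ):

```yaml
# Sketch only: pin a service to the node labeled gpu=true.
# The label names are assumptions based on label-nodes.sh usage.
services:
  model-gateway:
    deploy:
      placement:
        constraints:
          - node.labels.gpu == true
  audit:
    deploy:
      placement:
        constraints:
          - node.labels.server == true
```

Services listed as "Any" simply omit the `constraints` block and can land on either node.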
## Configuration
### Environment Variables
Set these before `docker stack deploy`:
| Variable | Default | Purpose |
|---|---|---|
| `REGISTRY` | _(empty)_ | Image registry prefix (e.g., `registry.example.com/`) |
| `IMAGE_TAG` | `latest` | Image tag for all services |
| `DOMAIN` | `localhost` | Caddy domain (real domain enables Let's Encrypt) |
| `TLS_MODE` | `internal` | Caddy TLS (`internal` = self-signed) |
| `HTTPS_PORT` | `443` | Host HTTPS port |
| `HTTP_PORT` | `80` | Host HTTP port |
| `DBUS_SESSION_SOCKET` | `/run/user/1000/bus` | Host D-Bus session socket |
| `SEARXNG_SECRET` | `dev-secret-...` | SearXNG instance secret |
| `*_REPLICAS` | `1` | Per-service replica count |
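The variables above can be exported in one block before `docker stack deploy`. A sketch with illustrative values only (hostnames and paths are placeholders, not defaults you must use):

```shell
# Example pre-deploy environment; every value here is illustrative.
export REGISTRY="registry.example.com/"           # trailing slash matters for image names
export IMAGE_TAG="latest"
export DOMAIN="llm.example.com"                   # real domain enables Let's Encrypt
export TLS_MODE="internal"
export DBUS_SESSION_SOCKET="/run/user/$(id -u)/bus"
export SEARXNG_SECRET="$(openssl rand -hex 32)"   # never ship the dev default
echo "Deploying ${REGISTRY}llm-multiverse/<svc>:${IMAGE_TAG}"
```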
### Scaling Services
Unconstrained services can be scaled:
```bash
# Scale orchestrator to 2 replicas
docker service scale llm_orchestrator=2
# Or set via environment before deploy
export ORCHESTRATOR_REPLICAS=2
docker stack deploy -c docker/docker-stack.yml llm
```
Services with placement constraints (model-gateway, secrets, audit, memory)
should remain at 1 replica unless you configure shared storage.
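The `*_REPLICAS` variables presumably feed the stack file through environment substitution; a sketch of that pattern (the exact variable name in `docker-stack.yml` is an assumption):

```yaml
# Sketch: replica count templated with a default of 1.
services:
  orchestrator:
    deploy:
      replicas: ${ORCHESTRATOR_REPLICAS:-1}
```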
## Monitoring and Logs
### View Service Logs
```bash
# List all services, then follow one service's logs
docker stack services llm
docker service logs llm_orchestrator --follow
# Tail the last 100 lines of a specific service
docker service logs llm_model-gateway --tail 100
```
### Aggregate Logs
For production, consider adding a log driver to the stack:
```yaml
# In docker-stack.yml, add to each service:
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
```
Or use a centralized logging stack (Loki + Promtail, ELK, etc.).
### Health Monitoring
- Caddy health: `curl -sk https://<domain>/healthz`
- SearXNG health: Check via `docker service logs llm_searxng`
- All services have Docker health checks — check with `docker stack ps llm`
## Security Considerations
### Encrypted Overlay Network
All inter-node traffic is encrypted via IPsec (Docker Swarm `--opt encrypted`).
This replaces mTLS between services — Docker handles key exchange automatically.
Verify encryption is enabled:
```bash
docker network inspect llm-internal --format '{{.Options}}'
# The options map should include "encrypted:", e.g. map[encrypted: ...]
```
### D-Bus Socket Exposure
Only the Secrets service container has access to the host D-Bus socket.
This is necessary for GNOME Keyring / KeePassXC integration. The mount
is read-only.
If the D-Bus socket is unavailable, the Secrets service falls back to
the Linux kernel keyring.
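The mount described above would look roughly like this in the stack file (service name, target path, and the use of long-form bind syntax are assumptions; `read_only` matches the text):

```yaml
# Sketch: read-only bind mount of the host D-Bus session socket.
services:
  secrets:
    volumes:
      - type: bind
        source: ${DBUS_SESSION_SOCKET:-/run/user/1000/bus}
        target: /run/user/1000/bus
        read_only: true
```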
### External Access
- Only Caddy is exposed externally (ports 80/443)
- All internal services are on the overlay network only
- Caddy terminates TLS at the edge; internal traffic uses h2c (HTTP/2 cleartext)
- For production: set `DOMAIN` to your real domain for automatic Let's Encrypt
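The edge setup above corresponds roughly to a Caddyfile like the following; the upstream name and port are assumptions, and `h2c://` is Caddy's scheme for cleartext HTTP/2 upstreams:

```
# Sketch only: edge TLS termination with h2c to an internal service.
{$DOMAIN} {
	tls {$TLS_MODE}
	respond /healthz 200
	reverse_proxy h2c://orchestrator:8080
}
```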
### Secrets Management
- Never commit secrets to the repository
- Use `SEARXNG_SECRET` environment variable (change from default in production)
- Service API keys are managed by the Secrets service via the host keyring
## Troubleshooting
### Services Not Starting
```bash
# Check service status
docker stack ps llm --no-trunc
# Common issues:
# - "no suitable node" → Check node labels match placement constraints
# - "image not found" → Ensure images are pushed to registry or loaded on all nodes
# - "port already in use" → Another service is using port 443/80
```
### Node Labels Missing
```bash
# Check labels
bash docker/scripts/label-nodes.sh show
# Re-apply if needed
bash docker/scripts/label-nodes.sh gpu <node>
bash docker/scripts/label-nodes.sh server <node>
```
### Overlay Network Issues
```bash
# Verify network exists and is encrypted
docker network inspect llm-internal
# If missing, recreate
bash docker/scripts/swarm-init.sh network
```
### Ollama Not Reachable
The Model Gateway connects to Ollama via `host.docker.internal:11434`.
This requires:
1. Ollama is running on the GPU host: `systemctl status ollama`
2. Ollama is listening on an address reachable from containers (e.g. `OLLAMA_HOST=0.0.0.0`); a localhost-only bind is typically not reachable via `host.docker.internal`
3. The Model Gateway is scheduled on the GPU node (check placement)
```bash
# Verify from inside the container
docker exec $(docker ps -q -f name=llm_model-gateway) \
wget -qO- http://host.docker.internal:11434/api/tags
```
### Cross-Node Communication Fails
1. Check firewall rules allow Swarm ports (2377, 7946, 4789)
2. Check overlay network is encrypted: `docker network inspect llm-internal`
3. Run connectivity verification: `bash docker/scripts/verify-connectivity.sh`
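The port checks in step 1 can be done with netcat from the worker. This sketch only prints the probe commands to run; `10.0.0.1` is a placeholder for your manager's address:

```shell
# Print nc probes for each Swarm port; run the printed commands from the worker.
MANAGER_IP="${MANAGER_IP:-10.0.0.1}"   # placeholder manager address
echo "nc -zv  ${MANAGER_IP} 2377"      # cluster management (TCP)
echo "nc -zv  ${MANAGER_IP} 7946"      # node communication (TCP)
echo "nc -zvu ${MANAGER_IP} 7946"      # node communication (UDP)
echo "nc -zvu ${MANAGER_IP} 4789"      # overlay VXLAN traffic (UDP)
```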
### Removing the Stack
```bash
# Remove all services
docker stack rm llm
# Leave swarm (on worker)
bash docker/scripts/swarm-init.sh leave
# Leave swarm (on manager, destroys swarm)
bash docker/scripts/swarm-init.sh leave
```
## Single-Machine Fallback
To run on a single machine without Swarm (development/testing):
```bash
docker compose -f docker/docker-compose.yml build
docker compose -f docker/docker-compose.yml up -d
bash docker/scripts/verify-connectivity.sh
```
This uses the bridge network instead of overlay and does not require
swarm initialization or node labeling.


@@ -102,6 +102,7 @@
| #96 | Define service placement constraints | Phase 12 | `COMPLETED` | Shell / Markdown | [issue-096.md](issue-096.md) |
| #97 | Convert docker-compose.yml to Swarm stack | Phase 12 | `COMPLETED` | Docker / YAML | [issue-097.md](issue-097.md) |
| #98 | Validate zero service code changes | Phase 12 | `COMPLETED` | Shell | [issue-098.md](issue-098.md) |
| #99 | Document multi-machine deployment guide | Phase 12 | `COMPLETED` | Markdown | [issue-099.md](issue-099.md) |
## Status Legend


@@ -0,0 +1,38 @@
# Issue #99: Document multi-machine deployment guide
## Metadata
| Field | Value |
|---|---|
| Issue | #99 |
| Title | Document multi-machine deployment guide |
| Milestone | Phase 12: Multi-Machine Extension |
| Status | `COMPLETED` |
| Language | Markdown |
| Related Plans | issue-095.md, issue-096.md, issue-097.md, issue-098.md |
| Blocked by | #98 |
## Acceptance Criteria
- [x] Prerequisites: hardware requirements, OS setup, Docker installation
- [x] Swarm initialization walkthrough
- [x] Node labeling guide
- [x] Stack deployment step-by-step
- [x] Monitoring and log aggregation setup
- [x] Troubleshooting common issues
- [x] Network topology diagram
- [x] Security considerations (encrypted overlay, D-Bus exposure, secrets)
## Files Created/Modified
| File | Action | Purpose |
|---|---|---|
| `docker/DEPLOYMENT.md` | Create | Comprehensive multi-machine deployment guide |
| `implementation-plans/issue-099.md` | Create | Plan |
| `implementation-plans/_index.md` | Modify | Index entry |
## Deviation Log
| Deviation | Reason |
|---|---|
| None | — |