Merge pull request 'docs: multi-machine deployment guide (#99)' (#209) from feature/issue-99-deployment-guide into main
This commit was merged in pull request #209.
This commit is contained in:
361
docker/DEPLOYMENT.md
Normal file
@@ -0,0 +1,361 @@
# Multi-Machine Deployment Guide

This guide covers deploying llm-multiverse across multiple machines using
Docker Swarm with encrypted overlay networking.

## Prerequisites

### Hardware

| Node | Role | Specs | Purpose |
|---|---|---|---|
| GPU machine | Manager | AMD Ryzen 7 2700X, 64GB DDR4, AMD RX 9070 XT (16GB VRAM) | Model inference, Ollama, D-Bus/keyring |
| Server | Worker | 32GB DDR3, CPU only | Orchestration, memory, audit, search |

### Software

Both machines need:

- Docker Engine 24.0+ with Swarm support
- Docker Compose v2 (for local testing)
- Open ports between nodes: TCP 2377 (cluster management), TCP/UDP 7946 (node communication), UDP 4789 (overlay traffic)

The GPU machine additionally needs:

- Ollama installed and running on the host (not in Docker)
- D-Bus session bus accessible (for GNOME Keyring / KeePassXC)

### Network

- Both machines must be on the same network (or have routable IPs)
- Firewall rules must allow the Docker Swarm ports listed above
- DNS or static IPs for node addressing
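
With ufw, for example, the rules above might look like this (an illustrative sketch; substitute your firewall tooling, and note that restricting to the peer node's IP is optional but tighter):

```bash
# Docker Swarm ports, allowed only from the other node (assumes ufw)
sudo ufw allow from <other-node-ip> to any port 2377 proto tcp   # cluster management
sudo ufw allow from <other-node-ip> to any port 7946 proto tcp   # node communication
sudo ufw allow from <other-node-ip> to any port 7946 proto udp   # node communication
sudo ufw allow from <other-node-ip> to any port 4789 proto udp   # overlay (VXLAN) traffic
```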

## Deployment Steps

### 1. Install Docker on Both Machines

```bash
# Debian/Ubuntu
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for the group membership to take effect
```

### 2. Install Ollama on the GPU Machine

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral            # or your preferred model
ollama pull nomic-embed-text   # for embeddings
```

Verify Ollama is running:

```bash
curl http://localhost:11434/api/tags
```

### 3. Build and Push Images

On the build machine (or CI):

```bash
# Build all images
docker compose -f docker/docker-compose.yml build

# Tag and push for your registry
REGISTRY=your-registry.example.com/
for svc in audit secrets memory model-gateway tool-broker search orchestrator; do
  docker tag "docker-${svc}" "${REGISTRY}llm-multiverse/${svc}:latest"
  docker push "${REGISTRY}llm-multiverse/${svc}:latest"
done
```

Alternatively, build on each node, or use `docker save`/`docker load` for
air-gapped environments.
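
The save/load path can be sketched like this (hypothetical paths and hostnames; repeat per service):

```bash
# On the build machine
docker save docker-orchestrator -o orchestrator.tar
scp orchestrator.tar <node>:/tmp/

# On the target node
docker load -i /tmp/orchestrator.tar
```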

### 4. Initialize Docker Swarm

On the GPU machine (manager):

```bash
bash docker/scripts/swarm-init.sh init
```

This initializes the swarm and creates the encrypted overlay network
(`llm-internal`). Note the join command printed in the output.
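
For reference, the script is roughly equivalent to the following raw commands (a sketch; check `swarm-init.sh` for the exact flags it uses):

```bash
docker swarm init --advertise-addr <manager-ip>
docker network create --driver overlay --opt encrypted --attachable llm-internal
```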

### 5. Join Worker Node

On the server machine:

```bash
bash docker/scripts/swarm-init.sh join <manager-ip> <join-token>
```

To retrieve the join token later (run on the manager):

```bash
bash docker/scripts/swarm-init.sh token
```

### 6. Label Nodes

On the manager:

```bash
# Label the GPU machine
bash docker/scripts/label-nodes.sh gpu $(hostname)

# Label the server (use its hostname as shown in `docker node ls`)
bash docker/scripts/label-nodes.sh server <server-hostname>

# Verify labels
bash docker/scripts/label-nodes.sh show
```
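
Under the hood this amounts to setting Swarm node labels, roughly as follows (a sketch; the actual label key is defined in `label-nodes.sh`, so verify it there):

```bash
docker node update --label-add role=gpu <gpu-hostname>
docker node update --label-add role=server <server-hostname>

# Inspect labels on every node
docker node ls -q | xargs docker node inspect \
  --format '{{ .Description.Hostname }}: {{ .Spec.Labels }}'
```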

### 7. Deploy the Stack

```bash
# Set your registry prefix (if using a registry)
export REGISTRY=your-registry.example.com/
export IMAGE_TAG=latest

# Deploy
docker stack deploy -c docker/docker-stack.yml llm
```

### 8. Verify Deployment

```bash
# Check all services are running
docker stack services llm

# Check service placement
docker stack ps llm

# Verify swarm and network
bash docker/scripts/swarm-init.sh verify
```

Wait for all services to reach `Running` state, then test connectivity:

```bash
# From the manager node
bash docker/scripts/verify-connectivity.sh
```
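
Reaching `Running` can take a minute while images pull; a small retry wrapper like the following (a sketch, not something the repo ships) keeps scripts from racing ahead:

```bash
# Retry a command until it succeeds or the attempt budget runs out.
retry_until() {
  # usage: retry_until <max_tries> <command...>
  max=$1; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$max" ]; then
      return 1
    fi
    sleep "${RETRY_SLEEP:-5}"   # interval between attempts, default 5s
  done
  return 0
}

# Example (hypothetical): retry the connectivity check until it passes
# retry_until 20 bash docker/scripts/verify-connectivity.sh
```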

### 9. Validate Zero Code Changes

Confirm that no service source code was modified for multi-machine deployment:

```bash
bash docker/scripts/validate-zero-changes.sh
```

## Service Placement

Services are placed on nodes based on hardware requirements:

| Service | Node | Reason |
|---|---|---|
| Model Gateway | GPU machine | Needs Ollama via `host.docker.internal` |
| Secrets | GPU machine | Needs D-Bus socket for host keyring |
| Audit | Server | Persistent storage for append-only logs |
| Memory | Server | Persistent storage for DuckDB |
| Orchestrator | Any | CPU-only coordination |
| Tool Broker | Any | CPU-only enforcement |
| Search | Any | CPU-only, I/O-bound |
| SearXNG | Any | CPU-only |
| Caddy | Any | Edge proxy |

See [SERVICE_TOPOLOGY.md](SERVICE_TOPOLOGY.md) for the full connection matrix.

## Configuration

### Environment Variables

Set these before running `docker stack deploy`:

| Variable | Default | Purpose |
|---|---|---|
| `REGISTRY` | _(empty)_ | Image registry prefix (e.g., `registry.example.com/`) |
| `IMAGE_TAG` | `latest` | Image tag for all services |
| `DOMAIN` | `localhost` | Caddy domain (a real domain enables Let's Encrypt) |
| `TLS_MODE` | `internal` | Caddy TLS mode (`internal` = self-signed) |
| `HTTPS_PORT` | `443` | Host HTTPS port |
| `HTTP_PORT` | `80` | Host HTTP port |
| `DBUS_SESSION_SOCKET` | `/run/user/1000/bus` | Host D-Bus session socket |
| `SEARXNG_SECRET` | `dev-secret-...` | SearXNG instance secret |
| `*_REPLICAS` | `1` | Per-service replica count |
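
A typical pre-deploy environment block might look like this (hypothetical registry and domain values; `openssl` is used here only to generate a random secret):

```bash
# Hypothetical values; adjust registry, domain, and TLS mode for your setup
export REGISTRY=registry.example.com/
export IMAGE_TAG=latest
export DOMAIN=llm.example.com
export TLS_MODE=internal
export DBUS_SESSION_SOCKET="/run/user/$(id -u)/bus"
export SEARXNG_SECRET="$(openssl rand -hex 32)"   # never keep the dev default
```

Then deploy with `docker stack deploy -c docker/docker-stack.yml llm` as in step 7.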

### Scaling Services

Unconstrained services can be scaled:

```bash
# Scale the orchestrator to 2 replicas
docker service scale llm_orchestrator=2

# Or set via environment before deploying
export ORCHESTRATOR_REPLICAS=2
docker stack deploy -c docker/docker-stack.yml llm
```

Services with placement constraints (model-gateway, secrets, audit, memory)
should remain at 1 replica unless you configure shared storage.

## Monitoring and Logs

### View Service Logs

```bash
# All services
docker stack services llm
docker service logs llm_orchestrator --follow

# A specific service
docker service logs llm_model-gateway --tail 100
```

### Aggregate Logs

For production, consider adding a log driver to the stack:

```yaml
# In docker-stack.yml, add to each service:
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"
```

Or use a centralized logging stack (Loki + Promtail, ELK, etc.).

### Health Monitoring

- Caddy health: `curl -sk https://<domain>/healthz`
- SearXNG health: check via `docker service logs llm_searxng`
- All services have Docker health checks — check with `docker stack ps llm`

## Security Considerations

### Encrypted Overlay Network

All inter-node traffic is encrypted via IPsec (Docker Swarm's `--opt encrypted`).
This replaces mTLS between services — Docker handles key exchange automatically.

Verify encryption is enabled:

```bash
docker network inspect llm-internal --format '{{.Options}}'
# Should show: map[encrypted:]
```

### D-Bus Socket Exposure

Only the Secrets service container has access to the host D-Bus socket.
This is necessary for GNOME Keyring / KeePassXC integration, and the mount
is read-only.

If the D-Bus socket is unavailable, the Secrets service falls back to
the Linux kernel keyring.
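
A quick way to check which path applies on a given host (the default socket path here matches the `DBUS_SESSION_SOCKET` default above):

```bash
# Check whether the D-Bus session socket exists on this host
sock="${DBUS_SESSION_SOCKET:-/run/user/$(id -u)/bus}"
if [ -S "$sock" ]; then
  echo "D-Bus session socket present: $sock (keyring backend available)"
else
  echo "No socket at $sock; Secrets will use the kernel keyring fallback"
fi
```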

### External Access

- Only Caddy is exposed externally (ports 80/443)
- All internal services live on the overlay network only
- Caddy terminates TLS at the edge; internal traffic uses h2c (HTTP/2 cleartext)
- For production, set `DOMAIN` to your real domain for automatic Let's Encrypt

### Secrets Management

- Never commit secrets to the repository
- Set the `SEARXNG_SECRET` environment variable (change it from the default in production)
- Service API keys are managed by the Secrets service via the host keyring

## Troubleshooting

### Services Not Starting

```bash
# Check service status
docker stack ps llm --no-trunc

# Common issues:
# - "no suitable node" → check that node labels match placement constraints
# - "image not found" → ensure images are pushed to the registry or loaded on all nodes
# - "port already in use" → another service is using port 443/80
```

### Node Labels Missing

```bash
# Check labels
bash docker/scripts/label-nodes.sh show

# Re-apply if needed
bash docker/scripts/label-nodes.sh gpu <node>
bash docker/scripts/label-nodes.sh server <node>
```

### Overlay Network Issues

```bash
# Verify the network exists and is encrypted
docker network inspect llm-internal

# If missing, recreate it
bash docker/scripts/swarm-init.sh network
```

### Ollama Not Reachable

The Model Gateway connects to Ollama via `host.docker.internal:11434`.
This requires:

1. Ollama is running on the GPU host: `systemctl status ollama`
2. Ollama is listening on an address the container can reach (a localhost-only bind may not be)
3. The Model Gateway is scheduled on the GPU node (check placement)

```bash
# Verify from inside the container
docker exec $(docker ps -q -f name=llm_model-gateway) \
  wget -qO- http://host.docker.internal:11434/api/tags
```
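
If Ollama is bound to `127.0.0.1` only, requests arriving via `host.docker.internal` may be refused, because they reach the host on its Docker bridge address. One common remedy (assuming a systemd install of Ollama) is a drop-in that widens the bind address:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# (create via `sudo systemctl edit ollama`, then `sudo systemctl restart ollama`)
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

If you do this, firewall port 11434 so only Docker's networks can reach it.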

### Cross-Node Communication Fails

1. Check that firewall rules allow the Swarm ports (2377, 7946, 4789)
2. Check the overlay network is encrypted: `docker network inspect llm-internal`
3. Run the connectivity verification: `bash docker/scripts/verify-connectivity.sh`

### Removing the Stack

```bash
# Remove all services
docker stack rm llm

# Leave the swarm (on the worker)
bash docker/scripts/swarm-init.sh leave

# Leave the swarm (on the manager; this destroys the swarm)
bash docker/scripts/swarm-init.sh leave
```

## Single-Machine Fallback

To run on a single machine without Swarm (development/testing):

```bash
docker compose -f docker/docker-compose.yml build
docker compose -f docker/docker-compose.yml up -d
bash docker/scripts/verify-connectivity.sh
```

This uses the bridge network instead of the overlay and does not require
swarm initialization or node labeling.

@@ -102,6 +102,7 @@
| #96 | Define service placement constraints | Phase 12 | `COMPLETED` | Shell / Markdown | [issue-096.md](issue-096.md) |
| #97 | Convert docker-compose.yml to Swarm stack | Phase 12 | `COMPLETED` | Docker / YAML | [issue-097.md](issue-097.md) |
| #98 | Validate zero service code changes | Phase 12 | `COMPLETED` | Shell | [issue-098.md](issue-098.md) |
| #99 | Document multi-machine deployment guide | Phase 12 | `COMPLETED` | Markdown | [issue-099.md](issue-099.md) |

## Status Legend

38
implementation-plans/issue-099.md
Normal file
@@ -0,0 +1,38 @@

# Issue #99: Document multi-machine deployment guide

## Metadata

| Field | Value |
|---|---|
| Issue | #99 |
| Title | Document multi-machine deployment guide |
| Milestone | Phase 12: Multi-Machine Extension |
| Status | `COMPLETED` |
| Language | Markdown |
| Related Plans | issue-095.md, issue-096.md, issue-097.md, issue-098.md |
| Blocked by | #98 |

## Acceptance Criteria

- [x] Prerequisites: hardware requirements, OS setup, Docker installation
- [x] Swarm initialization walkthrough
- [x] Node labeling guide
- [x] Stack deployment step-by-step
- [x] Monitoring and log aggregation setup
- [x] Troubleshooting common issues
- [x] Network topology diagram
- [x] Security considerations (encrypted overlay, D-Bus exposure, secrets)

## Files Created/Modified

| File | Action | Purpose |
|---|---|---|
| `docker/DEPLOYMENT.md` | Create | Comprehensive multi-machine deployment guide |
| `implementation-plans/issue-099.md` | Create | Plan |
| `implementation-plans/_index.md` | Modify | Index entry |

## Deviation Log

| Deviation | Reason |
|---|---|
| None | — |