- audit.proto: AuditService with Append RPC, AuditEntry, AuditAction enum
- secrets.proto: SecretsService with GetSecret RPC
- memory.proto: MemoryService with QueryMemory (streaming), WriteMemory, GetCorrelated
- model_gateway.proto: ModelGatewayService with StreamInference, Inference, GenerateEmbedding, IsModelReady
- search.proto: SearchService with Search RPC, SearchResult
- tool_broker.proto: ToolBrokerService with DiscoverTools, ExecuteTool (streaming), ValidateCall
- orchestrator.proto: OrchestratorService with ProcessRequest (streaming)

All protos pass buf lint and buf build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implementation Plan — Issue #12: Define model_gateway.proto
Metadata
| Field | Value |
|---|---|
| Issue | #12 |
| Title | Define model_gateway.proto |
| Milestone | Phase 1: Proto Definitions |
| Labels | type:feature, priority:critical, lang:protobuf, service:model-gateway |
| Status | COMPLETED |
| Language | Protobuf |
| Related Plans | issue-008.md |
| Blocked by | #8 (completed) |
Acceptance Criteria
- ModelGatewayService with StreamInference, Inference, GenerateEmbedding, IsModelReady RPCs
- InferenceParams with task complexity hint for model routing
- Embedding request/response types
- Proto compiles without errors
Architecture Analysis
ModelGatewayService wraps the Ollama HTTP API and exposes inference over gRPC. The TaskComplexity enum drives model routing: simple tasks route to smaller models (3B/7B), while complex tasks such as reasoning and code generation route to larger models (14B). InferenceParams is a shared message used by both inference RPCs: StreamInference (server-streaming, token by token) and Inference (unary, full text). GenerateEmbedding targets nomic-embed-text by default and returns the raw embedding vector together with its dimension count. IsModelReady checks model availability and accepts an optional model name filter.
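The following is a minimal sketch of how the service described above could be laid out, not the contents of the actual file. Only the service name, the four RPC names, TaskComplexity, and InferenceParams come from this plan. The package name is inferred from the file path, and every enum value, field name, field number, and request/response message name is a hypothetical placeholder.

```protobuf
syntax = "proto3";

// Assumption: package name inferred from proto/llm_multiverse/v1/.
package llm_multiverse.v1;

// Routing hint. Value names are hypothetical.
enum TaskComplexity {
  TASK_COMPLEXITY_UNSPECIFIED = 0;
  TASK_COMPLEXITY_SIMPLE = 1;   // routed to a smaller model (3B/7B)
  TASK_COMPLEXITY_COMPLEX = 2;  // routed to a larger model (14B)
}

// Shared by StreamInference and Inference. Field names are hypothetical.
message InferenceParams {
  string prompt = 1;
  TaskComplexity task_complexity = 2;
}

// Hypothetical request/response wrappers for the two inference RPCs.
message StreamInferenceRequest { InferenceParams params = 1; }
message StreamInferenceResponse { string token = 1; }  // one token per message

message InferenceRequest { InferenceParams params = 1; }
message InferenceResponse { string text = 1; }  // full generated text

// Hypothetical embedding types.
message GenerateEmbeddingRequest {
  string text = 1;
  string model = 2;  // empty means the default, nomic-embed-text
}
message GenerateEmbeddingResponse {
  repeated float embedding = 1;  // raw embedding vector
  uint32 dimensions = 2;         // length of the vector
}

// Hypothetical readiness types.
message IsModelReadyRequest {
  string model_name = 1;  // empty means no filter
}
message IsModelReadyResponse { bool ready = 1; }

service ModelGatewayService {
  // Server-streaming, token by token.
  rpc StreamInference(StreamInferenceRequest) returns (stream StreamInferenceResponse);
  // Unary, full text.
  rpc Inference(InferenceRequest) returns (InferenceResponse);
  rpc GenerateEmbedding(GenerateEmbeddingRequest) returns (GenerateEmbeddingResponse);
  rpc IsModelReady(IsModelReadyRequest) returns (IsModelReadyResponse);
}
```

Keeping the complexity hint inside InferenceParams, rather than on each request type, means both inference RPCs pick a model from the same field.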
Files to Create/Modify
| File | Action | Purpose |
|---|---|---|
| proto/llm_multiverse/v1/model_gateway.proto | Modify | Define ModelGatewayService, TaskComplexity enum, InferenceParams, and all request/response types |
Deviation Log
(No deviations)