llm-multiverse/implementation-plans/issue-012.md
shahondin1624 1ee3da58f3 feat: define all service proto files (#9-#15)
- audit.proto: AuditService with Append RPC, AuditEntry, AuditAction enum
- secrets.proto: SecretsService with GetSecret RPC
- memory.proto: MemoryService with QueryMemory (streaming), WriteMemory, GetCorrelated
- model_gateway.proto: ModelGatewayService with StreamInference, Inference, GenerateEmbedding, IsModelReady
- search.proto: SearchService with Search RPC, SearchResult
- tool_broker.proto: ToolBrokerService with DiscoverTools, ExecuteTool (streaming), ValidateCall
- orchestrator.proto: OrchestratorService with ProcessRequest (streaming)

All protos pass buf lint and buf build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:31:37 +01:00

Implementation Plan — Issue #12: Define model_gateway.proto

Metadata

| Field | Value |
|---|---|
| Issue | #12 |
| Title | Define model_gateway.proto |
| Milestone | Phase 1: Proto Definitions |
| Labels | type:feature, priority:critical, lang:protobuf, service:model-gateway |
| Status | COMPLETED |
| Language | Protobuf |
| Related Plans | issue-008.md |
| Blocked by | #8 (completed) |

Acceptance Criteria

  • ModelGatewayService with StreamInference, Inference, GenerateEmbedding, IsModelReady RPCs
  • InferenceParams with task complexity hint for model routing
  • Embedding request/response types
  • Proto compiles without errors

Architecture Analysis

The service wraps the Ollama HTTP API and exposes inference over gRPC. The TaskComplexity enum drives model routing: simple tasks route to smaller models (3B/7B), while complex tasks route to larger models (14B) for reasoning and code generation. InferenceParams is a shared message used by both inference RPCs: StreamInference (server-streaming, token by token) and Inference (unary, full text). GenerateEmbedding targets nomic-embed-text by default and returns the raw embedding vector together with its dimension count. IsModelReady checks model availability and accepts an optional model-name filter.
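
As a rough illustration of the shape described above, the service could be sketched as follows. This is not the committed file. The service, RPC, `TaskComplexity`, and `InferenceParams` names come from this plan. Everything else is an assumption made for illustration: the package name (inferred from the file path), the enum value names, all field names and numbers, and the per-RPC request/response wrapper messages.

```protobuf
syntax = "proto3";

// Assumed package, inferred from proto/llm_multiverse/v1/.
package llm_multiverse.v1;

// Routing hint. Value names are hypothetical.
enum TaskComplexity {
  TASK_COMPLEXITY_UNSPECIFIED = 0;
  TASK_COMPLEXITY_SIMPLE = 1;   // routed to smaller models (3B/7B)
  TASK_COMPLEXITY_COMPLEX = 2;  // routed to larger models (14B)
}

// Shared by StreamInference and Inference. Fields are hypothetical.
message InferenceParams {
  string prompt = 1;
  TaskComplexity task_complexity = 2;
}

message StreamInferenceRequest {
  InferenceParams params = 1;
}

// One message per generated token (hypothetical field).
message StreamInferenceResponse {
  string token = 1;
}

message InferenceRequest {
  InferenceParams params = 1;
}

// Full generated text returned in a single response (hypothetical field).
message InferenceResponse {
  string text = 1;
}

message GenerateEmbeddingRequest {
  string text = 1;
  // Assumed convention: empty means the default, nomic-embed-text.
  string model = 2;
}

message GenerateEmbeddingResponse {
  repeated float embedding = 1;  // raw vector
  uint32 dimensions = 2;         // length of the vector
}

message IsModelReadyRequest {
  // Optional filter; when unset, the check is not limited to one model.
  optional string model_name = 1;
}

message IsModelReadyResponse {
  bool ready = 1;
}

service ModelGatewayService {
  // Server-streaming, token by token.
  rpc StreamInference(StreamInferenceRequest) returns (stream StreamInferenceResponse);
  // Unary, full text.
  rpc Inference(InferenceRequest) returns (InferenceResponse);
  rpc GenerateEmbedding(GenerateEmbeddingRequest) returns (GenerateEmbeddingResponse);
  rpc IsModelReady(IsModelReadyRequest) returns (IsModelReadyResponse);
}
```

Embedding `InferenceParams` in separate request messages, rather than passing it directly to both RPCs, keeps each RPC's request type unique, which is what buf's default lint rules expect. The real file may be structured differently.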

Files to Create/Modify

| File | Action | Purpose |
|---|---|---|
| proto/llm_multiverse/v1/model_gateway.proto | Modify | Define ModelGatewayService, TaskComplexity enum, InferenceParams, and all request/response types |

Deviation Log

(No deviations)