Implementation Plan — Issue #40: Implement model routing logic
Metadata
| Field | Value |
|---|---|
| Issue | #40 |
| Title | Implement model routing logic |
| Milestone | Phase 5: Model Gateway |
| Labels | — |
| Status | COMPLETED |
| Language | Rust / Protobuf |
| Related Plans | issue-038.md, issue-039.md |
| Blocked by | #39, #20 |
Acceptance Criteria
- Routing table: task type → model name (e.g., "code" → qwen2.5-coder:14b)
- Model hint override from request
- Default model fallback
- Configuration-driven routing (not hardcoded)
- Audit logging of every inference request via Audit Service
Architecture Analysis
Service Context
- Belongs to Model Gateway service (services/model-gateway/).
- Affects the Inference, StreamInference, and GenerateEmbedding gRPC endpoints (currently stubbed as Unimplemented). This issue creates the routing logic that those endpoints will call; the endpoint implementations themselves are in #41/#42.
- Proto messages involved: InferenceParams (has TaskComplexity), InferenceRequest, StreamInferenceRequest, GenerateEmbeddingRequest (has optional model field), AppendRequest/AuditEntry for audit logging.
Existing Patterns
- Config: in services/model-gateway/src/config.rs, ModelRoutingConfig already has default_model, simple_model, complex_model, embedding_model, and aliases: HashMap<String, String>. No new config fields are needed.
- Proto: the TaskComplexity enum has UNSPECIFIED, SIMPLE, COMPLEX. The InferenceParams message carries task_complexity but currently has no explicit model hint/override field.
- Audit pattern: in services/memory/src/service.rs, AuditServiceClient<Channel> is wrapped in Arc<Mutex<...>>, attached via the with_audit_client() builder method, and logs best-effort with tracing::warn! on failure; it uses AUDIT_ACTION_MEMORY_WRITE (int value 4). Model Gateway will use AUDIT_ACTION_INFERENCE_REQUEST (int value 7).
- Service struct: ModelGatewayServiceImpl in service.rs uses the constructor new(config) pattern. Builder methods (e.g., with_audit_client) will follow the memory service convention.
Dependencies
- Audit Service: gRPC client (audit_service_client::AuditServiceClient) from the llm-multiverse-proto crate, already available via the tonic dependency in Cargo.toml.
- sha2 crate: For the params_hash field in AuditEntry, following the memory service pattern.
- Proto change: Add an optional string model_hint field to InferenceParams in model_gateway.proto so callers can explicitly request a model by name or alias. The GenerateEmbeddingRequest already has an optional string model field for this purpose.
Implementation Steps
1. Proto Update — Add model_hint to InferenceParams
Add an optional string model_hint field to the InferenceParams message in proto/llm_multiverse/v1/model_gateway.proto:
message InferenceParams {
  SessionContext context = 1;
  string prompt = 2;
  TaskComplexity task_complexity = 3;
  uint32 max_tokens = 4;
  optional float temperature = 5;
  optional float top_p = 6;
  repeated string stop_sequences = 7;
  // Explicit model name or alias override. If set, bypasses task_complexity routing.
  optional string model_hint = 8;
}
After editing the proto, regenerate Rust stubs (run the existing buf generate / build.rs flow).
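As a rough illustration of what the new field looks like on the Rust side: prost maps a proto3 `optional string` to `Option<String>` and enum fields to `i32`. The sketch below uses a hand-written stand-in struct rather than the generated code, and the helper name `effective_hint` is hypothetical (it is not part of this plan).

```rust
/// Stand-in for the prost-generated `InferenceParams`, reduced to the two
/// fields that matter for routing. The real struct comes from the proto crate.
#[derive(Debug, Default, Clone)]
pub struct InferenceParams {
    /// Proto enum fields are plain `i32` values in prost output.
    pub task_complexity: i32,
    /// A proto3 `optional string` becomes `Option<String>`.
    pub model_hint: Option<String>,
}

/// Hypothetical helper: borrow the hint as `Option<&str>` and treat an
/// empty string the same as an absent hint.
pub fn effective_hint(params: &InferenceParams) -> Option<&str> {
    params.model_hint.as_deref().filter(|h| !h.is_empty())
}
```

Callers can also pass `params.model_hint.as_deref()` directly, since the router described in the next step performs the empty-string check itself.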
2. Core Logic — ModelRouter in routing.rs
Create services/model-gateway/src/routing.rs with a ModelRouter struct.
ModelRouter struct:
pub struct ModelRouter {
    config: ModelRoutingConfig,
}
Takes an owned or cloned ModelRoutingConfig.
resolve_model() method — the primary routing entry point:
pub fn resolve_model(
    &self,
    task_complexity: i32,        // proto enum as i32
    model_hint: Option<&str>,    // from InferenceParams.model_hint
) -> String
Resolution order:
- If model_hint is Some(hint) and non-empty → use hint as the candidate name.
- Else, map task_complexity to a config field:
  - TASK_COMPLEXITY_SIMPLE (1) → self.config.simple_model
  - TASK_COMPLEXITY_COMPLEX (2) → self.config.complex_model
  - TASK_COMPLEXITY_UNSPECIFIED (0) or any other value → self.config.default_model
- Apply alias expansion: if the candidate name exists as a key in self.config.aliases, replace it with the alias value.
- Return the resolved model name.
resolve_embedding_model() method:
pub fn resolve_embedding_model(
    &self,
    model_override: Option<&str>,
) -> String
Resolution order:
- If model_override is Some(name) and non-empty → use name as candidate.
- Else → self.config.embedding_model.
- Apply alias expansion (same as above).
- Return resolved name.
resolve_alias() private helper:
fn resolve_alias(&self, name: &str) -> String
Looks up name in self.config.aliases; returns the mapped value if found, otherwise returns name unchanged. Single-level resolution only (no recursive alias chains) to avoid cycles.
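The resolution rules above can be sketched as follows. This is a minimal, self-contained sketch, not the final implementation: `ModelRoutingConfig` is a stand-in that mirrors the fields listed under Existing Patterns, the enum constants are declared locally instead of coming from the generated proto code, and `example_router()` with its model names is purely illustrative.

```rust
use std::collections::HashMap;

/// Stand-in for the real `ModelRoutingConfig` in config.rs.
#[derive(Clone, Debug, Default)]
pub struct ModelRoutingConfig {
    pub default_model: String,
    pub simple_model: String,
    pub complex_model: String,
    pub embedding_model: String,
    pub aliases: HashMap<String, String>,
}

// Integer values of the proto `TaskComplexity` enum, as stated in this plan.
const TASK_COMPLEXITY_SIMPLE: i32 = 1;
const TASK_COMPLEXITY_COMPLEX: i32 = 2;

pub struct ModelRouter {
    config: ModelRoutingConfig,
}

impl ModelRouter {
    pub fn new(config: ModelRoutingConfig) -> Self {
        Self { config }
    }

    /// Non-empty hint first, then the complexity mapping, then one alias lookup.
    pub fn resolve_model(&self, task_complexity: i32, model_hint: Option<&str>) -> String {
        let candidate = match model_hint {
            Some(hint) if !hint.is_empty() => hint,
            _ => match task_complexity {
                TASK_COMPLEXITY_SIMPLE => self.config.simple_model.as_str(),
                TASK_COMPLEXITY_COMPLEX => self.config.complex_model.as_str(),
                // UNSPECIFIED (0) and unknown values fall back to the default.
                _ => self.config.default_model.as_str(),
            },
        };
        self.resolve_alias(candidate)
    }

    /// Non-empty override first, then the configured embedding model.
    pub fn resolve_embedding_model(&self, model_override: Option<&str>) -> String {
        let candidate = match model_override {
            Some(name) if !name.is_empty() => name,
            _ => self.config.embedding_model.as_str(),
        };
        self.resolve_alias(candidate)
    }

    /// Single-level lookup: the mapped value is returned without re-resolving it.
    fn resolve_alias(&self, name: &str) -> String {
        self.config
            .aliases
            .get(name)
            .cloned()
            .unwrap_or_else(|| name.to_string())
    }
}

/// Hypothetical configuration, used only to exercise the sketch.
pub fn example_router() -> ModelRouter {
    ModelRouter::new(ModelRoutingConfig {
        default_model: "default-model".to_string(),
        simple_model: "fast".to_string(),
        complex_model: "qwen2.5-coder:14b".to_string(),
        embedding_model: "embed-model".to_string(),
        aliases: HashMap::from([
            ("fast".to_string(), "phi3:mini".to_string()),
            ("code".to_string(), "codellama:7b".to_string()),
        ]),
    })
}
```

Returning an owned `String` keeps the signatures from the plan and avoids tying the result's lifetime to either the config or the request.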
3. Audit Logging — audit_log_inference() helper
Add an audit_log_inference() free function in service.rs (or a separate audit.rs module), following the exact pattern from the memory service.
Function signature:
async fn audit_log_inference(
    audit_client: &Arc<Mutex<AuditServiceClient<Channel>>>,
    ctx: &SessionContext,
    model_name: &str,
    prompt_length: usize,
    task_complexity: i32,
    rpc_name: &str,       // "Inference", "StreamInference", or "GenerateEmbedding"
    result_status: &str,  // "success" or "failure"
)
AuditEntry construction:
- action: 7 (= AUDIT_ACTION_INFERENCE_REQUEST)
- tool_name: the rpc_name parameter
- params_hash: SHA-256 of "{rpc_name}:{model_name}:{prompt_length}:{task_complexity}"
- result_status: passed through
- metadata: include {"model": model_name, "prompt_length": prompt_length.to_string(), "task_complexity": task_complexity.to_string()}
- session_id, agent_id: extracted from SessionContext (same pattern as memory service)
Best-effort semantics: wrap client.append() in if let Err(e) with tracing::warn! — never fail the inference request due to audit failure.
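The pieces of the audit entry that do not depend on gRPC can be sketched with the standard library alone. The helper names below (`params_hash_input`, `audit_metadata`, `best_effort`) are hypothetical, and the sketch assumes the metadata field is a string-to-string map. The SHA-256 step itself is left out because it needs the sha2 crate, and `eprintln!` stands in for `tracing::warn!`.

```rust
use std::collections::HashMap;
use std::fmt::Display;

/// Builds the string that would be hashed with SHA-256 to fill `params_hash`.
pub fn params_hash_input(
    rpc_name: &str,
    model_name: &str,
    prompt_length: usize,
    task_complexity: i32,
) -> String {
    format!("{}:{}:{}:{}", rpc_name, model_name, prompt_length, task_complexity)
}

/// Builds the metadata map, with every value stringified as in the plan.
pub fn audit_metadata(
    model_name: &str,
    prompt_length: usize,
    task_complexity: i32,
) -> HashMap<String, String> {
    HashMap::from([
        ("model".to_string(), model_name.to_string()),
        ("prompt_length".to_string(), prompt_length.to_string()),
        ("task_complexity".to_string(), task_complexity.to_string()),
    ])
}

/// Best-effort handling: report the failure and carry on instead of
/// propagating the error. Returns whether the audit write succeeded.
pub fn best_effort<E: Display>(result: Result<(), E>) -> bool {
    if let Err(e) = result {
        // The real helper would call tracing::warn! here.
        eprintln!("audit log failed: {}", e);
        return false;
    }
    true
}
```

Only the prompt length goes into the hash input and metadata, so the prompt text itself never reaches the audit log in this scheme.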
4. Service Integration — Wire into ModelGatewayServiceImpl
Add fields to ModelGatewayServiceImpl:
pub struct ModelGatewayServiceImpl {
    config: Config,
    ollama: OllamaClient,
    router: ModelRouter,
    audit_client: Option<Arc<Mutex<AuditServiceClient<Channel>>>>,
}
Update new() constructor:
pub fn new(config: Config) -> Result<Self, anyhow::Error> {
    let ollama = OllamaClient::new(&config)?;
    let router = ModelRouter::new(config.routing.clone());
    Ok(Self { config, ollama, router, audit_client: None })
}
Add builder method:
pub fn with_audit_client(mut self, client: AuditServiceClient<Channel>) -> Self {
    self.audit_client = Some(Arc::new(Mutex::new(client)));
    self
}
Update main.rs: If config.audit_addr is Some(addr), connect the AuditServiceClient and attach it via with_audit_client(). Use the same pattern as the memory service's main.
Add sha2 dependency to services/model-gateway/Cargo.toml:
sha2 = "0.10"
Update lib.rs to expose the new module:
pub mod routing;
5. Tests
Unit tests in routing.rs (#[cfg(test)] mod tests)
| Test | Description |
|---|---|
| test_resolve_simple_task | TaskComplexity::SIMPLE (1) with no hint → simple_model |
| test_resolve_complex_task | TaskComplexity::COMPLEX (2) with no hint → complex_model |
| test_resolve_unspecified_task | TaskComplexity::UNSPECIFIED (0) with no hint → default_model |
| test_resolve_unknown_value | Unknown integer (99) with no hint → default_model |
| test_model_hint_overrides_complexity | model_hint = Some("custom:7b") ignores task_complexity |
| test_model_hint_empty_string_ignored | model_hint = Some("") falls through to complexity routing |
| test_alias_expansion | Alias "code" → "codellama:7b"; resolve with hint "code" → "codellama:7b" |
| test_alias_expansion_via_complexity | simple_model = "fast", alias "fast" → "phi3:mini"; resolve SIMPLE → "phi3:mini" |
| test_no_alias_passthrough | Model name not in aliases → returned unchanged |
| test_resolve_embedding_default | No override → embedding_model |
| test_resolve_embedding_override | Override "custom-embed" → "custom-embed" (with alias check) |
| test_resolve_embedding_alias | Override matches alias key → expanded |
Unit tests in service.rs (extend existing test module)
| Test | Description |
|---|---|
| test_service_has_router | Verify ModelGatewayServiceImpl::new() initializes router field |
| test_service_works_without_audit_client | Confirm audit_client is None by default, existing endpoints still work |
| test_with_audit_client_builder | Verify with_audit_client() sets the field to Some(...) |
No integration tests are needed in this issue — those belong to #41/#42 which implement the actual gRPC endpoints that call the router and audit functions.
Files to Create/Modify
| File | Action | Purpose |
|---|---|---|
| proto/llm_multiverse/v1/model_gateway.proto | Modify | Add optional string model_hint = 8 to InferenceParams |
| services/model-gateway/src/routing.rs | Create | ModelRouter struct with resolve_model(), resolve_embedding_model(), alias expansion |
| services/model-gateway/src/lib.rs | Modify | Add pub mod routing; |
| services/model-gateway/src/service.rs | Modify | Add router: ModelRouter and audit_client fields, with_audit_client() builder, audit_log_inference() helper |
| services/model-gateway/src/main.rs | Modify | Connect AuditServiceClient when audit_addr is configured, attach via builder |
| services/model-gateway/Cargo.toml | Modify | Add sha2 = "0.10" dependency |
Risks and Edge Cases
- Proto regeneration: Adding field 8 to InferenceParams is backward compatible (optional field, no renumbering). The Rust stubs must be regenerated before the routing code compiles. If the build.rs / buf generate step is not run, compilation will fail with a missing field error.
- Alias cycles: If aliases contain A → B and B → A, single-level resolution prevents infinite loops. This is the intended design: only one alias lookup is performed.
- Empty model_hint string: An empty string "" from proto is a valid Some("") in Rust for optional string fields. The router must treat empty strings the same as None (fall through to complexity routing).
- Audit client unavailability: If the Audit Service is down, the tracing::warn! will fire but inference requests will proceed normally. This matches the memory service behavior.
- Concurrent audit writes: Arc<Mutex<AuditServiceClient>> serializes audit calls per service instance. Under high load, audit logging could become a bottleneck. This is acceptable for the current single-instance design and consistent with the memory service pattern.
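The alias-cycle case can be checked in isolation. The sketch below is a free-function version of the single-level lookup with a deliberately cyclic, hypothetical alias table; it is meant as an illustration of the intended behavior, not as the final code.

```rust
use std::collections::HashMap;

/// Single-level alias lookup: one map access, no recursion, so it always
/// terminates regardless of what the alias table contains.
pub fn resolve_alias(aliases: &HashMap<String, String>, name: &str) -> String {
    aliases
        .get(name)
        .cloned()
        .unwrap_or_else(|| name.to_string())
}

/// Hypothetical alias table containing the cycle A → B, B → A.
pub fn cyclic_aliases() -> HashMap<String, String> {
    HashMap::from([
        ("A".to_string(), "B".to_string()),
        ("B".to_string(), "A".to_string()),
    ])
}
```

One consequence of this design is that a chained alias is not followed: with the table above, resolving "A" yields "B", which is itself an alias key and would be passed on as a model name unchanged.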
Deviation Log
(Filled during implementation if deviations from plan occur)
| Deviation | Reason |
|---|---|