
Pi Agent dcd2752fde docs: mark issue #40 as COMPLETED in implementation plans

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-10 14:05:38 +01:00

11 KiB


Implementation Plan — Issue #40: Implement model routing logic

Metadata

Field	Value
Issue	#40
Title	Implement model routing logic
Milestone	Phase 5: Model Gateway
Labels	—
Status	`COMPLETED`
Language	Rust / Protobuf
Related Plans	issue-038.md, issue-039.md
Blocked by	#39, #20

Acceptance Criteria

Routing table: task type → model name (e.g., "code" → qwen2.5-coder:14b)
Model hint override from request
Default model fallback
Configuration-driven routing (not hardcoded)
Audit logging of every inference request via Audit Service

Architecture Analysis

Service Context

Belongs to Model Gateway service (services/model-gateway/).
Affects the Inference, StreamInference, and GenerateEmbedding gRPC endpoints (currently stubbed as Unimplemented). This issue creates the routing logic that those endpoints will call; the endpoint implementations themselves are in #41/#42.
Proto messages involved: InferenceParams (has TaskComplexity), InferenceRequest, StreamInferenceRequest, GenerateEmbeddingRequest (has optional model field), AppendRequest/AuditEntry for audit logging.

Existing Patterns

Config: services/model-gateway/src/config.rs — ModelRoutingConfig already has default_model, simple_model, complex_model, embedding_model, and aliases: HashMap<String, String>. No new config fields are needed.
Proto: TaskComplexity enum has UNSPECIFIED, SIMPLE, COMPLEX. The InferenceParams message carries task_complexity but currently has no explicit model hint/override field.
Audit pattern: services/memory/src/service.rs wraps an AuditServiceClient<Channel> in Arc<Mutex<...>>, attaches it via a with_audit_client() builder method, logs on a best-effort basis (tracing::warn! on failure), and uses AUDIT_ACTION_MEMORY_WRITE (int value 4). Model Gateway will use AUDIT_ACTION_INFERENCE_REQUEST (int value 7).
Service struct: ModelGatewayServiceImpl in service.rs uses a new(config) constructor. Builder methods (e.g., with_audit_client) will follow the memory service convention.

Dependencies

Audit Service: the gRPC client (audit_service_client::AuditServiceClient) from the llm-multiverse-proto crate, already available through the existing tonic dependency in Cargo.toml.
sha2 crate: needed to compute the params_hash field of AuditEntry, following the memory service pattern. This is a new dependency and is added to Cargo.toml in step 4.
Proto change: Add optional string model_hint field to InferenceParams in model_gateway.proto so callers can explicitly request a model by name or alias. The GenerateEmbeddingRequest already has an optional string model field for this purpose.

Implementation Steps

1. Proto Update — Add model_hint to InferenceParams

Add an optional string model_hint field to the InferenceParams message in proto/llm_multiverse/v1/model_gateway.proto:

```protobuf
message InferenceParams {
  SessionContext context = 1;
  string prompt = 2;
  TaskComplexity task_complexity = 3;
  uint32 max_tokens = 4;
  optional float temperature = 5;
  optional float top_p = 6;
  repeated string stop_sequences = 7;
  // Explicit model name or alias override. If set, bypasses task_complexity routing.
  optional string model_hint = 8;
}
```

After editing the proto, regenerate Rust stubs (run the existing buf generate / build.rs flow).
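Because model_hint is declared optional, prost (the code generator that tonic-build uses) is expected to emit it as an Option<String>. The sketch below uses a hand-written stand-in for the generated struct (an assumption, not the regenerated code) and a hypothetical effective_hint() helper to show how an explicitly empty hint can be normalized to None before routing.

```rust
/// Hypothetical stand-in for the prost-generated `InferenceParams`;
/// only the two fields the router reads are modelled here.
#[derive(Debug, Default)]
pub struct InferenceParams {
    pub task_complexity: i32,
    // `optional string model_hint = 8;` is expected to map to Option<String>.
    pub model_hint: Option<String>,
}

/// Returns the hint only if it is present *and* non-empty, so that an
/// explicit `Some("")` falls through to complexity routing like `None`.
pub fn effective_hint(params: &InferenceParams) -> Option<&str> {
    params
        .model_hint
        .as_deref()
        .filter(|hint| !hint.is_empty())
}
```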

2. Core Logic — `ModelRouter` in `routing.rs`

Create services/model-gateway/src/routing.rs with a ModelRouter struct.

ModelRouter struct:

```rust
pub struct ModelRouter {
    config: ModelRoutingConfig,
}
```

ModelRouter::new() takes the ModelRoutingConfig by value; the service passes a clone of config.routing (see step 4).

resolve_model() method — the primary routing entry point:

```rust
pub fn resolve_model(
    &self,
    task_complexity: i32,       // proto enum as i32
    model_hint: Option<&str>,   // from InferenceParams.model_hint
) -> String
```

Resolution order:

1. If model_hint is Some(hint) and non-empty, use hint as the candidate name.
2. Otherwise, map task_complexity to a config field:
   - TASK_COMPLEXITY_SIMPLE (1) → self.config.simple_model
   - TASK_COMPLEXITY_COMPLEX (2) → self.config.complex_model
   - TASK_COMPLEXITY_UNSPECIFIED (0) or any other value → self.config.default_model
3. Apply alias expansion: if the candidate name is a key in self.config.aliases, replace it with the mapped value.
4. Return the resolved model name.
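A minimal sketch of this resolution order, assuming a ModelRoutingConfig with the field names listed under Existing Patterns. The config struct here is a trimmed stand-in (embedding_model omitted), and the complexity constants are hand-copied from the proto enum rather than taken from generated code.

```rust
use std::collections::HashMap;

/// Trimmed stand-in for the existing `ModelRoutingConfig` in config.rs;
/// only the fields used by `resolve_model()` are included.
#[derive(Debug, Clone)]
pub struct ModelRoutingConfig {
    pub default_model: String,
    pub simple_model: String,
    pub complex_model: String,
    pub aliases: HashMap<String, String>,
}

// TaskComplexity values as plain i32s (assumed to match model_gateway.proto).
const TASK_COMPLEXITY_SIMPLE: i32 = 1;
const TASK_COMPLEXITY_COMPLEX: i32 = 2;

pub struct ModelRouter {
    config: ModelRoutingConfig,
}

impl ModelRouter {
    pub fn new(config: ModelRoutingConfig) -> Self {
        Self { config }
    }

    pub fn resolve_model(&self, task_complexity: i32, model_hint: Option<&str>) -> String {
        let candidate = match model_hint {
            // Step 1: a non-empty hint wins over complexity routing.
            Some(hint) if !hint.is_empty() => hint,
            // Step 2: UNSPECIFIED (0) and unknown values use the default.
            _ => match task_complexity {
                TASK_COMPLEXITY_SIMPLE => self.config.simple_model.as_str(),
                TASK_COMPLEXITY_COMPLEX => self.config.complex_model.as_str(),
                _ => self.config.default_model.as_str(),
            },
        };
        // Steps 3 and 4: single-level alias expansion, then return.
        self.resolve_alias(candidate)
    }

    fn resolve_alias(&self, name: &str) -> String {
        self.config
            .aliases
            .get(name)
            .cloned()
            .unwrap_or_else(|| name.to_string())
    }
}
```

Matching a non-empty hint first keeps Some("") on the complexity path, which is the edge case called out under Risks and Edge Cases.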

resolve_embedding_model() method:

```rust
pub fn resolve_embedding_model(
    &self,
    model_override: Option<&str>,
) -> String
```

Resolution order:

1. If model_override is Some(name) and non-empty, use name as the candidate.
2. Otherwise, use self.config.embedding_model.
3. Apply alias expansion (same as above).
4. Return the resolved name.
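Embedding resolution follows the same shape. EmbeddingRouter below is a hypothetical, trimmed stand-in for ModelRouter that holds only the embedding default and the alias map; in the plan this logic is a method on ModelRouter itself.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in holding only what embedding resolution needs.
pub struct EmbeddingRouter {
    embedding_model: String,
    aliases: HashMap<String, String>,
}

impl EmbeddingRouter {
    pub fn resolve_embedding_model(&self, model_override: Option<&str>) -> String {
        // A non-empty override (GenerateEmbeddingRequest.model) wins;
        // otherwise fall back to the configured embedding model.
        let candidate = model_override
            .filter(|name| !name.is_empty())
            .unwrap_or(self.embedding_model.as_str());
        // Same single-level alias expansion as resolve_model().
        self.aliases
            .get(candidate)
            .cloned()
            .unwrap_or_else(|| candidate.to_string())
    }
}
```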

resolve_alias() private helper:

```rust
fn resolve_alias(&self, name: &str) -> String
```

Looks up name in self.config.aliases; returns the mapped value if found, otherwise returns name unchanged. Single-level resolution only (no recursive alias chains) to avoid cycles.
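To illustrate why single-level resolution is cycle-safe, here is resolve_alias() written as a free function over the alias map (the plan makes it a private method; the free-function form is only for illustration). It performs exactly one lookup, so with A → B and B → A each name simply moves one hop.

```rust
use std::collections::HashMap;

/// Single-level alias lookup: one map access and no loop, so a cycle
/// such as a -> b, b -> a cannot hang the router.
fn resolve_alias(aliases: &HashMap<String, String>, name: &str) -> String {
    match aliases.get(name) {
        Some(target) => target.clone(),
        None => name.to_string(),
    }
}
```

A consequence is that alias chains are not followed: if an alias points at another alias key, the intermediate name is returned unchanged, so alias values should be concrete model names.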

3. Audit Logging — `audit_log_inference()` helper

Add an audit_log_inference() free function in service.rs (or a separate audit.rs module), following the exact pattern from the memory service.

Function signature:

```rust
async fn audit_log_inference(
    audit_client: &Arc<Mutex<AuditServiceClient<Channel>>>,
    ctx: &SessionContext,
    model_name: &str,
    prompt_length: usize,
    task_complexity: i32,
    rpc_name: &str,        // "Inference", "StreamInference", or "GenerateEmbedding"
    result_status: &str,   // "success" or "failure"
)
```

AuditEntry construction:

action: 7 (= AUDIT_ACTION_INFERENCE_REQUEST)
tool_name: the rpc_name parameter
params_hash: SHA-256 of "{rpc_name}:{model_name}:{prompt_length}:{task_complexity}"
result_status: passed through
metadata: include {"model": model_name, "prompt_length": prompt_length.to_string(), "task_complexity": task_complexity.to_string()}
session_id, agent_id: extracted from SessionContext (same pattern as memory service)
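The params_hash input and the metadata map can be assembled with the standard library alone; only the SHA-256 digest itself needs sha2 and is omitted here. The helper names are hypothetical, and the map type assumes AuditEntry.metadata is a proto map<string, string>, which prost generates as a HashMap<String, String>.

```rust
use std::collections::HashMap;

/// Builds the string that would be fed to SHA-256 for `params_hash`.
fn params_hash_input(rpc_name: &str, model_name: &str, prompt_length: usize, task_complexity: i32) -> String {
    format!("{rpc_name}:{model_name}:{prompt_length}:{task_complexity}")
}

/// Builds the metadata entries listed above for an inference AuditEntry.
fn inference_metadata(model_name: &str, prompt_length: usize, task_complexity: i32) -> HashMap<String, String> {
    HashMap::from([
        ("model".to_string(), model_name.to_string()),
        ("prompt_length".to_string(), prompt_length.to_string()),
        ("task_complexity".to_string(), task_complexity.to_string()),
    ])
}
```

Only the prompt's length enters the audit record, not its text.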

Best-effort semantics: check the result of client.append() with `if let Err(e) = ...` and log failures via tracing::warn!. An audit failure must never fail the inference request.
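The best-effort rule can be expressed without gRPC. In this sketch the AuditSink trait is a hypothetical stand-in for the locked AuditServiceClient, eprintln! stands in for tracing::warn!, and handle_inference() is a placeholder for an endpoint handler; the point is that the audit result is inspected and logged but never propagated.

```rust
/// Hypothetical stand-in for the audit client's append call.
trait AuditSink {
    fn append(&mut self, entry: &str) -> Result<(), String>;
}

/// Best-effort audit: failures are logged and swallowed, never returned.
fn audit_best_effort(sink: Option<&mut dyn AuditSink>, entry: &str) {
    // No client configured: skip auditing entirely.
    let Some(sink) = sink else { return };
    if let Err(e) = sink.append(entry) {
        eprintln!("audit append failed, continuing: {e}");
    }
}

/// Placeholder handler: its result does not depend on the audit outcome.
fn handle_inference(sink: Option<&mut dyn AuditSink>, model: &str) -> Result<String, String> {
    let output = format!("completion from {model}"); // stands in for the Ollama call
    audit_best_effort(sink, model);
    Ok(output)
}
```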

4. Service Integration — Wire into `ModelGatewayServiceImpl`

Add fields to ModelGatewayServiceImpl:

```rust
pub struct ModelGatewayServiceImpl {
    config: Config,
    ollama: OllamaClient,
    router: ModelRouter,
    audit_client: Option<Arc<Mutex<AuditServiceClient<Channel>>>>,
}
```

Update new() constructor:

```rust
pub fn new(config: Config) -> Result<Self, anyhow::Error> {
    let ollama = OllamaClient::new(&config)?;
    let router = ModelRouter::new(config.routing.clone());
    Ok(Self { config, ollama, router, audit_client: None })
}
```

Add builder method:

```rust
pub fn with_audit_client(mut self, client: AuditServiceClient<Channel>) -> Self {
    self.audit_client = Some(Arc::new(Mutex::new(client)));
    self
}
```
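A dependency-free sketch of the optional-client builder, with unit structs standing in for AuditServiceClient<Channel> and ModelRouter (the config and ollama fields are omitted, so new() here takes no arguments). It uses std::sync::Mutex only to stay self-contained; since the real append() call is awaited while the lock is held, an async-aware lock such as tokio::sync::Mutex is the likely choice in the service, because a std MutexGuard held across .await makes the future non-Send. The has_audit_client() accessor is a hypothetical addition that makes the builder tests listed under Tests easy to write.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins; only the ownership shape matters here.
pub struct AuditClient;
pub struct ModelRouter;

pub struct ModelGatewayServiceImpl {
    router: ModelRouter,
    audit_client: Option<Arc<Mutex<AuditClient>>>,
}

impl ModelGatewayServiceImpl {
    pub fn new() -> Self {
        // Auditing is opt-in: the service starts without a client.
        Self { router: ModelRouter, audit_client: None }
    }

    /// Consumes and returns `self`, so the call can be chained after `new()`.
    pub fn with_audit_client(mut self, client: AuditClient) -> Self {
        self.audit_client = Some(Arc::new(Mutex::new(client)));
        self
    }

    pub fn has_audit_client(&self) -> bool {
        self.audit_client.is_some()
    }
}
```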

Update main.rs: If config.audit_addr is Some(addr), connect the AuditServiceClient and attach it via with_audit_client(). Use the same pattern as the memory service's main.

Add sha2 dependency to services/model-gateway/Cargo.toml:

```toml
[dependencies]
sha2 = "0.10"
```

Update lib.rs to expose the new module:

```rust
pub mod routing;
```

5. Tests

Unit tests in `routing.rs` (`#[cfg(test)] mod tests`)

Test	Description
`test_resolve_simple_task`	`TaskComplexity::SIMPLE` (1) with no hint → `simple_model`
`test_resolve_complex_task`	`TaskComplexity::COMPLEX` (2) with no hint → `complex_model`
`test_resolve_unspecified_task`	`TaskComplexity::UNSPECIFIED` (0) with no hint → `default_model`
`test_resolve_unknown_value`	Unknown integer (99) with no hint → `default_model`
`test_model_hint_overrides_complexity`	`model_hint = Some("custom:7b")` ignores `task_complexity`
`test_model_hint_empty_string_ignored`	`model_hint = Some("")` falls through to complexity routing
`test_alias_expansion`	Alias "code" → "codellama:7b"; resolve with hint "code" → "codellama:7b"
`test_alias_expansion_via_complexity`	`simple_model = "fast"`, alias "fast" → "phi3:mini"; resolve SIMPLE → "phi3:mini"
`test_no_alias_passthrough`	Model name not in aliases → returned unchanged
`test_resolve_embedding_default`	No override → `embedding_model`
`test_resolve_embedding_override`	Override "custom-embed" → "custom-embed" (with alias check)
`test_resolve_embedding_alias`	Override matches alias key → expanded

Unit tests in `service.rs` (extend existing test module)

Test	Description
`test_service_has_router`	Verify `ModelGatewayServiceImpl::new()` initializes `router` field
`test_service_works_without_audit_client`	Confirm `audit_client` is `None` by default, existing endpoints still work
`test_with_audit_client_builder`	Verify `with_audit_client()` sets the field to `Some(...)`

No integration tests are needed in this issue — those belong to #41/#42 which implement the actual gRPC endpoints that call the router and audit functions.

Files to Create/Modify

File	Action	Purpose
`proto/llm_multiverse/v1/model_gateway.proto`	Modify	Add `optional string model_hint = 8` to `InferenceParams`
`services/model-gateway/src/routing.rs`	Create	`ModelRouter` struct with `resolve_model()`, `resolve_embedding_model()`, alias expansion
`services/model-gateway/src/lib.rs`	Modify	Add `pub mod routing;`
`services/model-gateway/src/service.rs`	Modify	Add `router: ModelRouter` and `audit_client` fields, `with_audit_client()` builder, `audit_log_inference()` helper
`services/model-gateway/src/main.rs`	Modify	Connect `AuditServiceClient` when `audit_addr` is configured, attach via builder
`services/model-gateway/Cargo.toml`	Modify	Add `sha2 = "0.10"` dependency

Risks and Edge Cases

Proto regeneration: Adding field 8 to InferenceParams is backward compatible (optional field, no renumbering). The Rust stubs must be regenerated before the routing code compiles. If the build.rs / buf generate step is not run, compilation will fail with a missing field error.
Alias cycles: If aliases contain A → B and B → A, single-level resolution prevents infinite loops. This is the intended design — only one alias lookup is performed.
Empty model_hint string: because model_hint is an optional field, a caller can explicitly set it to "", which arrives in Rust as Some("") rather than None. The router must treat empty strings the same as None (fall through to complexity routing).
Audit client unavailability: If the Audit Service is down, the tracing::warn! will fire but inference requests will proceed normally. This matches the memory service behavior.
Concurrent audit writes: Arc<Mutex<AuditServiceClient>> serializes audit calls per service instance. Under high load, audit logging could become a bottleneck. This is acceptable for the current single-instance design and consistent with the memory service pattern.

Deviation Log

(Filled during implementation if deviations from plan occur)

Deviation	Reason
