Implementation Plan — Issue #29: Implement embedding generation integration

Metadata

| Field | Value |
|---|---|
| Issue | #29 |
| Title | Implement embedding generation integration |
| Milestone | Phase 4: Memory Service |
| Labels | service:memory, lang:rust |
| Status | COMPLETED |
| Language | Rust |
| Related Plans | issue-011.md, issue-012.md, issue-027.md, issue-028.md |
| Blocked by | #28 (completed) |

Acceptance Criteria

  • gRPC client to Model Gateway GenerateEmbedding
  • Embedding generated for each new memory entry
  • Embedding stored alongside memory in DuckDB
  • Batch embedding support for bulk imports
  • Graceful handling of Model Gateway unavailability

Architecture Analysis

Service Context

This issue belongs to the Memory Service (Rust). It integrates the Memory Service with the Model Gateway by implementing a gRPC client that calls GenerateEmbedding to produce 768-dimensional vectors (nomic-embed-text) for memory content at write time.

The Model Gateway proto (proto/llm_multiverse/v1/model_gateway.proto) defines:

  • GenerateEmbedding(GenerateEmbeddingRequest) returns (GenerateEmbeddingResponse) — unary RPC
  • GenerateEmbeddingRequest contains SessionContext context, string text, and optional string model
  • GenerateEmbeddingResponse contains repeated float embedding and uint32 dimensions

The Memory proto (proto/llm_multiverse/v1/memory.proto) defines MemoryEntry with embedding fields: bytes name_embedding, bytes description_embedding, bytes corpus_embedding. Per the architecture doc, each memory entry requires three embeddings (name, description, corpus) generated via nomic-embed-text.

The DuckDB schema (from issue #28) stores embeddings in the embeddings table with columns (memory_id, embedding_type, vector FLOAT[768]) where embedding_type is one of 'name', 'description', 'corpus'.

Existing Patterns

  • gRPC client pattern: The secrets service uses AuditServiceClient<Channel> wrapped in Arc<Mutex<>> for calling the audit service (see services/secrets/src/service.rs:16-17). The client is optional and configured via a builder method with_audit_client(). This same pattern applies here — the Model Gateway client should be optional to allow the memory service to start without the gateway (degraded mode).
  • Config: services/memory/src/config.rs already has embedding_endpoint: Option<String> for the Model Gateway address (line 14).
  • DuckDB access: DuckDbManager wraps the connection in Mutex<Connection> and exposes with_connection() (see services/memory/src/db/mod.rs:84-90).
  • Error types: DbError enum in services/memory/src/db/mod.rs:14-39 uses thiserror. A new error variant or a separate embedding error type is needed.

Dependencies

  • Crate: tonic — already a dependency, provides transport::Channel and transport::Endpoint for the gRPC client.
  • Proto-gen crate — already provides model_gateway_service_client::ModelGatewayServiceClient (client stubs are generated via build_client(true) in gen/rust/build.rs:16).
  • No new crate dependencies — all necessary types are already available via existing dependencies (tonic, prost, llm-multiverse-proto).

Cross-Service Integration

  • Model Gateway (GenerateEmbedding RPC) — the memory service becomes a gRPC client of the model gateway. The gateway must be running and reachable at the configured embedding_endpoint address.
  • DuckDB — embeddings are stored in the existing embeddings table after generation.

Implementation Steps

1. Types & Configuration

Define embedding-specific error types in a new services/memory/src/embedding/mod.rs:

#[derive(Debug, thiserror::Error)]
pub enum EmbeddingError {
    /// The Model Gateway is unavailable or unreachable.
    #[error("Model Gateway unavailable: {0}")]
    GatewayUnavailable(String),

    /// The gRPC call to GenerateEmbedding failed.
    #[error("embedding generation failed: {0}")]
    GenerationFailed(#[from] tonic::Status),

    /// The returned embedding has an unexpected dimension.
    #[error("dimension mismatch: expected {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },

    /// Connection to Model Gateway failed.
    #[error("connection error: {0}")]
    ConnectionError(#[from] tonic::transport::Error),
}

Define the embedding field enum for type safety:

/// Identifies which field of a memory entry an embedding belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingField {
    Name,
    Description,
    Corpus,
}

impl EmbeddingField {
    /// Returns the DuckDB `embedding_type` column value.
    pub fn as_db_type(&self) -> &'static str {
        match self {
            Self::Name => "name",
            Self::Description => "description",
            Self::Corpus => "corpus",
        }
    }
}

Define a batch embedding request struct:

/// A request to generate an embedding for a specific text and field.
pub struct EmbeddingRequest {
    pub memory_id: String,
    pub field: EmbeddingField,
    pub text: String,
}

/// A completed embedding ready for storage.
pub struct EmbeddingResult {
    pub memory_id: String,
    pub field: EmbeddingField,
    pub vector: Vec<f32>,
}

No config changes needed: Config::embedding_endpoint already exists in services/memory/src/config.rs:14.

2. Core Logic

Create services/memory/src/embedding/mod.rs — Embedding client:

The core struct wraps the ModelGatewayServiceClient:

use llm_multiverse_proto::llm_multiverse::v1::{
    model_gateway_service_client::ModelGatewayServiceClient,
    GenerateEmbeddingRequest, SessionContext,
};
use tonic::transport::Channel;
use crate::db::schema::EMBEDDING_DIM;

pub struct EmbeddingClient {
    client: ModelGatewayServiceClient<Channel>,
}

impl EmbeddingClient {
    /// Connect to the Model Gateway at the given endpoint.
    pub async fn connect(endpoint: &str) -> Result<Self, EmbeddingError>;

    /// Generate an embedding for a single text string.
    ///
    /// Calls `GenerateEmbedding` on the Model Gateway with `model` set to None
    /// (defaults to nomic-embed-text on the gateway side).
    /// Validates that the returned vector has `EMBEDDING_DIM` dimensions.
    pub async fn generate(
        &mut self,
        context: &SessionContext,
        text: &str,
    ) -> Result<Vec<f32>, EmbeddingError>;

    /// Generate embeddings for all three fields of a memory entry.
    ///
    /// Returns a Vec of 3 `EmbeddingResult`s (name, description, corpus).
    /// Calls are made sequentially to avoid overloading the gateway.
    /// If a field text is empty, a zero vector is stored.
    pub async fn generate_for_entry(
        &mut self,
        context: &SessionContext,
        memory_id: &str,
        name: &str,
        description: &str,
        corpus: &str,
    ) -> Result<Vec<EmbeddingResult>, EmbeddingError>;

    /// Generate embeddings for a batch of entries (bulk import support).
    ///
    /// Processes entries sequentially to respect gateway capacity.
    /// Returns partial results on failure — successfully generated embeddings
    /// are returned alongside the error for the failed entry.
    pub async fn generate_batch(
        &mut self,
        context: &SessionContext,
        requests: Vec<EmbeddingRequest>,
    ) -> Result<Vec<EmbeddingResult>, BatchEmbeddingError>;
}

Batch error type:

/// Error from batch embedding generation.
#[derive(Debug, thiserror::Error)]
#[error("batch embedding failed at index {failed_index}: {source}")]
pub struct BatchEmbeddingError {
    /// Embeddings that were successfully generated before the failure.
    pub completed: Vec<EmbeddingResult>,
    /// Index of the request that failed.
    pub failed_index: usize,
    /// The underlying error.
    #[source]
    pub source: EmbeddingError,
}

Key implementation details for generate():

  1. Build GenerateEmbeddingRequest with context, text, and model: None (gateway defaults to nomic-embed-text).
  2. Call self.client.generate_embedding(request).await.
  3. Extract embedding vector from GenerateEmbeddingResponse.
  4. Validate embedding.len() == EMBEDDING_DIM. Return EmbeddingError::DimensionMismatch if not.
  5. Return the Vec<f32>.
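The dimension check in step 4 can be sketched in isolation. This is a minimal, std-only illustration, not the actual client code: `validate_dimensions` and the standalone `DimensionMismatch` struct are hypothetical stand-ins for the logic inside `generate()` and for the `EmbeddingError::DimensionMismatch` variant, and `EMBEDDING_DIM` is redefined locally instead of being imported from `crate::db::schema`.

```rust
/// Expected embedding width. The plan imports this from `crate::db::schema`;
/// it is redefined here only so the sketch is self-contained.
pub const EMBEDDING_DIM: usize = 768;

/// Hypothetical stand-in for the `EmbeddingError::DimensionMismatch` variant.
#[derive(Debug, PartialEq, Eq)]
pub struct DimensionMismatch {
    pub expected: usize,
    pub actual: usize,
}

/// Accept the vector only if it has exactly `EMBEDDING_DIM` elements;
/// otherwise report both the expected and the actual length.
pub fn validate_dimensions(embedding: Vec<f32>) -> Result<Vec<f32>, DimensionMismatch> {
    if embedding.len() == EMBEDDING_DIM {
        Ok(embedding)
    } else {
        Err(DimensionMismatch {
            expected: EMBEDDING_DIM,
            actual: embedding.len(),
        })
    }
}
```

Taking the vector by value and handing it back on success avoids a copy of the 768 floats on the happy path.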

Key implementation details for generate_for_entry():

  1. For each field (name, description, corpus):
    • If text is empty, produce a zero vector vec![0.0f32; EMBEDDING_DIM] (skip the gateway call).
    • Otherwise, call self.generate(context, text).await.
  2. Collect results as Vec<EmbeddingResult>.
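The per-field loop above might look roughly like the following synchronous sketch. It assumes a closure `generate` in place of the async gateway call, and `embed_entry` is a hypothetical name; the real method is `async` and takes a `SessionContext`. The types mirror the ones defined in step 1 but are repeated so the block compiles on its own.

```rust
pub const EMBEDDING_DIM: usize = 768;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingField {
    Name,
    Description,
    Corpus,
}

#[derive(Debug)]
pub struct EmbeddingResult {
    pub memory_id: String,
    pub field: EmbeddingField,
    pub vector: Vec<f32>,
}

/// Synchronous stand-in for `generate_for_entry()`. `generate` plays the
/// role of the gateway call and is only invoked for non-empty text.
pub fn embed_entry<E>(
    memory_id: &str,
    name: &str,
    description: &str,
    corpus: &str,
    mut generate: impl FnMut(&str) -> Result<Vec<f32>, E>,
) -> Result<Vec<EmbeddingResult>, E> {
    let fields = [
        (EmbeddingField::Name, name),
        (EmbeddingField::Description, description),
        (EmbeddingField::Corpus, corpus),
    ];
    let mut results = Vec::with_capacity(fields.len());
    for (field, text) in fields {
        let vector = if text.is_empty() {
            // Empty text: skip the gateway and use a zero vector.
            vec![0.0f32; EMBEDDING_DIM]
        } else {
            // Any error aborts the whole entry.
            generate(text)?
        };
        results.push(EmbeddingResult {
            memory_id: memory_id.to_string(),
            field,
            vector,
        });
    }
    Ok(results)
}
```

The results always come back in name, description, corpus order, so callers can rely on position as well as on the `field` tag.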

Key implementation details for generate_batch():

  1. Iterate over requests sequentially.
  2. Call self.generate(context, &request.text).await for each.
  3. On success, push to completed Vec.
  4. On failure, return BatchEmbeddingError with the completed Vec and the failure details.
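The partial-result behaviour is easier to see in a small sketch. The version below is synchronous and uses a closure for the gateway call; `embed_batch` and the generic `BatchError<E>` are hypothetical simplifications of `generate_batch()` and `BatchEmbeddingError`, and the `field` member is omitted for brevity.

```rust
#[derive(Debug, PartialEq)]
pub struct EmbeddingResult {
    pub memory_id: String,
    pub vector: Vec<f32>,
}

pub struct EmbeddingRequest {
    pub memory_id: String,
    pub text: String,
}

/// Simplified stand-in for `BatchEmbeddingError`: everything generated
/// before the failure, the index that failed, and the underlying error.
#[derive(Debug)]
pub struct BatchError<E> {
    pub completed: Vec<EmbeddingResult>,
    pub failed_index: usize,
    pub source: E,
}

/// Process requests in order and stop at the first failure, handing the
/// already completed results back inside the error.
pub fn embed_batch<E>(
    requests: Vec<EmbeddingRequest>,
    mut generate: impl FnMut(&str) -> Result<Vec<f32>, E>,
) -> Result<Vec<EmbeddingResult>, BatchError<E>> {
    let mut completed = Vec::with_capacity(requests.len());
    for (index, request) in requests.into_iter().enumerate() {
        match generate(&request.text) {
            Ok(vector) => completed.push(EmbeddingResult {
                memory_id: request.memory_id,
                vector,
            }),
            Err(source) => {
                return Err(BatchError {
                    completed,
                    failed_index: index,
                    source,
                })
            }
        }
    }
    Ok(completed)
}
```

A caller doing a bulk import could persist `completed` and resume from `failed_index` instead of regenerating everything.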

Create services/memory/src/embedding/store.rs — Embedding storage helper:

use crate::db::{DbError, DuckDbManager};
use super::{EmbeddingResult, EmbeddingField};
use crate::db::schema::EMBEDDING_DIM;

/// Store a single embedding result in the DuckDB `embeddings` table.
///
/// Uses INSERT OR REPLACE to support updating existing embeddings.
pub fn store_embedding(
    db: &DuckDbManager,
    result: &EmbeddingResult,
) -> Result<(), DbError>;

/// Store multiple embedding results in a single transaction.
///
/// Used after `generate_for_entry()` or `generate_batch()`.
pub fn store_embeddings(
    db: &DuckDbManager,
    results: &[EmbeddingResult],
) -> Result<(), DbError>;

Key implementation details for store_embedding():

  1. Call db.with_connection() to acquire the lock.
  2. Format the Vec<f32> as a DuckDB array literal [v0, v1, ...]::FLOAT[768].
  3. Execute: INSERT OR REPLACE INTO embeddings (memory_id, embedding_type, vector) VALUES (?, ?, <array_literal>::FLOAT[768]).
  4. After insertion, call schema::ensure_hnsw_index(conn) to create the HNSW index if it was deferred.
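Step 2 (formatting the vector as an array literal) could be sketched as below. `to_array_literal` is a hypothetical helper name. Returning `None` for non-finite values is an assumption added here, because `NaN` and `inf` would not format as valid numeric literals; the plan itself does not say how such values should be handled.

```rust
/// Render a vector as a DuckDB list literal cast to a fixed-size FLOAT
/// array, e.g. `[0.5, 0.25]::FLOAT[2]`.
///
/// Returns `None` if any element is NaN or infinite (an assumption of this
/// sketch, since those values would not produce a valid literal).
pub fn to_array_literal(vector: &[f32]) -> Option<String> {
    if vector.iter().any(|v| !v.is_finite()) {
        return None;
    }
    let elements: Vec<String> = vector.iter().map(|v| v.to_string()).collect();
    Some(format!("[{}]::FLOAT[{}]", elements.join(", "), vector.len()))
}
```

Because every element is a number formatted by the service itself, splicing the literal into the SQL text does not open an injection path the way interpolating user strings would; `memory_id` and `embedding_type` still go through `?` placeholders.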

Key implementation details for store_embeddings():

  1. Call db.with_connection() once, execute all inserts within the same lock scope.
  2. Use a DuckDB transaction (BEGIN; ... COMMIT;) for atomicity.
  3. Call ensure_hnsw_index(conn) once after all inserts.

3. gRPC Handler Wiring

Update services/memory/src/service.rs — Add embedding client to MemoryServiceImpl:

use crate::embedding::EmbeddingClient;
use tokio::sync::Mutex;

pub struct MemoryServiceImpl {
    db: Arc<DuckDbManager>,
    embedding_client: Option<Arc<Mutex<EmbeddingClient>>>,
}

impl MemoryServiceImpl {
    pub fn new(db: Arc<DuckDbManager>) -> Self {
        Self {
            db,
            embedding_client: None,
        }
    }

    /// Attach an embedding client for generating embeddings on write.
    pub fn with_embedding_client(mut self, client: EmbeddingClient) -> Self {
        self.embedding_client = Some(Arc::new(Mutex::new(client)));
        self
    }
}

This follows the same builder pattern as SecretsServiceImpl::with_audit_client() in services/secrets/src/service.rs:27-33.

The write_memory() handler will remain Unimplemented for now — the full write path is a later issue. However, the embedding client is wired so that when write_memory() is implemented, it can:

  1. Acquire the embedding client lock.
  2. Call generate_for_entry() with the entry's name, description, and corpus.
  3. Call store_embeddings() to persist the vectors.
  4. Insert the memory row into the memories table.

4. Service Integration

Update services/memory/src/main.rs — Connect embedding client at startup:

use memory_service::embedding::EmbeddingClient;

// In main(), after config loading:
let mut memory_service = MemoryServiceImpl::new(db);

// Connect to Model Gateway if configured.
if let Some(ref endpoint) = config.embedding_endpoint {
    match EmbeddingClient::connect(endpoint).await {
        Ok(client) => {
            tracing::info!(endpoint = %endpoint, "Connected to Model Gateway for embeddings");
            memory_service = memory_service.with_embedding_client(client);
        }
        Err(e) => {
            tracing::warn!(
                endpoint = %endpoint,
                error = %e,
                "Model Gateway unavailable — starting without embedding support"
            );
        }
    }
}

Graceful degradation: If the Model Gateway is unreachable at startup, the memory service starts without embedding support. Write operations that require embeddings should return a clear error (e.g., Status::failed_precondition("embedding client not configured")). This matches the architecture principle that services should start independently.
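One way a write handler could surface degraded mode is a small guard method, sketched here with std types only. `Service`, `WriteError`, and `require_embedding_client` are hypothetical names; the real service would hold a `tokio::sync::Mutex<EmbeddingClient>` and return a `tonic::Status` such as `Status::failed_precondition(...)` instead of the local error enum.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the error a write handler would return. In the real
/// service this would be a `tonic::Status`.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    EmbeddingUnavailable(&'static str),
}

/// Hypothetical service shape; `C` stands in for `EmbeddingClient`.
pub struct Service<C> {
    pub embedding_client: Option<Arc<Mutex<C>>>,
}

impl<C> Service<C> {
    /// Return a shared handle to the client, or a clear error when the
    /// service started without one (degraded mode).
    pub fn require_embedding_client(&self) -> Result<Arc<Mutex<C>>, WriteError> {
        self.embedding_client
            .clone()
            .ok_or(WriteError::EmbeddingUnavailable("embedding client not configured"))
    }
}
```

Centralising the check keeps every embedding-dependent handler returning the same message, while read paths that do not need embeddings never call it.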

Reconnection strategy: The initial implementation uses a single connection established at startup. If the gateway becomes unavailable after startup, generate() calls will fail with tonic::Status errors which are propagated as EmbeddingError::GenerationFailed. A reconnection mechanism can be added in a future issue if needed.

5. Tests

Unit tests in services/memory/src/embedding/mod.rs:

| Test Case | Description |
|---|---|
| test_embedding_field_as_db_type | Verify EmbeddingField::Name.as_db_type() == "name", etc. |
| test_embedding_field_variants | All three variants exist and are distinct |
| test_dimension_mismatch_error | Construct DimensionMismatch error, verify message contains expected/actual |
| test_batch_error_preserves_completed | BatchEmbeddingError retains successfully completed results |

Unit tests in services/memory/src/embedding/store.rs:

| Test Case | Description |
|---|---|
| test_store_single_embedding | Store one embedding, read it back via SQL, verify dimensions and values |
| test_store_embeddings_batch | Store 3 embeddings (name, desc, corpus for one entry), verify all stored |
| test_store_embedding_overwrites | Store, then store again with different vector, verify updated |
| test_store_ensures_hnsw_index | After storing, verify HNSW index exists via duckdb_indexes() |
| test_store_empty_vector | Store a zero vector, verify it can be stored and retrieved |

Integration tests in services/memory/src/service.rs (update existing tests):

| Test Case | Description |
|---|---|
| test_service_starts_without_embedding_client | MemoryServiceImpl::new(db) works without embedding client (existing behavior preserved) |
| test_service_with_embedding_client | MemoryServiceImpl::new(db).with_embedding_client(client) stores the client |

Note on testing the gRPC client: The EmbeddingClient::generate() method calls a remote Model Gateway. For unit tests, we define an EmbeddingGenerator trait:

/// Trait for embedding generation, enabling mock implementations in tests.
#[tonic::async_trait]
pub trait EmbeddingGenerator: Send + Sync {
    async fn generate(
        &self,
        context: &SessionContext,
        text: &str,
    ) -> Result<Vec<f32>, EmbeddingError>;
}

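The real EmbeddingClient implements this trait. Because the trait method takes &self while the generated tonic client methods take &mut self, one option is for the implementation to clone the inner ModelGatewayServiceClient<Channel> per call (the clone is cheap; see the thread-safety note under Risks and Edge Cases). Tests use a MockEmbeddingGenerator that returns predetermined vectors: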

#[cfg(test)]
pub mod mock {
    use super::*;
    use crate::db::schema::EMBEDDING_DIM;

    pub struct MockEmbeddingGenerator {
        /// If set, all calls return this error.
        pub fail_with: Option<EmbeddingError>,
    }

    #[tonic::async_trait]
    impl EmbeddingGenerator for MockEmbeddingGenerator {
        async fn generate(
            &self,
            _context: &SessionContext,
            text: &str,
        ) -> Result<Vec<f32>, EmbeddingError> {
            if let Some(ref err) = self.fail_with {
                // Clone the error description for a new error
                return Err(EmbeddingError::GatewayUnavailable(err.to_string()));
            }
            // Return a deterministic vector derived from the text length
            // (a cheap stand-in for a hash; texts of equal length collide).
            let value = text.len() as f32 / 100.0;
            Ok(vec![value; EMBEDDING_DIM])
        }
    }
}

Additional mock-based tests:

| Test Case | Description |
|---|---|
| test_generate_for_entry_all_fields | Mock generator returns vectors for name/desc/corpus |
| test_generate_for_entry_empty_fields | Empty text produces zero vectors without calling generator |
| test_generate_batch_success | All requests succeed |
| test_generate_batch_partial_failure | Failure at index 2 returns first 2 completed results |
| test_gateway_unavailable_error | Mock returns GatewayUnavailable, verify error type propagation |

Files to Create/Modify

| File | Action | Purpose |
|---|---|---|
| services/memory/src/lib.rs | Modify | Add pub mod embedding; |
| services/memory/src/embedding/mod.rs | Create | EmbeddingClient, EmbeddingGenerator trait, EmbeddingError, EmbeddingField, EmbeddingRequest, EmbeddingResult, BatchEmbeddingError, mock module |
| services/memory/src/embedding/store.rs | Create | store_embedding(), store_embeddings(): DuckDB storage helpers |
| services/memory/src/service.rs | Modify | Add embedding_client field to MemoryServiceImpl, add with_embedding_client() builder |
| services/memory/src/main.rs | Modify | Connect to Model Gateway at startup if embedding_endpoint is configured |

Risks and Edge Cases

  • Model Gateway not yet implemented: The Model Gateway service does not exist yet (issue #12 defined the proto, but the service itself is a later milestone). During development, the embedding client can only be tested with mocks. The real integration will be validated when the gateway is built. The memory service must start cleanly without it.
  • Embedding dimension drift: If the embedding model changes from nomic-embed-text (768 dims) to another model, the EMBEDDING_DIM constant and the DuckDB schema (FLOAT[768]) must both be updated via a migration. The dimension validation in generate() will catch mismatches at runtime.
  • Batch size limits: For very large bulk imports, sending hundreds of sequential embedding requests to the gateway may be slow. The initial implementation is sequential. A future optimization is configurable concurrency, for example a tokio::sync::Semaphore that caps parallel gateway calls at a small number such as 4.
  • Empty text fields: A memory entry might have an empty description or corpus. Calling the embedding model with empty text is wasteful and may produce meaningless vectors. The plan handles this by producing zero vectors for empty fields without calling the gateway.
  • Gateway connection lifecycle: The tonic::transport::Channel handles connection pooling and reconnection internally. However, if the gateway is down for an extended period, gRPC calls will fail with transport errors. The current plan propagates these as EmbeddingError::GenerationFailed. Callers (the write handler) should map this to an appropriate tonic::Status (e.g., Status::unavailable).
  • Transaction atomicity: When writing a memory entry, the memory row and its embeddings should be written atomically. The store_embeddings() function uses a DuckDB transaction. However, the memory row insertion (in a future issue) must be coordinated with embedding storage in the same transaction. The with_connection() lock ensures exclusive access, but the caller must orchestrate the full write sequence.
  • Thread safety of EmbeddingClient: ModelGatewayServiceClient<Channel> is Clone and internally reference-counted, so wrapping in Arc<Mutex<>> is safe but potentially overly conservative. A Clone-based approach could avoid the lock. However, following the established pattern from the secrets service (Arc<Mutex<AuditServiceClient<Channel>>>) maintains consistency.
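The error mapping mentioned under "Gateway connection lifecycle" could take a shape like the sketch below. Both enums are simplified local stand-ins: the real `EmbeddingError` carries `tonic::Status` and `tonic::transport::Error` payloads, and the real handler would build a `tonic::Status`. Mapping `DimensionMismatch` to an internal error is an assumption of this sketch; the plan only states that gateway failures should map to something like `Status::unavailable`.

```rust
/// Simplified stand-in for the plan's `EmbeddingError` (payloads reduced to
/// strings so the sketch needs no external crates).
#[derive(Debug)]
pub enum EmbeddingError {
    GatewayUnavailable(String),
    GenerationFailed(String),
    DimensionMismatch { expected: usize, actual: usize },
    ConnectionError(String),
}

/// Stand-in for the gRPC status codes a write handler might choose.
#[derive(Debug, PartialEq, Eq)]
pub enum StatusCode {
    Unavailable,
    Internal,
}

/// Map an embedding failure to the status code reported to the caller.
pub fn to_status_code(err: &EmbeddingError) -> StatusCode {
    match err {
        // The gateway could not be reached or the call failed: the caller
        // may retry later.
        EmbeddingError::GatewayUnavailable(_)
        | EmbeddingError::GenerationFailed(_)
        | EmbeddingError::ConnectionError(_) => StatusCode::Unavailable,
        // A wrong-sized vector points at a model or schema mismatch, not a
        // transient outage (assumption of this sketch).
        EmbeddingError::DimensionMismatch { .. } => StatusCode::Internal,
    }
}
```

Keeping the match exhaustive, with no wildcard arm, means a new `EmbeddingError` variant forces a compile-time decision about its status code.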

Deviation Log

| Deviation | Reason |
|---|---|
| Added mock server tests inline in mod tests rather than as separate integration tests | Keeps tests co-located with the code they cover; follows same pattern as existing unit tests in the module |

Retry Instructions

Failure Summary (Attempt 1)

Quality Gates:

  • Build: PASS
  • Lint (clippy): PASS
  • Tests: PASS (43 memory-service tests, all green)
  • Coverage: FAIL — embedding/mod.rs at 78.2% (52 uncovered lines in EmbeddingClient impl)

Root Cause: The EmbeddingClient struct has concrete gRPC methods (connect, generate, generate_for_entry, generate_batch) that cannot be unit-tested without a live Model Gateway server. The EmbeddingGenerator trait and MockEmbeddingGenerator cover the trait-based paths but not the concrete client code.

Required Fixes

  1. Add a tonic mock server test in services/memory/src/embedding/mod.rs:

    • Use tonic to run an in-process mock ModelGatewayService server (for example, a test implementation of the generated service trait served on an ephemeral localhost port) that implements the GenerateEmbedding RPC
    • The mock server should return a predetermined 768-dim vector
    • Write tests that create an EmbeddingClient connected to this mock server and exercise:
      • generate() — single text to embedding
      • generate_for_entry() — name/desc/corpus fields
      • generate_batch() — multiple requests
      • Error case: mock server returns an error status, verify EmbeddingError::GenerationFailed
      • Dimension mismatch: mock server returns wrong-dimension vector, verify EmbeddingError::DimensionMismatch
    • This should cover the remaining 52 lines in EmbeddingClient
  2. After fixing, run:

    • cargo test --workspace to verify all tests pass
    • cargo clippy --workspace -- -D warnings to verify no warnings
    • cargo llvm-cov --workspace --lcov --output-path lcov.info to verify coverage >= 95% on embedding/mod.rs