shahondin1624 64c7f3a9ae feat: implement embedding generation integration (issue #29)

2026-03-09 19:35:14 +01:00

Implementation Plan — Issue #29: Implement embedding generation integration

Metadata

Field	Value
Issue	#29
Title	Implement embedding generation integration
Milestone	Phase 4: Memory Service
Labels	`service:memory`, `lang:rust`
Status	`COMPLETED`
Language	Rust
Related Plans	issue-011.md, issue-012.md, issue-027.md, issue-028.md
Blocked by	#28 (completed)

Acceptance Criteria

gRPC client to Model Gateway GenerateEmbedding
Embedding generated for each new memory entry
Embedding stored alongside memory in DuckDB
Batch embedding support for bulk imports
Graceful handling of Model Gateway unavailability

Architecture Analysis

Service Context

This issue belongs to the Memory Service (Rust). It integrates the Memory Service with the Model Gateway by implementing a gRPC client that calls GenerateEmbedding to produce 768-dimensional vectors (nomic-embed-text) for memory content at write time.

The Model Gateway proto (proto/llm_multiverse/v1/model_gateway.proto) defines:

GenerateEmbedding(GenerateEmbeddingRequest) returns (GenerateEmbeddingResponse) — unary RPC
GenerateEmbeddingRequest contains SessionContext context, string text, and optional string model
GenerateEmbeddingResponse contains repeated float embedding and uint32 dimensions

The Memory proto (proto/llm_multiverse/v1/memory.proto) defines MemoryEntry with embedding fields: bytes name_embedding, bytes description_embedding, bytes corpus_embedding. Per the architecture doc, each memory entry requires three embeddings (name, description, corpus) generated via nomic-embed-text.

The DuckDB schema (from issue #28) stores embeddings in the embeddings table with columns (memory_id, embedding_type, vector FLOAT[768]) where embedding_type is one of 'name', 'description', 'corpus'.

Existing Patterns

gRPC client pattern: The secrets service uses AuditServiceClient<Channel> wrapped in Arc<Mutex<>> for calling the audit service (see services/secrets/src/service.rs:16-17). The client is optional and configured via a builder method with_audit_client(). This same pattern applies here — the Model Gateway client should be optional to allow the memory service to start without the gateway (degraded mode).
Config: services/memory/src/config.rs already has embedding_endpoint: Option<String> for the Model Gateway address (line 14).
DuckDB access: DuckDbManager wraps the connection in Mutex<Connection> and exposes with_connection() (see services/memory/src/db/mod.rs:84-90).
Error types: DbError enum in services/memory/src/db/mod.rs:14-39 uses thiserror. A new error variant or a separate embedding error type is needed.

Dependencies

Crate: tonic — already a dependency, provides transport::Channel and transport::Endpoint for the gRPC client.
Proto-gen crate — already provides model_gateway_service_client::ModelGatewayServiceClient (client stubs are generated via build_client(true) in gen/rust/build.rs:16).
No new crate dependencies — all necessary types are already available via existing dependencies (tonic, prost, llm-multiverse-proto).

Cross-Service Integration

Model Gateway (GenerateEmbedding RPC) — the memory service becomes a gRPC client of the model gateway. The gateway must be running and reachable at the configured embedding_endpoint address.
DuckDB — embeddings are stored in the existing embeddings table after generation.

Implementation Steps

1. Types & Configuration

Define embedding-specific error types in a new services/memory/src/embedding/mod.rs:

#[derive(Debug, thiserror::Error)]
pub enum EmbeddingError {
    /// The Model Gateway is unavailable or unreachable.
    #[error("Model Gateway unavailable: {0}")]
    GatewayUnavailable(String),

    /// The gRPC call to GenerateEmbedding failed.
    #[error("embedding generation failed: {0}")]
    GenerationFailed(#[from] tonic::Status),

    /// The returned embedding has an unexpected dimension.
    #[error("dimension mismatch: expected {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },

    /// Connection to Model Gateway failed.
    #[error("connection error: {0}")]
    ConnectionError(#[from] tonic::transport::Error),
}

Define the embedding field enum for type safety:

/// Identifies which field of a memory entry an embedding belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingField {
    Name,
    Description,
    Corpus,
}

impl EmbeddingField {
    /// Returns the DuckDB `embedding_type` column value.
    pub fn as_db_type(&self) -> &'static str {
        match self {
            Self::Name => "name",
            Self::Description => "description",
            Self::Corpus => "corpus",
        }
    }
}

Define a batch embedding request struct:

/// A request to generate an embedding for a specific text and field.
pub struct EmbeddingRequest {
    pub memory_id: String,
    pub field: EmbeddingField,
    pub text: String,
}

/// A completed embedding ready for storage.
pub struct EmbeddingResult {
    pub memory_id: String,
    pub field: EmbeddingField,
    pub vector: Vec<f32>,
}

No config changes needed — Config::embedding_endpoint already exists in services/memory/src/config.rs:14.

2. Core Logic

Create services/memory/src/embedding/mod.rs — Embedding client:

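The core struct wraps the ModelGatewayServiceClient. The impl block lists signatures only; the method bodies are described in the implementation details that follow: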

use llm_multiverse_proto::llm_multiverse::v1::{
    model_gateway_service_client::ModelGatewayServiceClient,
    GenerateEmbeddingRequest, SessionContext,
};
use tonic::transport::Channel;
use crate::db::schema::EMBEDDING_DIM;

pub struct EmbeddingClient {
    client: ModelGatewayServiceClient<Channel>,
}

impl EmbeddingClient {
    /// Connect to the Model Gateway at the given endpoint.
    pub async fn connect(endpoint: &str) -> Result<Self, EmbeddingError>;

    /// Generate an embedding for a single text string.
    ///
    /// Calls `GenerateEmbedding` on the Model Gateway with `model` set to None
    /// (defaults to nomic-embed-text on the gateway side).
    /// Validates that the returned vector has `EMBEDDING_DIM` dimensions.
    pub async fn generate(
        &mut self,
        context: &SessionContext,
        text: &str,
    ) -> Result<Vec<f32>, EmbeddingError>;

    /// Generate embeddings for all three fields of a memory entry.
    ///
    /// Returns a Vec of 3 `EmbeddingResult`s (name, description, corpus).
    /// Calls are made sequentially to avoid overloading the gateway.
    /// If a field text is empty, a zero vector is stored.
    pub async fn generate_for_entry(
        &mut self,
        context: &SessionContext,
        memory_id: &str,
        name: &str,
        description: &str,
        corpus: &str,
    ) -> Result<Vec<EmbeddingResult>, EmbeddingError>;

    /// Generate embeddings for a batch of entries (bulk import support).
    ///
    /// Processes entries sequentially to respect gateway capacity.
    /// Returns partial results on failure — successfully generated embeddings
    /// are returned alongside the error for the failed entry.
    pub async fn generate_batch(
        &mut self,
        context: &SessionContext,
        requests: Vec<EmbeddingRequest>,
    ) -> Result<Vec<EmbeddingResult>, BatchEmbeddingError>;
}

Batch error type:

/// Error from batch embedding generation.
#[derive(Debug, thiserror::Error)]
#[error("batch embedding failed at index {failed_index}: {source}")]
pub struct BatchEmbeddingError {
    /// Embeddings that were successfully generated before the failure.
    pub completed: Vec<EmbeddingResult>,
    /// Index of the request that failed.
    pub failed_index: usize,
    /// The underlying error.
    #[source]
    pub source: EmbeddingError,
}

Key implementation details for generate():

Build GenerateEmbeddingRequest with context, text, and model: None (gateway defaults to nomic-embed-text).
Call self.client.generate_embedding(request).await.
Extract embedding vector from GenerateEmbeddingResponse.
Validate embedding.len() == EMBEDDING_DIM. Return EmbeddingError::DimensionMismatch if not.
Return the Vec<f32>.
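The dimension check in the last two steps can be sketched in isolation. This is a minimal std-only illustration, not the service code: `EMBEDDING_DIM` is assumed to mirror `crate::db::schema::EMBEDDING_DIM`, the error enum is a cut-down stand-in for `EmbeddingError`, and `validate_dimensions` is a hypothetical helper name.

```rust
/// Assumed to mirror `crate::db::schema::EMBEDDING_DIM` (768 for nomic-embed-text).
const EMBEDDING_DIM: usize = 768;

/// Cut-down stand-in for the plan's `EmbeddingError`; only the variant used here.
#[derive(Debug, PartialEq)]
enum EmbeddingError {
    DimensionMismatch { expected: usize, actual: usize },
}

/// Hypothetical helper: pass the vector through only if it has exactly
/// `EMBEDDING_DIM` elements, otherwise report both sizes.
fn validate_dimensions(embedding: Vec<f32>) -> Result<Vec<f32>, EmbeddingError> {
    if embedding.len() == EMBEDDING_DIM {
        Ok(embedding)
    } else {
        Err(EmbeddingError::DimensionMismatch {
            expected: EMBEDDING_DIM,
            actual: embedding.len(),
        })
    }
}
```

Checking the length of the returned vector, rather than trusting the `dimensions` field of the response, means a gateway that reports one size but sends another is still caught.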

Key implementation details for generate_for_entry():

For each field (name, description, corpus):
- If text is empty, produce a zero vector vec![0.0f32; EMBEDDING_DIM] (skip the gateway call).
- Otherwise, call self.generate(context, text).await.
Collect results as Vec<EmbeddingResult>.
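The per-field loop could look roughly like the following sketch. It is synchronous and std-only so that it stands alone: the `generate` closure is a hypothetical stand-in for the async gateway call, and the types are simplified copies of the ones defined in step 1.

```rust
/// Assumed to mirror `crate::db::schema::EMBEDDING_DIM`.
const EMBEDDING_DIM: usize = 768;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmbeddingField {
    Name,
    Description,
    Corpus,
}

#[derive(Debug)]
struct EmbeddingResult {
    memory_id: String,
    field: EmbeddingField,
    vector: Vec<f32>,
}

/// Produce one result per field, in name/description/corpus order.
/// `generate` stands in for the gateway call (hypothetical, synchronous).
fn generate_for_entry<E>(
    memory_id: &str,
    name: &str,
    description: &str,
    corpus: &str,
    mut generate: impl FnMut(&str) -> Result<Vec<f32>, E>,
) -> Result<Vec<EmbeddingResult>, E> {
    let fields = [
        (EmbeddingField::Name, name),
        (EmbeddingField::Description, description),
        (EmbeddingField::Corpus, corpus),
    ];
    let mut results = Vec::with_capacity(fields.len());
    for (field, text) in fields {
        let vector = if text.is_empty() {
            // Empty text: zero vector, no gateway call.
            vec![0.0f32; EMBEDDING_DIM]
        } else {
            // The first failure aborts the whole entry.
            generate(text)?
        };
        results.push(EmbeddingResult {
            memory_id: memory_id.to_string(),
            field,
            vector,
        });
    }
    Ok(results)
}
```

Because the first error aborts the loop, an entry either gets all three vectors or none, which keeps the later storage step simple.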

Key implementation details for generate_batch():

Iterate over requests sequentially.
Call self.generate(context, &request.text).await for each.
On success, push to completed Vec.
On failure, return BatchEmbeddingError with the completed Vec and the failure details.
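The partial-result behaviour described above can be illustrated with a small std-only sketch. It is an approximation under stated assumptions: the `generate` closure replaces the async gateway call, the request and result structs omit the `field` member for brevity, and the error type is generic instead of using `EmbeddingError`.

```rust
/// Simplified request: the plan's struct also carries an `EmbeddingField`.
#[derive(Debug)]
struct EmbeddingRequest {
    memory_id: String,
    text: String,
}

/// Simplified result, matching the simplified request.
#[derive(Debug)]
struct EmbeddingResult {
    memory_id: String,
    vector: Vec<f32>,
}

/// Generic stand-in for the plan's `BatchEmbeddingError`.
#[derive(Debug)]
struct BatchEmbeddingError<E> {
    completed: Vec<EmbeddingResult>,
    failed_index: usize,
    source: E,
}

/// Process requests in order; stop at the first failure and hand back
/// everything that was generated before it.
fn generate_batch<E>(
    requests: Vec<EmbeddingRequest>,
    mut generate: impl FnMut(&str) -> Result<Vec<f32>, E>,
) -> Result<Vec<EmbeddingResult>, BatchEmbeddingError<E>> {
    let mut completed = Vec::with_capacity(requests.len());
    for (index, request) in requests.into_iter().enumerate() {
        match generate(&request.text) {
            Ok(vector) => completed.push(EmbeddingResult {
                memory_id: request.memory_id,
                vector,
            }),
            Err(source) => {
                return Err(BatchEmbeddingError {
                    completed,
                    failed_index: index,
                    source,
                })
            }
        }
    }
    Ok(completed)
}
```

A caller can persist `completed` and resume the import from `failed_index` instead of regenerating embeddings that already succeeded.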

Create services/memory/src/embedding/store.rs — Embedding storage helper:

use crate::db::{DbError, DuckDbManager};
use super::{EmbeddingResult, EmbeddingField};
use crate::db::schema::EMBEDDING_DIM;

/// Store a single embedding result in the DuckDB `embeddings` table.
///
/// Uses INSERT OR REPLACE to support updating existing embeddings.
pub fn store_embedding(
    db: &DuckDbManager,
    result: &EmbeddingResult,
) -> Result<(), DbError>;

/// Store multiple embedding results in a single transaction.
///
/// Used after `generate_for_entry()` or `generate_batch()`.
pub fn store_embeddings(
    db: &DuckDbManager,
    results: &[EmbeddingResult],
) -> Result<(), DbError>;

Key implementation details for store_embedding():

Call db.with_connection() to acquire the lock.
Format the Vec<f32> as a DuckDB list literal [v0, v1, ...].
Execute: INSERT OR REPLACE INTO embeddings (memory_id, embedding_type, vector) VALUES (?, ?, <array_literal>::FLOAT[768]), so the cast to FLOAT[768] is applied once, in the statement.
After insertion, call schema::ensure_hnsw_index(conn) to create the HNSW index if it was deferred.
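As an illustration of the literal-formatting step, the sketch below builds the SQL text from a vector using only the standard library. The helper names `to_list_literal` and `insert_statement` are hypothetical, `EMBEDDING_DIM` is assumed to match the schema constant, and rejecting non-finite values is an added assumption (NaN and infinity do not render as plain numeric literals).

```rust
/// Assumed to mirror `crate::db::schema::EMBEDDING_DIM`.
const EMBEDDING_DIM: usize = 768;

/// Render a vector as a list literal such as `[1, 0.5, -2.25]`.
/// Returns `None` if any component is NaN or infinite.
fn to_list_literal(vector: &[f32]) -> Option<String> {
    if vector.iter().any(|v| !v.is_finite()) {
        return None;
    }
    // `f32`'s `Display` output never uses exponent notation.
    let body = vector
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(", ");
    Some(format!("[{}]", body))
}

/// Build the statement text; `memory_id` and `embedding_type` stay as
/// bound parameters, only the vector is inlined as a literal.
fn insert_statement(vector: &[f32]) -> Option<String> {
    let literal = to_list_literal(vector)?;
    Some(format!(
        "INSERT OR REPLACE INTO embeddings (memory_id, embedding_type, vector) \
         VALUES (?, ?, {}::FLOAT[{}])",
        literal, EMBEDDING_DIM
    ))
}
```

Only numeric text generated from `f32` values is inlined, so the string identifiers still go through parameter binding.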

Key implementation details for store_embeddings():

Call db.with_connection() once, execute all inserts within the same lock scope.
Use a DuckDB transaction (BEGIN; ... COMMIT;) for atomicity.
Call ensure_hnsw_index(conn) once after all inserts.

3. gRPC Handler Wiring

Update services/memory/src/service.rs — Add embedding client to MemoryServiceImpl:

use crate::embedding::EmbeddingClient;
use tokio::sync::Mutex;

pub struct MemoryServiceImpl {
    db: Arc<DuckDbManager>,
    embedding_client: Option<Arc<Mutex<EmbeddingClient>>>,
}

impl MemoryServiceImpl {
    pub fn new(db: Arc<DuckDbManager>) -> Self {
        Self {
            db,
            embedding_client: None,
        }
    }

    /// Attach an embedding client for generating embeddings on write.
    pub fn with_embedding_client(mut self, client: EmbeddingClient) -> Self {
        self.embedding_client = Some(Arc::new(Mutex::new(client)));
        self
    }
}

This follows the same builder pattern as SecretsServiceImpl::with_audit_client() in services/secrets/src/service.rs:27-33.

The write_memory() handler will remain Unimplemented for now — the full write path is a later issue. However, the embedding client is wired so that when write_memory() is implemented, it can:

Acquire the embedding client lock.
Call generate_for_entry() with the entry's name, description, and corpus.
Call store_embeddings() to persist the vectors.
Insert the memory row into the memories table.

4. Service Integration

Update services/memory/src/main.rs — Connect embedding client at startup:

use memory_service::embedding::EmbeddingClient;

// In main(), after config loading:
let mut memory_service = MemoryServiceImpl::new(db);

// Connect to Model Gateway if configured.
if let Some(ref endpoint) = config.embedding_endpoint {
    match EmbeddingClient::connect(endpoint).await {
        Ok(client) => {
            tracing::info!(endpoint = %endpoint, "Connected to Model Gateway for embeddings");
            memory_service = memory_service.with_embedding_client(client);
        }
        Err(e) => {
            tracing::warn!(
                endpoint = %endpoint,
                error = %e,
                "Model Gateway unavailable — starting without embedding support"
            );
        }
    }
}

Graceful degradation: If the Model Gateway is unreachable at startup, the memory service starts without embedding support. Write operations that require embeddings should return a clear error (e.g., Status::failed_precondition("embedding client not configured")). This matches the architecture principle that services should start independently.

Reconnection strategy: The initial implementation uses a single connection established at startup. If the gateway becomes unavailable after startup, generate() calls will fail with tonic::Status errors which are propagated as EmbeddingError::GenerationFailed. A reconnection mechanism can be added in a future issue if needed.
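How a future write handler might react to these two situations is sketched below without depending on tonic. `Code` and `Status` are hypothetical stand-ins for `tonic::Code` and `tonic::Status`, `EmbeddingError` carries plain strings instead of the real source errors, and mapping `DimensionMismatch` to an internal error is an assumption that the plan does not specify.

```rust
/// Hypothetical stand-in for `tonic::Code`; only the variants used here.
#[derive(Debug, PartialEq)]
enum Code {
    FailedPrecondition,
    Unavailable,
    Internal,
}

/// Hypothetical stand-in for `tonic::Status`.
#[derive(Debug, PartialEq)]
struct Status {
    code: Code,
    message: String,
}

/// Simplified copy of the plan's error enum (string payloads only).
#[derive(Debug)]
enum EmbeddingError {
    GatewayUnavailable(String),
    GenerationFailed(String),
    DimensionMismatch { expected: usize, actual: usize },
}

/// Placeholder for the real client type.
#[derive(Debug)]
struct EmbeddingClient;

/// Degraded mode: no client was attached at startup.
fn require_client(client: Option<&EmbeddingClient>) -> Result<&EmbeddingClient, Status> {
    client.ok_or_else(|| Status {
        code: Code::FailedPrecondition,
        message: "embedding client not configured".to_string(),
    })
}

/// Gateway failures after startup surface as "unavailable"; a wrong
/// vector size is treated as an internal error (assumption).
fn to_status(err: &EmbeddingError) -> Status {
    match err {
        EmbeddingError::GatewayUnavailable(msg) | EmbeddingError::GenerationFailed(msg) => Status {
            code: Code::Unavailable,
            message: msg.clone(),
        },
        EmbeddingError::DimensionMismatch { expected, actual } => Status {
            code: Code::Internal,
            message: format!("dimension mismatch: expected {}, got {}", expected, actual),
        },
    }
}
```

Keeping the two cases distinct lets callers tell a deployment without embedding support apart from a temporary gateway outage that may be worth retrying.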

5. Tests

Unit tests in services/memory/src/embedding/mod.rs:

Test Case	Description
`test_embedding_field_as_db_type`	Verify `EmbeddingField::Name.as_db_type() == "name"`, etc.
`test_embedding_field_variants`	All three variants exist and are distinct
`test_dimension_mismatch_error`	Construct `DimensionMismatch` error, verify message contains expected/actual
`test_batch_error_preserves_completed`	`BatchEmbeddingError` retains successfully completed results

Unit tests in services/memory/src/embedding/store.rs:

Test Case	Description
`test_store_single_embedding`	Store one embedding, read it back via SQL, verify dimensions and values
`test_store_embeddings_batch`	Store 3 embeddings (name, desc, corpus for one entry), verify all stored
`test_store_embedding_overwrites`	Store, then store again with different vector, verify updated
`test_store_ensures_hnsw_index`	After storing, verify HNSW index exists via `duckdb_indexes()`
`test_store_empty_vector`	Store a zero vector, verify it can be stored and retrieved

Integration tests in services/memory/src/service.rs (update existing tests):

Test Case	Description
`test_service_starts_without_embedding_client`	`MemoryServiceImpl::new(db)` works without embedding client (existing behavior preserved)
`test_service_with_embedding_client`	`MemoryServiceImpl::new(db).with_embedding_client(client)` stores the client

Note on testing the gRPC client: The EmbeddingClient::generate() method calls a remote Model Gateway. For unit tests, we define an EmbeddingGenerator trait:

/// Trait for embedding generation, enabling mock implementations in tests.
#[tonic::async_trait]
pub trait EmbeddingGenerator: Send + Sync {
    async fn generate(
        &self,
        context: &SessionContext,
        text: &str,
    ) -> Result<Vec<f32>, EmbeddingError>;
}

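The real EmbeddingClient implements this trait. Because the trait method takes &self while the client's own generate() takes &mut self, the trait implementation needs interior mutability or a clone of the inner client (cloning a Channel-backed tonic client is cheap). Tests use a MockEmbeddingGenerator that returns predetermined vectors: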

#[cfg(test)]
pub mod mock {
    use super::*;
    use crate::db::schema::EMBEDDING_DIM;

    pub struct MockEmbeddingGenerator {
        /// If set, all calls return this error.
        pub fail_with: Option<EmbeddingError>,
    }

    #[tonic::async_trait]
    impl EmbeddingGenerator for MockEmbeddingGenerator {
        async fn generate(
            &self,
            _context: &SessionContext,
            text: &str,
        ) -> Result<Vec<f32>, EmbeddingError> {
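            if let Some(ref err) = self.fail_with {
                // `EmbeddingError` is not `Clone`, so report the configured error's
                // message wrapped in `GatewayUnavailable`, whatever its variant.
                return Err(EmbeddingError::GatewayUnavailable(err.to_string()));
            }
            // Return a deterministic vector derived from the text length
            // (not a real hash): every component is `len / 100`.
            let value = text.len() as f32 / 100.0;
            Ok(vec![value; EMBEDDING_DIM])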
        }
    }
}

Additional mock-based tests:

Test Case	Description
`test_generate_for_entry_all_fields`	Mock generator returns vectors for name/desc/corpus
`test_generate_for_entry_empty_fields`	Empty text produces zero vectors without calling generator
`test_generate_batch_success`	All requests succeed
`test_generate_batch_partial_failure`	Failure at index 2 returns first 2 completed results
`test_gateway_unavailable_error`	Mock returns `GatewayUnavailable`, verify error type propagation

Files to Create/Modify

File	Action	Purpose
`services/memory/src/lib.rs`	Modify	Add `pub mod embedding;`
`services/memory/src/embedding/mod.rs`	Create	`EmbeddingClient`, `EmbeddingGenerator` trait, `EmbeddingError`, `EmbeddingField`, `EmbeddingRequest`, `EmbeddingResult`, `BatchEmbeddingError`, mock module
`services/memory/src/embedding/store.rs`	Create	`store_embedding()`, `store_embeddings()` — DuckDB storage helpers
`services/memory/src/service.rs`	Modify	Add `embedding_client` field to `MemoryServiceImpl`, add `with_embedding_client()` builder
`services/memory/src/main.rs`	Modify	Connect to Model Gateway at startup if `embedding_endpoint` is configured

Risks and Edge Cases

Model Gateway not yet implemented: The Model Gateway service does not exist yet (issue #12 defined the proto, but the service itself is a later milestone). During development, the embedding client can only be tested with mocks. The real integration will be validated when the gateway is built. The memory service must start cleanly without it.
Embedding dimension drift: If the embedding model changes from nomic-embed-text (768 dims) to another model, the EMBEDDING_DIM constant and the DuckDB schema (FLOAT[768]) must both be updated via a migration. The dimension validation in generate() will catch mismatches at runtime.
Batch size limits: For very large bulk imports, sending hundreds of sequential embedding requests to the gateway may be slow. The initial implementation is sequential. Future optimization: add configurable concurrency, e.g. a tokio::sync::Semaphore that limits parallel gateway calls to 4.
Empty text fields: A memory entry might have an empty description or corpus. Calling the embedding model with empty text is wasteful and may produce meaningless vectors. The plan handles this by producing zero vectors for empty fields without calling the gateway.
Gateway connection lifecycle: The tonic::transport::Channel handles connection pooling and reconnection internally. However, if the gateway is down for an extended period, gRPC calls will fail with transport errors. The current plan propagates these as EmbeddingError::GenerationFailed. Callers (the write handler) should map this to an appropriate tonic::Status (e.g., Status::unavailable).
Transaction atomicity: When writing a memory entry, the memory row and its embeddings should be written atomically. The store_embeddings() function uses a DuckDB transaction. However, the memory row insertion (in a future issue) must be coordinated with embedding storage in the same transaction. The with_connection() lock ensures exclusive access, but the caller must orchestrate the full write sequence.
Thread safety of EmbeddingClient: ModelGatewayServiceClient<Channel> is Clone and internally reference-counted, so wrapping in Arc<Mutex<>> is safe but potentially overly conservative. A Clone-based approach could avoid the lock. However, following the established pattern from the secrets service (Arc<Mutex<AuditServiceClient<Channel>>>) maintains consistency.
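To make the bounded-concurrency idea from the batch size risk concrete, here is a rough std-only sketch. It deliberately swaps the technique: instead of a `tokio::sync::Semaphore` around async calls, it runs chunks of at most `limit` blocking calls on scoped threads, which is coarser because each chunk waits for its slowest call. `embed_bounded` and the `generate` closure are hypothetical names.

```rust
use std::thread;

/// Run `generate` over all texts with at most `limit` calls in flight,
/// returning results in input order.
fn embed_bounded<F>(texts: &[String], limit: usize, generate: F) -> Vec<Vec<f32>>
where
    F: Fn(&str) -> Vec<f32> + Sync,
{
    // Share the closure by reference so each worker can copy the reference.
    let generate = &generate;
    let mut results = Vec::with_capacity(texts.len());
    // `limit.max(1)` avoids a panic from a zero chunk size.
    for chunk in texts.chunks(limit.max(1)) {
        let chunk_results: Vec<Vec<f32>> = thread::scope(|scope| {
            // Spawn every worker in the chunk before joining any of them.
            let handles: Vec<_> = chunk
                .iter()
                .map(|text| scope.spawn(move || generate(text.as_str())))
                .collect();
            handles
                .into_iter()
                .map(|handle| handle.join().expect("embedding worker panicked"))
                .collect()
        });
        results.extend(chunk_results);
    }
    results
}
```

A semaphore-based async version would keep the gateway busier, since a new call can start as soon as any permit is released, but the ordering and upper bound shown here would be the same.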

Deviation Log

Deviation	Reason
Added mock server tests inline in `mod tests` rather than as separate integration tests	Keeps tests co-located with the code they cover; follows same pattern as existing unit tests in the module

Retry Instructions

Failure Summary (Attempt 1)

Quality Gates:

Build: PASS
Lint (clippy): PASS
Tests: PASS (43 memory-service tests, all green)
Coverage: FAIL — embedding/mod.rs at 78.2% (52 uncovered lines in EmbeddingClient impl)

Root Cause: The EmbeddingClient struct has concrete gRPC methods (connect, generate, generate_for_entry, generate_batch) that cannot be unit-tested without a live Model Gateway server. The EmbeddingGenerator trait and MockEmbeddingGenerator cover the trait-based paths but not the concrete client code.

Required Fixes

Add a tonic mock server test in services/memory/src/embedding/mod.rs:
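- Implement the generated ModelGatewayService server trait in a test module and serve it in-process (for example with tonic::transport::Server bound to an ephemeral local port) so that it acts as a mock server for the GenerateEmbedding RPC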
- The mock server should return a predetermined 768-dim vector
- Write tests that create an EmbeddingClient connected to this mock server and exercise:
  - generate() — single text to embedding
  - generate_for_entry() — name/desc/corpus fields
  - generate_batch() — multiple requests
  - Error case: mock server returns an error status, verify EmbeddingError::GenerationFailed
  - Dimension mismatch: mock server returns wrong-dimension vector, verify EmbeddingError::DimensionMismatch
- This should cover the remaining 52 lines in EmbeddingClient
After fixing, run:
- cargo test --workspace to verify all tests pass
- cargo clippy --workspace -- -D warnings to verify no warnings
- cargo llvm-cov --workspace --lcov --output-path lcov.info to verify coverage >= 95% on embedding/mod.rs
