Implementation Plan — Issue #29: Implement embedding generation integration
Metadata
| Field | Value |
|---|---|
| Issue | #29 |
| Title | Implement embedding generation integration |
| Milestone | Phase 4: Memory Service |
| Labels | service:memory, lang:rust |
| Status | COMPLETED |
| Language | Rust |
| Related Plans | issue-011.md, issue-012.md, issue-027.md, issue-028.md |
| Blocked by | #28 (completed) |
Acceptance Criteria
- gRPC client to Model Gateway GenerateEmbedding
- Embedding generated for each new memory entry
- Embedding stored alongside memory in DuckDB
- Batch embedding support for bulk imports
- Graceful handling of Model Gateway unavailability
Architecture Analysis
Service Context
This issue belongs to the Memory Service (Rust). It integrates the Memory Service with the Model Gateway by implementing a gRPC client that calls GenerateEmbedding to produce 768-dimensional vectors (nomic-embed-text) for memory content at write time.
The Model Gateway proto (proto/llm_multiverse/v1/model_gateway.proto) defines:
- GenerateEmbedding(GenerateEmbeddingRequest) returns (GenerateEmbeddingResponse): unary RPC
- GenerateEmbeddingRequest contains SessionContext context, string text, and optional string model
- GenerateEmbeddingResponse contains repeated float embedding and uint32 dimensions
The Memory proto (proto/llm_multiverse/v1/memory.proto) defines MemoryEntry with embedding fields: bytes name_embedding, bytes description_embedding, bytes corpus_embedding. Per the architecture doc, each memory entry requires three embeddings (name, description, corpus) generated via nomic-embed-text.
The DuckDB schema (from issue #28) stores embeddings in the embeddings table with columns (memory_id, embedding_type, vector FLOAT[768]) where embedding_type is one of 'name', 'description', 'corpus'.
Existing Patterns
- gRPC client pattern: The secrets service uses AuditServiceClient<Channel> wrapped in Arc<Mutex<>> for calling the audit service (see services/secrets/src/service.rs:16-17). The client is optional and configured via a builder method with_audit_client(). The same pattern applies here: the Model Gateway client should be optional so that the memory service can start without the gateway (degraded mode).
- Config: services/memory/src/config.rs already has embedding_endpoint: Option<String> for the Model Gateway address (line 14).
- DuckDB access: DuckDbManager wraps the connection in Mutex<Connection> and exposes with_connection() (see services/memory/src/db/mod.rs:84-90).
- Error types: The DbError enum in services/memory/src/db/mod.rs:14-39 uses thiserror. A new error variant or a separate embedding error type is needed.
Dependencies
- Crate: tonic, already a dependency, provides transport::Channel and transport::Endpoint for the gRPC client.
- Proto-gen crate: already provides model_gateway_service_client::ModelGatewayServiceClient (client stubs are generated via build_client(true) in gen/rust/build.rs:16).
- No new crate dependencies: all necessary types are already available via existing dependencies (tonic, prost, llm-multiverse-proto).
Cross-Service Integration
- Model Gateway (GenerateEmbedding RPC): the memory service becomes a gRPC client of the Model Gateway. The gateway must be running and reachable at the configured embedding_endpoint address.
- DuckDB: embeddings are stored in the existing embeddings table after generation.
Implementation Steps
1. Types & Configuration
Define embedding-specific error types in a new services/memory/src/embedding/mod.rs:
#[derive(Debug, thiserror::Error)]
pub enum EmbeddingError {
    /// The Model Gateway is unavailable or unreachable.
    #[error("Model Gateway unavailable: {0}")]
    GatewayUnavailable(String),
    /// The gRPC call to GenerateEmbedding failed.
    #[error("embedding generation failed: {0}")]
    GenerationFailed(#[from] tonic::Status),
    /// The returned embedding has an unexpected dimension.
    #[error("dimension mismatch: expected {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },
    /// Connection to Model Gateway failed.
    #[error("connection error: {0}")]
    ConnectionError(#[from] tonic::transport::Error),
}
Define the embedding field enum for type safety:
/// Identifies which field of a memory entry an embedding belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingField {
    Name,
    Description,
    Corpus,
}

impl EmbeddingField {
    /// Returns the DuckDB `embedding_type` column value.
    pub fn as_db_type(&self) -> &'static str {
        match self {
            Self::Name => "name",
            Self::Description => "description",
            Self::Corpus => "corpus",
        }
    }
}
Define a batch embedding request struct:
/// A request to generate an embedding for a specific text and field.
#[derive(Debug, Clone)]
pub struct EmbeddingRequest {
    pub memory_id: String,
    pub field: EmbeddingField,
    pub text: String,
}

/// A completed embedding ready for storage.
/// `Debug` is required because `BatchEmbeddingError` derives `Debug`
/// and holds a `Vec<EmbeddingResult>`.
#[derive(Debug, Clone)]
pub struct EmbeddingResult {
    pub memory_id: String,
    pub field: EmbeddingField,
    pub vector: Vec<f32>,
}
No config changes needed — Config::embedding_endpoint already exists in services/memory/src/config.rs:14.
2. Core Logic
Create services/memory/src/embedding/mod.rs — Embedding client:
The core struct wraps the ModelGatewayServiceClient:
use llm_multiverse_proto::llm_multiverse::v1::{
    model_gateway_service_client::ModelGatewayServiceClient,
    GenerateEmbeddingRequest, SessionContext,
};
use tonic::transport::Channel;
use crate::db::schema::EMBEDDING_DIM;

pub struct EmbeddingClient {
    client: ModelGatewayServiceClient<Channel>,
}

// Signatures only; method bodies are described in the implementation details below.
impl EmbeddingClient {
    /// Connect to the Model Gateway at the given endpoint.
    pub async fn connect(endpoint: &str) -> Result<Self, EmbeddingError>;

    /// Generate an embedding for a single text string.
    ///
    /// Calls `GenerateEmbedding` on the Model Gateway with `model` set to None
    /// (defaults to nomic-embed-text on the gateway side).
    /// Validates that the returned vector has `EMBEDDING_DIM` dimensions.
    pub async fn generate(
        &mut self,
        context: &SessionContext,
        text: &str,
    ) -> Result<Vec<f32>, EmbeddingError>;

    /// Generate embeddings for all three fields of a memory entry.
    ///
    /// Returns a Vec of 3 `EmbeddingResult`s (name, description, corpus).
    /// Calls are made sequentially to avoid overloading the gateway.
    /// If a field text is empty, a zero vector is returned for it
    /// without calling the gateway.
    pub async fn generate_for_entry(
        &mut self,
        context: &SessionContext,
        memory_id: &str,
        name: &str,
        description: &str,
        corpus: &str,
    ) -> Result<Vec<EmbeddingResult>, EmbeddingError>;

    /// Generate embeddings for a batch of entries (bulk import support).
    ///
    /// Processes entries sequentially to respect gateway capacity.
    /// Returns partial results on failure: successfully generated embeddings
    /// are returned alongside the error for the failed entry.
    pub async fn generate_batch(
        &mut self,
        context: &SessionContext,
        requests: Vec<EmbeddingRequest>,
    ) -> Result<Vec<EmbeddingResult>, BatchEmbeddingError>;
}
Batch error type:
/// Error from batch embedding generation.
#[derive(Debug, thiserror::Error)]
#[error("batch embedding failed at index {failed_index}: {source}")]
pub struct BatchEmbeddingError {
    /// Embeddings that were successfully generated before the failure.
    pub completed: Vec<EmbeddingResult>,
    /// Index of the request that failed.
    pub failed_index: usize,
    /// The underlying error.
    #[source]
    pub source: EmbeddingError,
}
Key implementation details for generate():
- Build GenerateEmbeddingRequest with context, text, and model: None (gateway defaults to nomic-embed-text).
- Call self.client.generate_embedding(request).await.
- Extract the embedding vector from GenerateEmbeddingResponse.
- Validate embedding.len() == EMBEDDING_DIM. Return EmbeddingError::DimensionMismatch if not.
- Return the Vec<f32>.
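The validation step can be illustrated in isolation. The following is a minimal, self-contained sketch rather than the planned implementation: `validate_dimensions` and the local `DimensionMismatch` struct are hypothetical stand-ins for the check inside `generate()` and for `EmbeddingError::DimensionMismatch`, and `EMBEDDING_DIM` is redefined locally as 768 (an assumption matching the `FLOAT[768]` schema) instead of being imported from `crate::db::schema`.

```rust
/// Local stand-in for `crate::db::schema::EMBEDDING_DIM` (assumed to be 768).
pub const EMBEDDING_DIM: usize = 768;

/// Hypothetical stand-in for `EmbeddingError::DimensionMismatch`.
#[derive(Debug, PartialEq)]
pub struct DimensionMismatch {
    pub expected: usize,
    pub actual: usize,
}

/// Returns the vector unchanged if it has `EMBEDDING_DIM` elements,
/// otherwise reports the expected and actual lengths.
pub fn validate_dimensions(embedding: Vec<f32>) -> Result<Vec<f32>, DimensionMismatch> {
    if embedding.len() == EMBEDDING_DIM {
        Ok(embedding)
    } else {
        Err(DimensionMismatch {
            expected: EMBEDDING_DIM,
            actual: embedding.len(),
        })
    }
}
```

Checking the length of the returned vector, rather than trusting the `dimensions` field of the response, guards against a gateway that reports one size and sends another.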
Key implementation details for generate_for_entry():
- For each field (name, description, corpus):
  - If text is empty, produce a zero vector of EMBEDDING_DIM elements.
  - Otherwise, call self.generate(context, text).await.
- Collect results as Vec<EmbeddingResult>.
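The per-field loop might look roughly like the sketch below. It is synchronous and self-contained for illustration only: `embed_fields` is a hypothetical helper, the `generate` closure stands in for `EmbeddingClient::generate()`, the result is a plain tuple instead of `EmbeddingResult`, and `EMBEDDING_DIM` is assumed to be 768.

```rust
/// Local stand-in for `crate::db::schema::EMBEDDING_DIM` (assumed to be 768).
pub const EMBEDDING_DIM: usize = 768;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmbeddingField {
    Name,
    Description,
    Corpus,
}

/// Produces one vector per field, in the order given.
/// `generate` is a hypothetical synchronous stand-in for the gateway call.
pub fn embed_fields<E>(
    fields: [(EmbeddingField, &str); 3],
    mut generate: impl FnMut(&str) -> Result<Vec<f32>, E>,
) -> Result<Vec<(EmbeddingField, Vec<f32>)>, E> {
    let mut results = Vec::with_capacity(fields.len());
    for (field, text) in fields {
        let vector = if text.is_empty() {
            // Empty field: skip the gateway call and use a zero vector.
            vec![0.0; EMBEDDING_DIM]
        } else {
            // The first failure aborts the whole entry.
            generate(text)?
        };
        results.push((field, vector));
    }
    Ok(results)
}
```

Because the empty-text branch never invokes the closure, a test can count generator calls to confirm that empty fields cost no gateway round trip.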
Key implementation details for generate_batch():
- Iterate over requests sequentially.
- Call self.generate(context, &request.text).await for each.
- On success, push to the completed Vec.
- On failure, return BatchEmbeddingError with the completed Vec and the failure details.
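The partial-result behaviour can be sketched without any gRPC dependency. In this illustration, `embed_batch` and `BatchError` are hypothetical simplifications of `generate_batch()` and `BatchEmbeddingError`, the `generate` closure stands in for the gateway call, and requests are reduced to plain text slices.

```rust
/// Hypothetical simplification of the plan's `BatchEmbeddingError`.
#[derive(Debug)]
pub struct BatchError<E> {
    /// Vectors generated before the failure, in request order.
    pub completed: Vec<Vec<f32>>,
    /// Index of the request that failed.
    pub failed_index: usize,
    /// The underlying error.
    pub source: E,
}

/// Processes `texts` one at a time and stops at the first failure,
/// handing back everything that was completed up to that point.
pub fn embed_batch<E>(
    texts: &[&str],
    mut generate: impl FnMut(&str) -> Result<Vec<f32>, E>,
) -> Result<Vec<Vec<f32>>, BatchError<E>> {
    let mut completed = Vec::with_capacity(texts.len());
    for (index, &text) in texts.iter().enumerate() {
        match generate(text) {
            Ok(vector) => completed.push(vector),
            Err(source) => {
                return Err(BatchError {
                    completed,
                    failed_index: index,
                    source,
                });
            }
        }
    }
    Ok(completed)
}
```

Returning `failed_index` alongside `completed` lets a bulk importer persist the finished embeddings and resume from the failed request instead of restarting the batch.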
Create services/memory/src/embedding/store.rs — Embedding storage helper:
use crate::db::{DbError, DuckDbManager};
use super::{EmbeddingResult, EmbeddingField};
use crate::db::schema::EMBEDDING_DIM;

/// Store a single embedding result in the DuckDB `embeddings` table.
///
/// Uses INSERT OR REPLACE to support updating existing embeddings.
pub fn store_embedding(
    db: &DuckDbManager,
    result: &EmbeddingResult,
) -> Result<(), DbError>;

/// Store multiple embedding results in a single transaction.
///
/// Used after `generate_for_entry()` or `generate_batch()`.
pub fn store_embeddings(
    db: &DuckDbManager,
    results: &[EmbeddingResult],
) -> Result<(), DbError>;
Key implementation details for store_embedding():
- Call db.with_connection() to acquire the lock.
- Format the Vec<f32> as a DuckDB array literal [v0, v1, ...]::FLOAT[768].
- Execute: INSERT OR REPLACE INTO embeddings (memory_id, embedding_type, vector) VALUES (?, ?, <array_literal>::FLOAT[768]).
- After insertion, call schema::ensure_hnsw_index(conn) to create the HNSW index if it was deferred.
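The literal formatting step can be sketched as a pair of pure string helpers. Both `to_array_literal` and `insert_sql` are hypothetical names, and the sketch makes two assumptions of its own: the `::FLOAT[768]` cast is applied once, in the statement, and non-finite values are rejected because they have no plain numeric literal form.

```rust
/// Local stand-in for `crate::db::schema::EMBEDDING_DIM` (assumed to be 768).
pub const EMBEDDING_DIM: usize = 768;

/// Formats a vector as a bracketed list such as `[0.5, 1, -2.25]`.
/// Returns `None` if any value is NaN or infinite.
pub fn to_array_literal(vector: &[f32]) -> Option<String> {
    if vector.iter().any(|v| !v.is_finite()) {
        return None;
    }
    let values: Vec<String> = vector.iter().map(|v| v.to_string()).collect();
    Some(format!("[{}]", values.join(", ")))
}

/// Builds the insert statement around an already formatted literal.
/// `memory_id` and `embedding_type` stay as bound `?` parameters.
pub fn insert_sql(array_literal: &str) -> String {
    format!(
        "INSERT OR REPLACE INTO embeddings (memory_id, embedding_type, vector) \
         VALUES (?, ?, {}::FLOAT[{}])",
        array_literal, EMBEDDING_DIM
    )
}
```

Only numbers produced by `f32::to_string` are spliced into the SQL text, so the string interpolation does not open an injection path; the text columns remain parameterized.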
Key implementation details for store_embeddings():
- Call db.with_connection() once and execute all inserts within the same lock scope.
- Use a DuckDB transaction (BEGIN; ... COMMIT;) for atomicity.
- Call ensure_hnsw_index(conn) once after all inserts.
3. gRPC Handler Wiring
Update services/memory/src/service.rs — Add embedding client to MemoryServiceImpl:
use std::sync::Arc;

use crate::embedding::EmbeddingClient;
use tokio::sync::Mutex;

pub struct MemoryServiceImpl {
    db: Arc<DuckDbManager>,
    embedding_client: Option<Arc<Mutex<EmbeddingClient>>>,
}

impl MemoryServiceImpl {
    pub fn new(db: Arc<DuckDbManager>) -> Self {
        Self {
            db,
            embedding_client: None,
        }
    }

    /// Attach an embedding client for generating embeddings on write.
    pub fn with_embedding_client(mut self, client: EmbeddingClient) -> Self {
        self.embedding_client = Some(Arc::new(Mutex::new(client)));
        self
    }
}
This follows the same builder pattern as SecretsServiceImpl::with_audit_client() in services/secrets/src/service.rs:27-33.
The write_memory() handler will remain Unimplemented for now — the full write path is a later issue. However, the embedding client is wired so that when write_memory() is implemented, it can:
- Acquire the embedding client lock.
- Call generate_for_entry() with the entry's name, description, and corpus.
- Call store_embeddings() to persist the vectors.
- Insert the memory row into the memories table.
4. Service Integration
Update services/memory/src/main.rs — Connect embedding client at startup:
use memory_service::embedding::EmbeddingClient;

// In main(), after config loading:
let mut memory_service = MemoryServiceImpl::new(db);

// Connect to Model Gateway if configured.
if let Some(ref endpoint) = config.embedding_endpoint {
    match EmbeddingClient::connect(endpoint).await {
        Ok(client) => {
            tracing::info!(endpoint = %endpoint, "Connected to Model Gateway for embeddings");
            memory_service = memory_service.with_embedding_client(client);
        }
        Err(e) => {
            tracing::warn!(
                endpoint = %endpoint,
                error = %e,
                "Model Gateway unavailable, starting without embedding support"
            );
        }
    }
}
Graceful degradation: If the Model Gateway is unreachable at startup, the memory service starts without embedding support. Write operations that require embeddings should return a clear error (e.g., Status::failed_precondition("embedding client not configured")). This matches the architecture principle that services should start independently.
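The degraded-mode check in a write path could be shaped roughly as follows. This is a self-contained sketch under stated assumptions: `WriteError` is a hypothetical stand-in for `tonic::Status` (its variants mirror `Status::failed_precondition` and `Status::unavailable`), `embed_or_reject` is a hypothetical helper, and the client type is left generic so the example needs no gRPC dependency.

```rust
/// Hypothetical stand-in for the `tonic::Status` values a write handler returns.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    /// No embedding client was attached at startup.
    FailedPrecondition(&'static str),
    /// The client exists but the gateway call failed.
    Unavailable(String),
}

/// Runs `embed` against the client if one is configured.
/// A missing client is reported as a precondition failure; a failed
/// call is reported as unavailability.
pub fn embed_or_reject<C>(
    client: Option<&mut C>,
    embed: impl FnOnce(&mut C) -> Result<Vec<f32>, String>,
) -> Result<Vec<f32>, WriteError> {
    let client = client.ok_or(WriteError::FailedPrecondition(
        "embedding client not configured",
    ))?;
    embed(client).map_err(WriteError::Unavailable)
}
```

Keeping the two failure modes distinct lets callers tell a deployment without a gateway apart from a transient outage that may succeed on retry.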
Reconnection strategy: The initial implementation uses a single connection established at startup. If the gateway becomes unavailable after startup, generate() calls will fail with tonic::Status errors, which are propagated as EmbeddingError::GenerationFailed. A reconnection mechanism can be added in a future issue if needed.
5. Tests
Unit tests in services/memory/src/embedding/mod.rs:
| Test Case | Description |
|---|---|
| test_embedding_field_as_db_type | Verify EmbeddingField::Name.as_db_type() == "name", etc. |
| test_embedding_field_variants | All three variants exist and are distinct |
| test_dimension_mismatch_error | Construct DimensionMismatch error, verify message contains expected/actual |
| test_batch_error_preserves_completed | BatchEmbeddingError retains successfully completed results |
Unit tests in services/memory/src/embedding/store.rs:
| Test Case | Description |
|---|---|
| test_store_single_embedding | Store one embedding, read it back via SQL, verify dimensions and values |
| test_store_embeddings_batch | Store 3 embeddings (name, desc, corpus for one entry), verify all stored |
| test_store_embedding_overwrites | Store, then store again with different vector, verify updated |
| test_store_ensures_hnsw_index | After storing, verify HNSW index exists via duckdb_indexes() |
| test_store_empty_vector | Store a zero vector, verify it can be stored and retrieved |
Integration tests in services/memory/src/service.rs (update existing tests):
| Test Case | Description |
|---|---|
| test_service_starts_without_embedding_client | MemoryServiceImpl::new(db) works without embedding client (existing behavior preserved) |
| test_service_with_embedding_client | MemoryServiceImpl::new(db).with_embedding_client(client) stores the client |
Note on testing the gRPC client: The EmbeddingClient::generate() method calls a remote Model Gateway. For unit tests, we define an EmbeddingGenerator trait:
/// Trait for embedding generation, enabling mock implementations in tests.
#[tonic::async_trait]
pub trait EmbeddingGenerator: Send + Sync {
    async fn generate(
        &self,
        context: &SessionContext,
        text: &str,
    ) -> Result<Vec<f32>, EmbeddingError>;
}
The real EmbeddingClient implements this trait. Tests use a MockEmbeddingGenerator that returns predetermined vectors:
#[cfg(test)]
pub mod mock {
    use super::*;
    use crate::db::schema::EMBEDDING_DIM;

    pub struct MockEmbeddingGenerator {
        /// If set, all calls return this error.
        pub fail_with: Option<EmbeddingError>,
    }

    #[tonic::async_trait]
    impl EmbeddingGenerator for MockEmbeddingGenerator {
        async fn generate(
            &self,
            _context: &SessionContext,
            text: &str,
        ) -> Result<Vec<f32>, EmbeddingError> {
            if let Some(ref err) = self.fail_with {
                // EmbeddingError is not Clone, so wrap its message in a new error.
                return Err(EmbeddingError::GatewayUnavailable(err.to_string()));
            }
            // Return a deterministic vector derived from the text length
            // (not a real hash: texts of equal length yield equal vectors).
            let value = text.len() as f32 / 100.0;
            Ok(vec![value; EMBEDDING_DIM])
        }
    }
}
Additional mock-based tests:
| Test Case | Description |
|---|---|
| test_generate_for_entry_all_fields | Mock generator returns vectors for name/desc/corpus |
| test_generate_for_entry_empty_fields | Empty text produces zero vectors without calling generator |
| test_generate_batch_success | All requests succeed |
| test_generate_batch_partial_failure | Failure at index 2 returns first 2 completed results |
| test_gateway_unavailable_error | Mock returns GatewayUnavailable, verify error type propagation |
Files to Create/Modify
| File | Action | Purpose |
|---|---|---|
| services/memory/src/lib.rs | Modify | Add pub mod embedding; |
| services/memory/src/embedding/mod.rs | Create | EmbeddingClient, EmbeddingGenerator trait, EmbeddingError, EmbeddingField, EmbeddingRequest, EmbeddingResult, BatchEmbeddingError, mock module |
| services/memory/src/embedding/store.rs | Create | DuckDB storage helpers store_embedding() and store_embeddings() |
| services/memory/src/service.rs | Modify | Add embedding_client field to MemoryServiceImpl, add with_embedding_client() builder |
| services/memory/src/main.rs | Modify | Connect to Model Gateway at startup if embedding_endpoint is configured |
Risks and Edge Cases
- Model Gateway not yet implemented: The Model Gateway service does not exist yet (issue #12 defined the proto, but the service itself is a later milestone). During development, the embedding client can only be tested with mocks. The real integration will be validated when the gateway is built. The memory service must start cleanly without it.
- Embedding dimension drift: If the embedding model changes from nomic-embed-text (768 dims) to another model, the EMBEDDING_DIM constant and the DuckDB schema (FLOAT[768]) must both be updated via a migration. The dimension validation in generate() will catch mismatches at runtime.
- Batch size limits: For very large bulk imports, sending hundreds of sequential embedding requests to the gateway may be slow. The initial implementation is sequential. Future optimization: add configurable concurrency (tokio::sync::Semaphore to limit parallel gateway calls to, e.g., 4).
- Empty text fields: A memory entry might have an empty description or corpus. Calling the embedding model with empty text is wasteful and may produce meaningless vectors. The plan handles this by producing zero vectors for empty fields without calling the gateway.
- Gateway connection lifecycle: The tonic::transport::Channel handles connection pooling and reconnection internally. However, if the gateway is down for an extended period, gRPC calls will fail with transport errors. The current plan propagates these as EmbeddingError::GenerationFailed. Callers (the write handler) should map this to an appropriate tonic::Status (e.g., Status::unavailable).
- Transaction atomicity: When writing a memory entry, the memory row and its embeddings should be written atomically. The store_embeddings() function uses a DuckDB transaction. However, the memory row insertion (in a future issue) must be coordinated with embedding storage in the same transaction. The with_connection() lock ensures exclusive access, but the caller must orchestrate the full write sequence.
- Thread safety of EmbeddingClient: ModelGatewayServiceClient<Channel> is Clone and internally reference-counted, so wrapping it in Arc<Mutex<>> is safe but potentially overly conservative. A Clone-based approach could avoid the lock. However, following the established pattern from the secrets service (Arc<Mutex<AuditServiceClient<Channel>>>) maintains consistency.
Deviation Log
| Deviation | Reason |
|---|---|
| Added mock server tests inline in mod tests rather than as separate integration tests | Keeps tests co-located with the code they cover; follows the same pattern as existing unit tests in the module |
Retry Instructions
Failure Summary (Attempt 1)
Quality Gates:
- Build: PASS
- Lint (clippy): PASS
- Tests: PASS (43 memory-service tests, all green)
- Coverage: FAIL (embedding/mod.rs at 78.2%, 52 uncovered lines in the EmbeddingClient impl)
Root Cause: The EmbeddingClient struct has concrete gRPC methods (connect, generate, generate_for_entry, generate_batch) that cannot be unit-tested without a live Model Gateway server. The EmbeddingGenerator trait and MockEmbeddingGenerator cover the trait-based paths but not the concrete client code.
Required Fixes
- Add a tonic mock server test in services/memory/src/embedding/mod.rs:
  - Use tonic's built-in test infrastructure to create a mock ModelGatewayService server that implements the GenerateEmbedding RPC
  - The mock server should return a predetermined 768-dim vector
  - Write tests that create an EmbeddingClient connected to this mock server and exercise:
    - generate(): single text to embedding
    - generate_for_entry(): name/desc/corpus fields
    - generate_batch(): multiple requests
    - Error case: mock server returns an error status, verify EmbeddingError::GenerationFailed
    - Dimension mismatch: mock server returns wrong-dimension vector, verify EmbeddingError::DimensionMismatch
  - This should cover the remaining 52 lines in EmbeddingClient
- After fixing, run:
  - cargo test --workspace to verify all tests pass
  - cargo clippy --workspace -- -D warnings to verify no warnings
  - cargo llvm-cov --workspace --lcov --output-path lcov.info to verify coverage >= 95% on embedding/mod.rs