Add ConfidenceEvaluator to parse and score subtask results based on
result quality and memory candidate confidence, with configurable
aggregation strategies (weighted_mean, minimum, median).
Add ConfidenceReplanner to generate follow-up subtasks when confidence
falls below the replan threshold, with attempt tracking and max retries.
Add build_confidence_summary for human-readable confidence reporting
in final responses.
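
The three aggregation strategies named above can be sketched as follows. This is a minimal illustration, not the ConfidenceEvaluator code: the function names, the equal-weight default, and the replan check are assumptions.

```python
from statistics import median


def aggregate_confidence(scores, strategy="weighted_mean", weights=None):
    """Combine per-subtask confidence scores (0.0-1.0) into one value.

    Hypothetical helper; the strategy names mirror the commit message.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if strategy == "minimum":
        return min(scores)
    if strategy == "median":
        return median(scores)
    if strategy == "weighted_mean":
        # Assumption: equal weights when none are supplied.
        weights = weights or [1.0] * len(scores)
        if len(weights) != len(scores):
            raise ValueError("weights and scores must have the same length")
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(s * w for s, w in zip(scores, weights)) / total
    raise ValueError(f"unknown strategy: {strategy}")


def needs_replan(confidence, threshold, attempts, max_retries):
    """Replan only while confidence is low and retries remain."""
    return confidence < threshold and attempts < max_retries
```

The minimum strategy is the most conservative choice, since a single weak subtask pulls the aggregate down to its own score.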
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add MemoryWriteGate that evaluates subagent memory candidates against
configurable quality gates (confidence threshold, content length bounds,
structure heuristic, result quality) and writes accepted candidates to
the Memory Service with provenance tagging and audit logging.
- Create memory_gate.py with MemoryWriteGate, GatingDecision, GatingReport
- Add MemoryGatingConfig to config.py with YAML loading
- Add write_memory() to MemoryClient in clients.py
- Add 29 tests covering all gating rules, memory writes, tagging, audit
  logging, and error handling (95% coverage)
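
A gate of this kind might look like the sketch below. It covers only the confidence threshold and content length bounds; the class name, field names, default values, and reason strings are all hypothetical and are not taken from memory_gate.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    # All thresholds are illustrative assumptions.
    min_confidence: float = 0.7
    min_length: int = 20
    max_length: int = 2000


def evaluate_candidate(content, confidence, config=GateConfig()):
    """Return (accepted, reason) for one memory candidate."""
    if confidence < config.min_confidence:
        return False, "confidence_below_threshold"
    length = len(content.strip())
    if length < config.min_length:
        return False, "content_too_short"
    if length > config.max_length:
        return False, "content_too_long"
    return True, "accepted"
```

Returning a reason with every decision keeps rejected candidates explainable in an audit log.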
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add OrchestratorCompactor that monitors context size and automatically
compacts completed subtask results when the threshold is exceeded.
Uses Model Gateway inference for LLM-based summarization with truncation
fallback when gateway is unavailable.
- Create compaction.py with OrchestratorCompactor class
- Extend OrchestratorContext with compacted_summaries field and
get_pending_subtask_ids() method
- Add CompactionConfig to config.py with YAML loading
- Integrate compactor into SubagentDispatcher (async _safe_add_result)
- 25 tests covering threshold detection, compaction logic, gateway
interaction, context integrity, and dispatcher integration (97% coverage)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SubagentDispatcher with dependency-aware scheduling using
asyncio.wait(FIRST_COMPLETED), semaphore concurrency control,
per-subtask and overall timeouts, transitive dependent cancellation,
and graceful error handling for partial failures.
- Create dispatcher.py with SubagentDispatcher class
- Add DispatcherConfig to config.py with YAML loading
- 23 tests covering dependency graphs, timeouts, error handling,
concurrency control, and result ordering (95%+ coverage)
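
The scheduling pattern described above (asyncio.wait with FIRST_COMPLETED, a semaphore, and transitive cancellation of dependents) can be sketched as below. It is an assumption-laden illustration rather than the SubagentDispatcher code: the function name and data shapes are hypothetical, and per-subtask and overall timeouts are left out.

```python
import asyncio


async def run_with_dependencies(tasks, deps, max_concurrency=2):
    """Run async callables in dependency order.

    tasks: dict of subtask id -> zero-argument coroutine function
    deps:  dict of subtask id -> set of ids that must succeed first
    Returns (results, failed); failed includes skipped dependents.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    results, failed = {}, set()
    remaining = set(tasks)
    running = {}  # asyncio.Task -> subtask id

    async def guarded(task_id):
        async with semaphore:  # cap the number of subtasks in flight
            return await tasks[task_id]()

    while remaining or running:
        # Transitively drop subtasks whose dependencies have failed.
        changed = True
        while changed:
            changed = False
            for task_id in list(remaining):
                if deps.get(task_id, set()) & failed:
                    remaining.discard(task_id)
                    failed.add(task_id)
                    changed = True
        # Start every subtask whose dependencies have all succeeded.
        for task_id in list(remaining):
            if all(d in results for d in deps.get(task_id, set())):
                remaining.discard(task_id)
                running[asyncio.create_task(guarded(task_id))] = task_id
        if not running:
            break  # unmet dependencies (cycle or unknown id)
        done, _ = await asyncio.wait(
            list(running), return_when=asyncio.FIRST_COMPLETED
        )
        for finished in done:
            task_id = running.pop(finished)
            if finished.exception() is not None:
                failed.add(task_id)
            else:
                results[task_id] = finished.result()
    failed |= remaining  # anything never started counts as failed
    return results, failed
```

Waking on the first completed task lets newly unblocked subtasks start immediately instead of waiting for a whole batch to finish.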
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add OrchestratorContext class that tracks the full orchestration state
for a ProcessRequest call: user request, decomposition plan, subtask
results, session context propagation, and agent lineage construction.
Key features:
- Factory method from ProcessRequestRequest proto
- Agent lineage chain construction (orchestrator → subagent)
- SubagentRequest builder with session config propagation
- JSON serialization/deserialization using orjson + protobuf json_format
- Context size monitoring with warning (512KB) and hard limit (1MB)
192 tests pass (44 new context tests), ruff clean, 100% coverage on
context.py.
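
The two size thresholds can be illustrated with a small check like the one below. It is a sketch under assumptions: it uses the standard-library json module instead of orjson to stay self-contained, the function name and return values are hypothetical, and whether the limits are inclusive is a guess.

```python
import json

WARNING_BYTES = 512 * 1024      # 512KB warning threshold from the commit
HARD_LIMIT_BYTES = 1024 * 1024  # 1MB hard limit from the commit


def check_context_size(state):
    """Classify the serialized size of a JSON-compatible state dict."""
    size = len(json.dumps(state).encode("utf-8"))
    if size > HARD_LIMIT_BYTES:
        return "exceeded"
    if size > WARNING_BYTES:
        return "warning"
    return "ok"
```

Measuring the encoded byte length rather than the character count keeps the check honest for non-ASCII content.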
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TaskDecomposer that uses Model Gateway Inference to decompose user
requests into subtasks with dependency graphs and agent type assignments.
Key components:
- decomposer.py: TaskDecomposer class, decomposition prompt template,
JSON parsing, validation (cycle detection via Kahn's algorithm),
proto conversion, and single-task fallback on failure
- config.py: Add DecomposerConfig with max_tokens and max_subtasks
- 42 tests covering parsing, validation, agent type mapping, proto
conversion, fallback behavior, and end-to-end decompose calls
All 148 tests pass, ruff clean, 98% coverage on decomposer.py.
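
For reference, cycle detection with Kahn's algorithm can be sketched as follows. The function name and the shape of the dependency mapping are assumptions and may differ from what decomposer.py uses.

```python
from collections import deque


def has_cycle(deps):
    """Return True if the dependency graph contains a cycle.

    deps maps a subtask id to the list of ids it depends on.
    """
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, ds in deps.items():
        for d in ds:
            indegree[node] += 1
            dependents[d].append(node)

    # Repeatedly remove nodes that have no unmet dependencies.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        current = queue.popleft()
        visited += 1
        for nxt in dependents[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Nodes that were never removed must sit on a cycle.
    return visited != len(nodes)
```

Any node that never reaches an in-degree of zero is part of, or downstream of, a cycle, so comparing the visited count with the node count is sufficient.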
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 13 e2e tests using real gRPC mock servers to validate the full
researcher agent loop through actual gRPC channels. Tests cover:
- Web search task completion with tool execution verification
- Memory query enrichment with prompt inspection
- Tool failure handling (application-level and gRPC errors)
- Context compaction triggering on long research tasks
- Confidence signal mapping (VERIFIED/INFERRED/UNCERTAIN)
- SubagentResult schema validation including memory candidates
- Graceful degradation (no tools, gateway down, memory down)
- Factory function create_researcher_agent() validation
Also adds KNOWN_LIMITATIONS.md documenting 10 known limitations
and failure modes of the researcher agent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add context compaction to the researcher agent to handle long-running
research tasks that exceed the context window budget. When estimated
tokens exceed 60% of max_tokens, older history entries are summarized
via the Model Gateway's unary Inference RPC and replaced with a
compact bullet-point summary, preserving the 3 most recent entries.
Changes:
- clients.py: Add inference() unary method to ModelGatewayClient
- prompt.py: Add compact() method, compaction prompt template, and
_truncate_entries() fallback for gateway failures
- researcher.py: Replace hard context overflow termination with
compaction-then-continue logic
- 93 tests pass with 95%+ coverage on modified files
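
The compaction rule above (trigger above 60% of max_tokens, keep the 3 most recent entries, fall back to truncation) can be sketched like this. The function names, the four-characters-per-token estimate, and the fallback format are assumptions; the `summarize` callable stands in for the Model Gateway call.

```python
def estimate_tokens(text):
    # Crude assumption: roughly four characters per token.
    return len(text) // 4


def maybe_compact(history, max_tokens, summarize, threshold=0.6, keep_recent=3):
    """Replace older history entries with a summary once over budget."""
    total = sum(estimate_tokens(entry) for entry in history)
    if total <= threshold * max_tokens or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    try:
        summary = summarize(older)
    except Exception:
        # Fallback when the gateway is unavailable: truncate each entry.
        summary = " | ".join(entry[:40] for entry in older)
    return [summary] + recent
```

Keeping the most recent entries verbatim preserves the detail the next iteration is most likely to need.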
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the core researcher agent: gRPC client wrappers for Model Gateway,
Tool Broker, and Memory Service; prompt builder with context window
management; JSON output parser for tool calls and done signals; and the
main agent loop with discover → infer → execute → observe cycle.
Includes termination on max iterations, timeout, context overflow, and
consecutive tool failures. 78 tests total (20 parser + 11 prompt +
12 client + 24 researcher + 11 existing), 98-100% coverage on new files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create the orchestrator service with gRPC server boilerplate,
YAML configuration loading, and stub ProcessRequest endpoint.
Includes 11 tests (8 config + 3 service) with full coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 13 gRPC integration tests that spin up a real ToolBrokerService
server and test the full pipeline via client:
- ExecuteTool: valid call, manifest block, path block, loop detection
- ValidateCall: allowed, denied, no side effects
- DiscoverTools: per agent type, unknown agent, override ALL
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire the ValidateCall dry-run endpoint that runs the 5-layer
enforcement pipeline without executing the tool. Reuses the
existing enforce() method by constructing an ExecuteToolRequest
from the ValidateCallRequest. Returns is_allowed, denial_reason,
and enforcement_layer. 6 new tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire the full tool execution pipeline in the ToolBrokerService:
5-layer enforcement → loop detection → credential injection →
dispatch → injection firewall → result tagging. Also wire
DiscoverTools to the discovery module and update main.rs to
construct all dependencies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add result_tagger module that wraps tool outputs with provenance
metadata (tool name, execution time, agent/session IDs, trust level).
Trust classification: Internal (memory, inference), External (web, fs,
shell), Unknown. Tagging does not modify actual tool result content.
13 unit tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add heuristic scanner for common prompt injection patterns in tool
results. Supports three sensitivity levels (Low/Medium/High) with
configurable sanitization. Detects role manipulation, delimiter
injection, jailbreak attempts, and system prompt extraction. 19 tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add CredentialInjector that fetches secrets from the Secrets Service
at tool execution time and injects them into parameters. Credentials
are never logged or returned to agents. Uses __credential parameter
key for injection. 9 tests with mock gRPC server.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add discovery module with builtin tool definitions for all well-known
tools (web_search, memory_read/write, fs_read/write, run_code/shell,
package_install, inference, generate_embedding). Filters by agent
manifest and session overrides, returns ToolDefinition with parameter
schemas. 11 unit tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ToolDispatcher with dispatch table mapping tool names to executors.
Three executor types: InternalExecutor (async functions), SubprocessExecutor
(command with stdout/stderr capture), GrpcExecutor (placeholder for gRPC
forwarding). Includes timeout enforcement via tokio::time::timeout and
execution metadata (duration, exit code, success flag).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add enforcement layer 5 that verifies network destinations in tool
parameters against agent type allowed egress patterns. Supports exact
domain matching and wildcard subdomain patterns (*.example.com).
Prevents data exfiltration by restricting agent network access.
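
The matching rule can be illustrated with the Python sketch below; the service itself is written in Rust, and this is not a port of its code. The function name is hypothetical, and the choice that a wildcard pattern does not match the bare apex domain is an assumption.

```python
def host_allowed(host, patterns):
    """Check a hostname against exact and '*.domain' egress patterns."""
    host = host.lower().rstrip(".")
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            suffix = pattern[1:]  # ".example.com"
            # Require at least one label in front of the suffix.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False
```

Matching on the suffix with its leading dot is what stops a lookalike such as evilexample.com from passing as a subdomain.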
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add enforcement layer 4 that verifies file-system paths in tool
parameters against agent type path allowlist glob patterns. Includes
logical path canonicalization to prevent directory traversal attacks.
Uses glob-match crate for pattern matching.
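
Logical canonicalization followed by an allowlist check can be illustrated in Python as below. This is a sketch only: the Rust layer uses the glob-match crate, whereas this example uses fnmatch, whose `*` also matches `/`, and the function name is hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase


def path_allowed(path, allow_globs):
    """Normalize a path without touching the filesystem, then match it."""
    # normpath collapses "." and ".." segments purely lexically.
    canonical = posixpath.normpath(path)
    if not canonical.startswith("/"):
        return False  # assumption: only absolute paths are accepted
    return any(fnmatchcase(canonical, glob) for glob in allow_globs)
```

Normalizing before matching is the important ordering: a raw path such as /workspace/../etc/passwd would otherwise match a /workspace/* pattern.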
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add enforcement layer 3 that verifies agent lineage chains to prevent
privilege escalation through agent spawning. Checks that each parent
in the chain has permission to spawn its child and that spawn depth
limits are respected.
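
The lineage check can be sketched in Python as follows. The function name and data shapes are hypothetical, and a single global depth limit is assumed for brevity; the Rust layer may track limits differently.

```python
def lineage_allowed(chain, can_spawn, max_depth):
    """Validate an agent lineage chain, root first.

    chain:     list of agent type names, e.g. ["orchestrator", "researcher"]
    can_spawn: dict of parent type -> set of child types it may spawn
    max_depth: maximum number of spawn hops allowed in the chain
    """
    if len(chain) - 1 > max_depth:
        return False, "spawn depth exceeded"
    # Every adjacent parent/child pair must be an allowed spawn.
    for parent, child in zip(chain, chain[1:]):
        if child not in can_spawn.get(parent, set()):
            return False, f"{parent} may not spawn {child}"
    return True, ""
```

Checking every hop, not just the last one, is what prevents a permitted agent from being used as a stepping stone to a forbidden one.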
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add agent_manifest enforcement layer that verifies the requested tool
is in the calling agent type's allowed tool list from the manifest.
Denies with a clear reason if no manifest is found or the tool is not permitted.
7 tests covering allowed/denied tools, cross-type checks, unknown
agents, empty tools list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add session override enforcement layer that checks OverrideLevel from
SessionContext: ALL bypasses all enforcement; RELAX grants tools but
preserves lineage checks; NONE and UNSPECIFIED apply full manifest
enforcement. Returns a typed SessionOverrideResult enum for downstream
layers. 8 tests covering all override levels and edge cases.
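
The mapping from override level to enforcement behavior can be sketched in Python as below. The enum member values and the result names are assumptions; only the level names and their described effects come from the commit message.

```python
from enum import Enum


class OverrideLevel(Enum):
    # Numeric values are illustrative, not taken from the proto.
    UNSPECIFIED = 0
    NONE = 1
    RELAX = 2
    ALL = 3


class OverrideResult(Enum):
    # Hypothetical result names for downstream layers.
    BYPASS_ALL = "bypass_all"
    GRANT_TOOLS_KEEP_LINEAGE = "grant_tools_keep_lineage"
    FULL_ENFORCEMENT = "full_enforcement"


def resolve_override(level):
    """Translate a session override level into an enforcement decision."""
    if level is OverrideLevel.ALL:
        return OverrideResult.BYPASS_ALL
    if level is OverrideLevel.RELAX:
        return OverrideResult.GRANT_TOOLS_KEEP_LINEAGE
    # NONE and UNSPECIFIED both fall through to full enforcement.
    return OverrideResult.FULL_ENFORCEMENT
```

Treating UNSPECIFIED the same as NONE means a missing field fails closed rather than open.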
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ManifestStore that loads TOML agent type manifests from a directory.
Each manifest defines allowed tools, path allowlists, network egress
policies, lineage constraints (can_spawn), and max spawn depth.
Includes validation, reload support, and lookup by ID or name.
14 manifest tests + 8 existing = 22 total, clippy clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create the Tool Broker service skeleton as a Cargo workspace member:
- Tonic gRPC server with DiscoverTools, ExecuteTool, ValidateCall stubs
- TOML config loading (host, port, manifest_dir, audit/secrets addrs)
- Server-streaming support for ExecuteTool via ReceiverStream
- 8 tests (5 config, 3 service stub) passing, clippy clean
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 8 integration tests that wire real service components to mocked
external services (SearXNG via aioresponses, Model Gateway/Audit via mock
gRPC servers). Tests cover: full pipeline with all fields populated, clean
text extraction, summarization, unreachable URL handling, audit logging,
SearXNG unavailability, result ordering, and Model Gateway fallback.
Total: 71 tests passing across the Search Service.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire the Search RPC handler to orchestrate the full search pipeline:
SearXNG query → content extraction → Model Gateway summarization.
Supports configurable pipeline stages (extraction/summarization can
be disabled), audit logging via Audit Service, and graceful degradation
at each stage. 14 tests covering full pipeline, partial pipelines,
validation, error handling, and audit logging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>