180 Commits

Author SHA1 Message Date
Pi Agent
e5614825de feat: implement confidence signal handling (issue #78)
Add ConfidenceEvaluator to parse and score subtask results based on
result quality and memory candidate confidence, with configurable
aggregation strategies (weighted_mean, minimum, median).

Add ConfidenceReplanner to generate follow-up subtasks when confidence
falls below the replan threshold, with attempt tracking and max retries.

Add build_confidence_summary for human-readable confidence reporting
in final responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 21:39:24 +01:00
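The three aggregation strategies named in e5614825de (weighted_mean, minimum, median) can be illustrated with a minimal Python sketch. The function names, the score range of 0 to 1, and the replan rule below are assumptions made for illustration; they are not taken from ConfidenceEvaluator or ConfidenceReplanner.

```python
from statistics import median


def aggregate_confidence(scores, weights=None, strategy="weighted_mean"):
    """Combine per-subtask confidence scores (assumed to lie in [0, 1])."""
    if not scores:
        raise ValueError("at least one score is required")
    if strategy == "minimum":
        return min(scores)
    if strategy == "median":
        return median(scores)
    if strategy == "weighted_mean":
        # Default to equal weights when none are supplied.
        weights = weights if weights is not None else [1.0] * len(scores)
        if len(weights) != len(scores):
            raise ValueError("scores and weights must have the same length")
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(s * w for s, w in zip(scores, weights)) / total
    raise ValueError(f"unknown strategy: {strategy}")


def needs_replan(confidence, threshold, attempts, max_retries):
    """Hypothetical rule: replan only below the threshold and while retries remain."""
    return confidence < threshold and attempts < max_retries
```

The minimum strategy is the most conservative of the three: a single low-confidence subtask drags the aggregate down, which makes a replan more likely.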
Pi Agent
60b5266666 feat: implement memory write gating (issue #77)
Add MemoryWriteGate that evaluates subagent memory candidates against
configurable quality gates (confidence threshold, content length bounds,
structure heuristic, result quality) and writes accepted candidates to
the Memory Service with provenance tagging and audit logging.

- Create memory_gate.py with MemoryWriteGate, GatingDecision, GatingReport
- Add MemoryGatingConfig to config.py with YAML loading
- Add write_memory() to MemoryClient in clients.py
- 29 tests covering all gating rules, memory writes, tagging, audit
  logging, and error handling (95% coverage)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 21:14:10 +01:00
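As a rough illustration of the quality gates listed in 60b5266666, the sketch below checks a candidate against a confidence threshold and content length bounds. GateConfig, its default values, and evaluate_candidate are hypothetical names; the structure heuristic, result quality check, provenance tagging, and audit logging are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical stand-in for MemoryGatingConfig; defaults are assumptions."""
    min_confidence: float = 0.7
    min_length: int = 20
    max_length: int = 2000


def evaluate_candidate(content, confidence, config):
    """Return (accepted, reason) for one memory candidate."""
    if confidence < config.min_confidence:
        return False, "confidence below threshold"
    length = len(content.strip())
    if length < config.min_length:
        return False, "content too short"
    if length > config.max_length:
        return False, "content too long"
    return True, "accepted"
```

Returning the reason alongside the decision keeps each rejection explainable, which is the kind of detail an audit log would need.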
Pi Agent
45c572da5e feat: implement rolling context compaction (issue #76)
Add OrchestratorCompactor that monitors context size and automatically
compacts completed subtask results when the threshold is exceeded.
Uses Model Gateway inference for LLM-based summarization with truncation
fallback when gateway is unavailable.

- Create compaction.py with OrchestratorCompactor class
- Extend OrchestratorContext with compacted_summaries field and
  get_pending_subtask_ids() method
- Add CompactionConfig to config.py with YAML loading
- Integrate compactor into SubagentDispatcher (async _safe_add_result)
- 25 tests covering threshold detection, compaction logic, gateway
  interaction, context integrity, and dispatcher integration (97% coverage)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 21:04:11 +01:00
Pi Agent
6677e8add6 feat: implement parallel dispatch via asyncio (issue #74)
Add SubagentDispatcher with dependency-aware scheduling using
asyncio.wait(FIRST_COMPLETED), semaphore concurrency control,
per-subtask and overall timeouts, transitive dependent cancellation,
and graceful error handling for partial failures.

- Create dispatcher.py with SubagentDispatcher class
- Add DispatcherConfig to config.py with YAML loading
- 23 tests covering dependency graphs, timeouts, error handling,
  concurrency control, and result ordering (95%+ coverage)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 20:52:23 +01:00
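The scheduling approach described in 6677e8add6 (asyncio.wait with FIRST_COMPLETED plus a semaphore) might look roughly like the sketch below. The dispatch function and its parameters are hypothetical, and timeouts, cancellation of dependents, and partial-failure handling are omitted, so an exception in any subtask simply propagates.

```python
import asyncio


async def dispatch(task_ids, deps, run, max_concurrency=2):
    """Run subtasks as soon as their dependencies have finished.

    task_ids: list of subtask ids
    deps:     dict mapping a subtask id to the set of ids it depends on
    run:      async callable taking a subtask id and returning its result
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    results = {}
    remaining = set(task_ids)
    running = {}  # asyncio.Task -> subtask id

    async def guarded(task_id):
        # The semaphore caps how many subtasks execute at once.
        async with semaphore:
            return await run(task_id)

    while remaining or running:
        ready = [t for t in remaining if deps.get(t, set()).issubset(results)]
        for task_id in ready:
            remaining.discard(task_id)
            running[asyncio.create_task(guarded(task_id))] = task_id
        if not running:
            raise ValueError(f"unsatisfiable dependencies: {sorted(remaining)}")
        # Wake up as soon as any subtask finishes so dependents can start early.
        done, _ = await asyncio.wait(list(running), return_when=asyncio.FIRST_COMPLETED)
        for finished in done:
            results[running.pop(finished)] = finished.result()
    return results
```

Waiting on FIRST_COMPLETED rather than gathering each batch means a slow subtask does not hold back unrelated subtasks whose dependencies are already satisfied.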
Pi Agent
f84ed9ffca feat: implement orchestrator context management (issue #75)
Add OrchestratorContext class that tracks the full orchestration state
for a ProcessRequest call: user request, decomposition plan, subtask
results, session context propagation, and agent lineage construction.

Key features:
- Factory method from ProcessRequestRequest proto
- Agent lineage chain construction (orchestrator → subagent)
- SubagentRequest builder with session config propagation
- JSON serialization/deserialization using orjson + protobuf json_format
- Context size monitoring with warning (512KB) and hard limit (1MB)

192 tests pass (44 new context tests), ruff clean, 100% coverage on
context.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 20:43:21 +01:00
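The size monitoring mentioned in f84ed9ffca (warning at 512KB, hard limit at 1MB) could be approximated as follows. This sketch uses the standard json module instead of orjson, assumes 1KB means 1024 bytes, and uses a hypothetical check_context_size helper and logger name.

```python
import json
import logging

WARN_BYTES = 512 * 1024          # assumed meaning of "512KB"
HARD_LIMIT_BYTES = 1024 * 1024   # assumed meaning of "1MB"

logger = logging.getLogger("orchestrator.context")  # hypothetical logger name


def check_context_size(state):
    """Return the serialized size in bytes, warning or failing at the limits."""
    size = len(json.dumps(state).encode("utf-8"))
    if size > HARD_LIMIT_BYTES:
        raise ValueError(f"context is {size} bytes, over the hard limit")
    if size > WARN_BYTES:
        logger.warning("context is %d bytes, over the warning threshold", size)
    return size
```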
Pi Agent
58746687ff feat: implement task decomposition for orchestrator (issue #73)
Add TaskDecomposer that uses Model Gateway Inference to decompose user
requests into subtasks with dependency graphs and agent type assignments.

Key components:
- decomposer.py: TaskDecomposer class, decomposition prompt template,
  JSON parsing, validation (cycle detection via Kahn's algorithm),
  proto conversion, and single-task fallback on failure
- config.py: Add DecomposerConfig with max_tokens and max_subtasks
- 42 tests covering parsing, validation, agent type mapping, proto
  conversion, fallback behavior, and end-to-end decompose calls

All 148 tests pass, ruff clean, 98% coverage on decomposer.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 20:37:01 +01:00
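Commit 58746687ff names Kahn's algorithm for cycle detection in the subtask dependency graph. A minimal sketch of that technique follows; the has_cycle function and the shape of the deps mapping are assumptions, not the decomposer.py implementation.

```python
from collections import deque


def has_cycle(deps):
    """deps maps a subtask id to the set of subtask ids it depends on."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    # In-degree here counts unmet dependencies of each node.
    indegree = {n: len(deps.get(n, ())) for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, ds in deps.items():
        for d in ds:
            dependents[d].append(node)

    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for dependent in dependents[node]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                queue.append(dependent)
    # Nodes that never reach in-degree zero sit on a cycle (or depend on one).
    return visited != len(nodes)
```

If the order in which nodes leave the queue were recorded, the same pass would also yield a valid execution order for an acyclic plan.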
Pi Agent
6354f877f5 test: add end-to-end validation for researcher agent (issue #71)
Add 13 e2e tests using real gRPC mock servers to validate the full
researcher agent loop through actual gRPC channels. Tests cover:
- Web search task completion with tool execution verification
- Memory query enrichment with prompt inspection
- Tool failure handling (application-level and gRPC errors)
- Context compaction triggering on long research tasks
- Confidence signal mapping (VERIFIED/INFERRED/UNCERTAIN)
- SubagentResult schema validation including memory candidates
- Graceful degradation (no tools, gateway down, memory down)
- Factory function create_researcher_agent() validation

Also adds KNOWN_LIMITATIONS.md documenting 10 known limitations
and failure modes of the researcher agent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 20:30:04 +01:00
Pi Agent
6f89b3f83d docs: mark issue #70 as COMPLETED in plan index
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 20:22:50 +01:00
Pi Agent
98f4e01d18 feat: implement context compaction for subagent prompt (issue #70)
Add context compaction to the researcher agent to handle long-running
research tasks that exceed the context window budget. When estimated
tokens exceed 60% of max_tokens, older history entries are summarized
via the Model Gateway's unary Inference RPC and replaced with a
compact bullet-point summary, preserving the 3 most recent entries.

Changes:
- clients.py: Add inference() unary method to ModelGatewayClient
- prompt.py: Add compact() method, compaction prompt template, and
  _truncate_entries() fallback for gateway failures
- researcher.py: Replace hard context overflow termination with
  compaction-then-continue logic
- 93 tests pass with 95%+ coverage on modified files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 20:22:02 +01:00
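The compaction rule in 98f4e01d18 (summarize older entries once estimated tokens exceed 60% of max_tokens, keep the 3 most recent, fall back to truncation on gateway failure) can be sketched as below. compact_history, the summarize callable, and the 80-character truncation are hypothetical; the real code calls the Model Gateway's Inference RPC.

```python
def compact_history(entries, estimated_tokens, max_tokens, summarize,
                    keep_recent=3, threshold=0.6):
    """Replace older history entries with one summary when over budget.

    summarize is a hypothetical callable standing in for the gateway call;
    it takes a list of entries and returns a summary string.
    """
    split = len(entries) - keep_recent
    if estimated_tokens <= threshold * max_tokens or split <= 0:
        return list(entries)
    older, recent = entries[:split], entries[split:]
    try:
        summary = summarize(older)
    except Exception:
        # Fallback when summarization fails: keep a truncated view of each entry.
        summary = "; ".join(entry[:80] for entry in older)
    return [summary] + list(recent)
```

For example, with five entries and an estimate of 700 tokens against a 1000-token budget, the two oldest entries collapse into one summary and the last three are kept verbatim.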
Pi Agent
41da7f866b feat: implement researcher agent loop with tool use cycle (issue #69)
Add the core researcher agent: gRPC client wrappers for Model Gateway,
Tool Broker, and Memory Service; prompt builder with context window
management; JSON output parser for tool calls and done signals; and the
main agent loop with discover → infer → execute → observe cycle.

Includes termination on max iterations, timeout, context overflow, and
consecutive tool failures. 78 tests total (20 parser + 11 prompt +
12 client + 24 researcher + 11 existing), 98-100% coverage on new files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 20:08:58 +01:00
54034b1a38 Merge pull request 'feat: scaffold Orchestrator Python project (#72)' (#167) from feature/issue-72-scaffold-orchestrator into main 2026-03-10 17:06:47 +01:00
Pi Agent
32f43e0f22 feat: scaffold orchestrator Python project (issue #72)
Create the orchestrator service with gRPC server boilerplate,
YAML configuration loading, and stub ProcessRequest endpoint.
Includes 11 tests (8 config + 3 service) with full coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 17:06:20 +01:00
ac41f41480 Merge pull request 'test: add end-to-end integration tests for Tool Broker (#67)' (#165) from feature/issue-67-integration-tests into main 2026-03-10 16:54:49 +01:00
Pi Agent
0a986f3e5c test: add end-to-end integration tests for Tool Broker (issue #67)
Add 13 gRPC integration tests that spin up a real ToolBrokerService
server and test the full pipeline via client:
- ExecuteTool: valid call, manifest block, path block, loop detection
- ValidateCall: allowed, denied, no side effects
- DiscoverTools: per agent type, unknown agent, override ALL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:54:34 +01:00
0e5d3f9c40 Merge pull request 'test: add edge case unit tests for enforcement layers (#66)' (#164) from feature/issue-66-enforcement-unit-tests into main 2026-03-10 16:51:43 +01:00
Pi Agent
640bbcc9bb test: add edge case unit tests for all enforcement layers (issue #66)
Add 21 new edge case tests across all 5 enforcement layers:
- Session override: invalid level, empty tool, case sensitivity
- Agent manifest: case sensitivity, negative ID, empty tools
- Lineage: self-spawn, zero depth, unknown child type
- Path allowlist: relative path, traversal, trailing slash
- Network egress: IP address, malformed URL, localhost

Total enforcement tests: 74 (was 53). Overall: 186 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:51:27 +01:00
8965e849ae Merge pull request 'feat: implement ValidateCall gRPC endpoint (#65)' (#163) from feature/issue-65-validate-call-endpoint into main 2026-03-10 16:47:56 +01:00
Pi Agent
e9cb88eb28 feat: implement ValidateCall gRPC endpoint (issue #65)
Wire the ValidateCall dry-run endpoint that runs the 5-layer
enforcement pipeline without executing the tool. Reuses the
existing enforce() method by constructing an ExecuteToolRequest
from the ValidateCallRequest. Returns is_allowed, denial_reason,
and enforcement_layer. 6 new tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:47:36 +01:00
966c6eeaa8 Merge pull request 'feat: implement ExecuteTool gRPC endpoint (#64)' (#162) from feature/issue-64-execute-tool-endpoint into main 2026-03-10 16:43:54 +01:00
Pi Agent
1bba1ad35d feat: implement ExecuteTool gRPC endpoint (issue #64)
Wire the full tool execution pipeline in the ToolBrokerService:
5-layer enforcement → loop detection → credential injection →
dispatch → injection firewall → result tagging. Also wire
DiscoverTools to the discovery module and update main.rs to
construct all dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:43:32 +01:00
2d7ef7a3d8 Merge pull request 'feat: implement tool result tagging (#63)' (#161) from feature/issue-63-result-tagging into main 2026-03-10 16:35:46 +01:00
Pi Agent
9d0d35f1bc feat: implement tool result tagging (issue #63)
Add result_tagger module that wraps tool outputs with provenance
metadata (tool name, execution time, agent/session IDs, trust level).
Trust classification: Internal (memory, inference), External (web, fs,
shell), Unknown. Tagging does not modify actual tool result content.
13 unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:35:29 +01:00
cc8b024dfb Merge pull request 'feat: implement prompt injection firewall (#62)' (#160) from feature/issue-62-injection-firewall into main 2026-03-10 16:33:23 +01:00
Pi Agent
a243030dd0 feat: implement prompt injection firewall (issue #62)
Add heuristic scanner for common prompt injection patterns in tool
results. Supports three sensitivity levels (Low/Medium/High) with
configurable sanitization. Detects role manipulation, delimiter
injection, jailbreak attempts, and system prompt extraction. 19 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:33:08 +01:00
492db4051a Merge pull request 'feat: implement credential injection (#61)' (#159) from feature/issue-61-credential-injection into main 2026-03-10 16:30:19 +01:00
Pi Agent
8ea30b813c feat: implement credential injection (issue #61)
Add CredentialInjector that fetches secrets from the Secrets Service
at tool execution time and injects them into parameters. Credentials
are never logged or returned to agents. Uses __credential parameter
key for injection. 9 tests with mock gRPC server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:29:57 +01:00
08eca54c72 Merge pull request 'feat: implement loop and thrash detection (#60)' (#158) from feature/issue-60-loop-detection into main 2026-03-10 16:26:38 +01:00
Pi Agent
baac330fd2 feat: implement loop and thrash detection (issue #60)
Add LoopDetector with per-session/agent sliding window tracking.
Detects exact-match loops (same tool + args → block), near-match
loops (same tool, varying args → warning), and thrash patterns
(A→B→A→B alternation → warning). Configurable thresholds for
window size, max repeats, and thrash cycles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:26:23 +01:00
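The sliding-window idea in baac330fd2 can be illustrated with a short Python sketch (the Tool Broker itself is written in Rust). The class below is a simplified stand-in that shares only its name with the commit's LoopDetector: the thresholds and the exact counting rule are assumptions, near-match detection is omitted, and one instance would be kept per session/agent pair.

```python
from collections import deque


class LoopDetector:
    """Simplified illustration; not the Tool Broker's Rust implementation."""

    def __init__(self, window_size=10, max_repeats=3, thrash_cycles=2):
        self.max_repeats = max_repeats
        self.thrash_cycles = thrash_cycles
        self.window = deque(maxlen=window_size)  # oldest calls fall off the window

    def record(self, tool, args):
        """Record one call and return "block", "warn", or "ok"."""
        call = (tool, repr(sorted(args.items())))
        self.window.append(call)

        # Exact-match loop: same tool and same arguments repeated in the window.
        if sum(1 for c in self.window if c == call) >= self.max_repeats:
            return "block"

        # Thrash: the most recent calls alternate between two tools (A, B, A, B).
        tools = [name for name, _ in self.window]
        span = 2 * self.thrash_cycles
        tail = tools[-span:]
        if (len(tail) == span and tail[0] != tail[1]
                and all(tail[i] == tail[i % 2] for i in range(span))):
            return "warn"
        return "ok"
```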
72041378fc Merge pull request 'feat: implement tool discovery logic (#59)' (#157) from feature/issue-59-tool-discovery into main 2026-03-10 16:23:00 +01:00
Pi Agent
fe65ba6411 feat: implement tool discovery logic (issue #59)
Add discovery module with builtin tool definitions for all well-known
tools (web_search, memory_read/write, fs_read/write, run_code/shell,
package_install, inference, generate_embedding). Filters by agent
manifest and session overrides, returns ToolDefinition with parameter
schemas. 11 unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:22:34 +01:00
41514d9726 Merge pull request 'feat: implement tool execution dispatch (#58)' (#156) from feature/issue-58-tool-dispatch into main 2026-03-10 16:19:11 +01:00
Pi Agent
c12698faf5 feat: implement tool execution dispatch (issue #58)
Add ToolDispatcher with dispatch table mapping tool names to executors.
Three executor types: InternalExecutor (async functions), SubprocessExecutor
(command with stdout/stderr capture), GrpcExecutor (placeholder for gRPC
forwarding). Includes timeout enforcement via tokio::time::timeout and
execution metadata (duration, exit code, success flag).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:18:51 +01:00
637f8b6fb2 Merge pull request 'feat: implement network egress enforcement (#57)' (#155) from feature/issue-57-network-egress into main 2026-03-10 16:15:14 +01:00
Pi Agent
17e3e46889 feat: implement network egress enforcement layer (issue #57)
Add enforcement layer 5 that verifies network destinations in tool
parameters against agent type allowed egress patterns. Supports exact
domain matching and wildcard subdomain patterns (*.example.com).
Prevents data exfiltration by restricting agent network access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:14:56 +01:00
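The matching rules described in 17e3e46889 (exact domains plus wildcard subdomain patterns such as *.example.com) can be sketched in Python as follows. host_allowed is a hypothetical helper, and the choice that *.example.com does not match the bare example.com is an assumption; the commit does not say how the Rust layer treats that case.

```python
from urllib.parse import urlsplit


def host_allowed(url, patterns):
    """Return True when the URL's host matches an allowed egress pattern."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # malformed or host-less input is denied
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # "*.example.com" matches any subdomain; the leading dot is kept
            # so that "evilexample.com" does not slip through.
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False
```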
2f4dbb05a4 Merge pull request 'feat: implement path allowlist enforcement (#56)' (#154) from feature/issue-56-path-allowlist into main 2026-03-10 16:12:35 +01:00
Pi Agent
2953997e28 feat: implement path allowlist enforcement layer (issue #56)
Add enforcement layer 4 that verifies file-system paths in tool
parameters against agent type path allowlist glob patterns. Includes
logical path canonicalization to prevent directory traversal attacks.
Uses glob-match crate for pattern matching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:12:11 +01:00
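Logical canonicalization, as mentioned in 2953997e28, resolves "." and ".." segments lexically before the pattern check, without touching the file system. The Python sketch below stands in for the Rust layer: path_allowed is a hypothetical helper, and fnmatch replaces the glob-match crate, so pattern semantics differ (here * also matches path separators).

```python
import posixpath
from fnmatch import fnmatchcase


def path_allowed(path, allowlist):
    """Check a path against allowlist patterns after lexical normalization."""
    if not path.startswith("/"):
        return False  # this sketch denies relative paths outright
    # normpath collapses "a/../b" and "./" purely as text, so a traversal
    # such as "/workspace/../etc/passwd" becomes "/etc/passwd" before matching.
    canonical = posixpath.normpath(path)
    return any(fnmatchcase(canonical, pattern) for pattern in allowlist)
```

Normalizing before matching is the important ordering: matching the raw string first would let a path that starts with an allowed prefix escape through ".." segments.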
fc892c59bb Merge pull request 'feat: implement lineage constraint enforcement (#55)' (#153) from feature/issue-55-lineage-constraint into main 2026-03-10 16:08:50 +01:00
Pi Agent
253926c898 feat: implement lineage constraint enforcement layer (issue #55)
Add enforcement layer 3 that verifies agent lineage chains to prevent
privilege escalation through agent spawning. Checks that each parent
in the chain has permission to spawn its child and that spawn depth
limits are respected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:08:28 +01:00
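The two checks described in 253926c898 (each parent may spawn its child, and spawn depth stays within the limit) can be sketched in Python as below. lineage_allowed, the can_spawn mapping, and the single global max_depth are simplifying assumptions; the manifests define these values per agent type.

```python
def lineage_allowed(chain, can_spawn, max_depth):
    """Validate an agent lineage chain, ordered from root to calling agent.

    chain:     e.g. ["orchestrator", "researcher"]
    can_spawn: dict mapping an agent type to the set of types it may spawn
    max_depth: maximum number of spawn steps allowed in the chain
    """
    if not chain:
        return False
    if len(chain) - 1 > max_depth:
        return False
    # Every adjacent (parent, child) pair must be an allowed spawn.
    return all(child in can_spawn.get(parent, set())
               for parent, child in zip(chain, chain[1:]))
```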
90f08dcdc7 Merge pull request 'feat: enforcement layer 2 — agent type manifest check (#54)' (#152) from feature/issue-54-agent-manifest-check into main 2026-03-10 16:04:56 +01:00
Pi Agent
bfce35ed22 feat: implement enforcement layer 2 — agent type manifest check (issue #54)
Add agent_manifest enforcement layer that verifies the requested tool
is in the calling agent type's allowed tool list from the manifest.
Denies with clear reason if no manifest found or tool not permitted.
7 tests covering allowed/denied tools, cross-type checks, unknown
agents, empty tools list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:04:43 +01:00
b3f5fe2576 Merge pull request 'feat: enforcement layer 1 — session override check (#53)' (#151) from feature/issue-53-session-override-check into main 2026-03-10 16:02:58 +01:00
Pi Agent
f2fedbf013 feat: implement enforcement layer 1 — session override check (issue #53)
Add session override enforcement layer that checks OverrideLevel from
SessionContext: ALL bypasses all enforcement, RELAX grants tools but
preserves lineage checks, NONE/UNSPECIFIED applies full manifest
enforcement. Returns typed SessionOverrideResult enum for downstream
layers. 8 tests covering all override levels and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:02:33 +01:00
11d7bab132 Merge pull request 'feat: implement Agent Type Manifest loader (#52)' (#150) from feature/issue-52-manifest-loader into main 2026-03-10 15:59:59 +01:00
Pi Agent
c5ceb98a92 feat: implement Agent Type Manifest loader (issue #52)
Add ManifestStore that loads TOML agent type manifests from a directory.
Each manifest defines allowed tools, path allowlists, network egress
policies, lineage constraints (can_spawn), and max spawn depth.
Includes validation, reload support, and lookup by ID or name.

14 manifest tests + 8 existing = 22 total, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:59:34 +01:00
b9064bfe98 Merge pull request 'feat: scaffold Tool Broker Rust project (#51)' (#149) from feature/issue-51-scaffold-tool-broker into main 2026-03-10 15:56:21 +01:00
Pi Agent
09b516ec3e feat: scaffold Tool Broker Rust project (issue #51)
Create the Tool Broker service skeleton as a Cargo workspace member:
- Tonic gRPC server with DiscoverTools, ExecuteTool, ValidateCall stubs
- TOML config loading (host, port, manifest_dir, audit/secrets addrs)
- Server-streaming support for ExecuteTool via ReceiverStream
- 8 tests (5 config, 3 service stub) passing, clippy clean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:55:53 +01:00
986584b759 Merge pull request 'test: integration tests for Search Service (#50)' (#148) from feature/issue-50-search-integration-tests into main 2026-03-10 15:51:33 +01:00
Pi Agent
cd75318f45 test: add integration tests for Search Service (issue #50)
8 integration tests wiring real service components with mocked external
services (SearXNG via aioresponses, Model Gateway/Audit via mock gRPC
servers). Tests cover: full pipeline with all fields populated, clean
text extraction, summarization, unreachable URL handling, audit logging,
SearXNG unavailability, result ordering, and Model Gateway fallback.

Total: 71 tests passing across the Search Service.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:51:13 +01:00
2a16c98597 Merge pull request 'feat: implement Search gRPC endpoint (#49)' (#147) from feature/issue-49-search-endpoint into main 2026-03-10 15:48:30 +01:00
Pi Agent
6ecc8b8f38 feat: implement Search gRPC endpoint with full pipeline (issue #49)
Wire the Search RPC handler to orchestrate the full search pipeline:
SearXNG query → content extraction → Model Gateway summarization.
Supports configurable pipeline stages (extraction/summarization can
be disabled), audit logging via Audit Service, and graceful degradation
at each stage. 14 tests covering full pipeline, partial pipelines,
validation, error handling, and audit logging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:48:11 +01:00