Merge pull request 'docs: define researcher agent specification (#68)' (#166) from feature/issue-68-researcher-agent-spec into main

2026-03-11 06:04:03 +01:00
parent 9ae0e6a8b2 77a5aa6ff0
commit 43cbb2a3b7
2 changed files with 263 additions and 0 deletions
--- a/implementation-plans/issue-068.md
+++ b/implementation-plans/issue-068.md
@@ -0,0 +1,46 @@
+# Implementation Plan — Issue #68: Define researcher agent system prompt and context structure
+
+## Metadata
+
+| Field | Value |
+|---|---|
+| Issue | [#68](https://git.shahondin1624.de/llm-multiverse/llm-multiverse/issues/68) |
+| Title | Define researcher agent system prompt and context structure |
+| Milestone | Phase 8: First Subagent (Researcher) |
+| Labels | — |
+| Status | `COMPLETED` |
+| Language | Specification |
+| Related Plans | issue-015.md |
+| Blocked by | #15 |
+
+## Acceptance Criteria
+
+- [x] System prompt defining researcher role, capabilities, and constraints
+- [x] Context structure: system prompt + task description + tool results + scratchpad
+- [x] Tool use conventions: how to format tool calls, handle results
+- [x] Confidence signaling format (how researcher reports certainty)
+- [x] Return schema matching orchestrator.proto `SubagentResult`
+- [x] Document the researcher agent specification
+
+## Implementation Steps
+
+### 1. Researcher Agent Specification (`specs/researcher-agent.md`)
+- System prompt with role definition, tool use format, confidence signaling, completion format
+- Context window structure (6 sections with compaction rules)
+- Token budget allocation (~4000 tokens across six sections, within the 4096 default)
+- Tool use conventions (JSON tool call format, result injection with provenance headers)
+- Confidence signaling mapping to `ResultQuality` proto enum
+- Return schema mapping agent JSON output to `SubagentResult` proto message
+- Termination conditions (5 exit paths)
+- Agent lineage structure for Tool Broker enforcement
+- Example flow showing complete research task
+
+## Files to Create/Modify
+
+| File | Action | Purpose |
+|---|---|---|
+| `specs/researcher-agent.md` | Create | Complete researcher agent specification |
+
+## Deviation Log
+
+_(No deviations)_
--- a/specs/researcher-agent.md
+++ b/specs/researcher-agent.md
@@ -0,0 +1,217 @@
+# Researcher Agent Specification
+
+## Overview
+
+The researcher agent is the first subagent type in the LLM Multiverse system. It specializes in information gathering using web search and memory retrieval tools. The orchestrator dispatches researcher agents to answer factual questions, gather data, and compile research findings.
+
+- **Agent Type**: `AGENT_TYPE_RESEARCHER` (id: 2)
+- **Allowed Tools**: `web_search`, `memory_read`
+- **Language**: Python (part of the orchestrator service)
+
+## System Prompt
+
+````
+You are a Researcher agent in a multi-agent system. Your role is to gather accurate information to answer questions and complete research tasks.
+
+## Capabilities
+- Web search via the `web_search` tool
+- Memory retrieval via the `memory_read` tool
+
+## Instructions
+1. Analyze the task to identify what information is needed.
+2. Search for relevant information using your available tools.
+3. Cross-reference findings from multiple sources when possible.
+4. Clearly distinguish between verified facts and inferences.
+5. When you have sufficient information, produce your findings.
+
+## Tool Use Format
+To use a tool, respond with a JSON tool call block:
+```json
+{"tool": "<tool_name>", "parameters": {"<key>": "<value>"}}
+```
+
+After receiving a tool result, analyze it and decide whether to:
+- Make another tool call for more information
+- Produce your final findings
+
+## Confidence Signaling
+Rate your confidence in each finding:
+- VERIFIED: Information confirmed by tool output from reliable sources
+- INFERRED: Reasonable conclusion drawn from available evidence
+- UNCERTAIN: Best guess with limited supporting evidence
+
+## Completion
+When done, respond with a JSON result block:
+```json
+{"done": true, "summary": "<3 sentences max>", "findings": ["<finding1>", ...], "confidence": "VERIFIED|INFERRED|UNCERTAIN", "memory_candidates": [{"content": "<fact>", "confidence": 0.0-1.0}]}
+```
+
+## Constraints
+- Do NOT fabricate information. If you cannot find an answer, say so.
+- Do NOT attempt to use tools not listed in your capabilities.
+- Limit tool calls to a maximum of 10 per task.
+- Keep your final summary to 3 sentences or fewer.
+````
+
+## Context Window Structure
+
+The researcher agent's context is built in this order:
+
+```
++---------------------------------------------------+
+| 1. SYSTEM PROMPT                                  |
+|    (Fixed, never compacted)                       |
++---------------------------------------------------+
+| 2. TASK DESCRIPTION                               |
+|    From SubagentRequest.task                      |
+|    (Fixed, never compacted)                       |
++---------------------------------------------------+
+| 3. MEMORY CONTEXT (optional)                      |
+|    From SubagentRequest.relevant_memory_context   |
+|    Pre-fetched by orchestrator                    |
+|    (Compactable after first iteration)            |
++---------------------------------------------------+
+| 4. TOOL RESULTS (accumulated)                     |
+|    Each entry:                                    |
+|      [TOOL_CALL: web_search(query="...")]         |
+|      [TOOL_RESULT: EXTERNAL | tool=web_search]    |
+|      <result content>                             |
+|    (Older entries compacted to summaries)         |
++---------------------------------------------------+
+| 5. AGENT REASONING (accumulated)                  |
+|    The agent's own thoughts and analysis          |
+|    (Older entries compacted to summaries)         |
++---------------------------------------------------+
+| 6. SCRATCHPAD (current iteration)                 |
+|    Current tool call or final result              |
+|    (Never compacted)                              |
++---------------------------------------------------+
+```
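
The six-section ordering in the diagram can be sketched as a simple assembly step. This is a minimal illustration, not the orchestrator's actual code: `SECTION_ORDER`, `REQUIRED`, and `build_context` are hypothetical names, and joining sections with blank lines is an assumption.

```python
# Section order taken from the diagram; the dict keys are illustrative names.
SECTION_ORDER = [
    "system_prompt",
    "task_description",
    "memory_context",
    "tool_results",
    "agent_reasoning",
    "scratchpad",
]
# Assumption: only the two fixed sections are mandatory for a first iteration.
REQUIRED = {"system_prompt", "task_description"}


def build_context(sections: dict[str, str]) -> str:
    """Concatenate non-empty sections in the fixed order, skipping empty ones."""
    present = {name for name, text in sections.items() if text}
    missing = REQUIRED - present
    if missing:
        raise ValueError(f"missing required sections: {sorted(missing)}")
    return "\n\n".join(sections[name] for name in SECTION_ORDER if sections.get(name))


context = build_context({
    "scratchpad": '{"tool": "memory_read", "parameters": {"query": "rust async runtimes"}}',
    "task_description": "What are the main async runtimes in Rust?",
    "system_prompt": "You are a Researcher agent in a multi-agent system.",
})
print(context.split("\n\n")[0])
```

Whatever order the caller supplies the sections in, the output follows the diagram, so the fixed sections always lead the context.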
+
+### Token Budget
+
+| Section | Allocation | Compactable |
+|---------|-----------|-------------|
+| System prompt | ~400 tokens | No |
+| Task description | ~200 tokens | No |
+| Memory context | ~500 tokens | Yes |
+| Tool results | ~2000 tokens | Yes (older entries) |
+| Agent reasoning | ~500 tokens | Yes (older entries) |
+| Scratchpad | ~400 tokens | No |
+| **Total budget** | **~4000 tokens** | |
+
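+The `max_tokens` field in `SubagentRequest` controls the total context budget (default: 4096). The per-section allocations above are approximate and sum to ~4000 tokens, which fits within that default.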
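
As a quick check of the table arithmetic, the sketch below sums the allocations and flags compactable sections that exceed their share. The numbers come from the table; `BUDGET`, `COMPACTABLE`, and `over_budget` are hypothetical names, and treating an overrun as a compaction trigger is an assumption rather than documented behavior.

```python
# Per-section allocations copied from the token budget table (approximate).
BUDGET = {
    "system_prompt": 400,
    "task_description": 200,
    "memory_context": 500,
    "tool_results": 2000,
    "agent_reasoning": 500,
    "scratchpad": 400,
}
# Sections the table marks as compactable.
COMPACTABLE = {"memory_context", "tool_results", "agent_reasoning"}
DEFAULT_MAX_TOKENS = 4096


def over_budget(usage: dict[str, int]) -> list[str]:
    """Return compactable sections whose current usage exceeds their allocation."""
    return [
        name
        for name in BUDGET
        if name in COMPACTABLE and usage.get(name, 0) > BUDGET[name]
    ]


total = sum(BUDGET.values())
print(total, DEFAULT_MAX_TOKENS - total)  # 4000 96
print(over_budget({"tool_results": 2600, "scratchpad": 450}))  # ['tool_results']
```

The scratchpad overrun is not reported because the table marks that section as non-compactable; how such an overrun should be handled is left open by the spec.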
+
+## Tool Use Conventions
+
+### Tool Call Format
+
+The agent outputs a JSON block to request a tool call:
+
+```json
+{"tool": "web_search", "parameters": {"query": "rust async runtime comparison"}}
+```
+
+The orchestrator parses this and calls `ExecuteTool` on the Tool Broker with:
+- `context`: The session's `SessionContext` with proper `AgentLineage`
+- `agent_type`: `AGENT_TYPE_RESEARCHER` (2)
+- `tool_name`: From the JSON `tool` field
+- `parameters`: From the JSON `parameters` field
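
A minimal sketch of the parsing step described above, assuming the agent emits exactly one JSON object per tool call. `ToolCall` and `parse_tool_call` are hypothetical names; the real `ExecuteTool` request is a proto message and also carries the `SessionContext`, which is omitted here.

```python
import json
from dataclasses import dataclass
from typing import Any

# Allow-list taken from the "Allowed Tools" line of the spec.
ALLOWED_TOOLS = {"web_search", "memory_read"}
AGENT_TYPE_RESEARCHER = 2


@dataclass
class ToolCall:
    """Hypothetical stand-in for the fields forwarded to ExecuteTool."""
    agent_type: int
    tool_name: str
    parameters: dict[str, Any]


def parse_tool_call(raw: str) -> ToolCall:
    """Parse one JSON tool call block emitted by the agent."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("tool call must be a JSON object")
    tool = data.get("tool")
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed for researcher: {tool!r}")
    params = data.get("parameters", {})
    if not isinstance(params, dict):
        raise ValueError("parameters must be a JSON object")
    return ToolCall(AGENT_TYPE_RESEARCHER, tool, params)


call = parse_tool_call(
    '{"tool": "web_search", "parameters": {"query": "rust async runtime comparison"}}'
)
print(call.tool_name, call.parameters["query"])
```

Rejecting unknown tools here is only a convenience; per the spec, the Tool Broker remains the component that enforces what a researcher may call.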
+
+### Tool Result Format
+
+Tool results are injected back into the context with provenance headers from the Tool Broker's result tagger:
+
+```
+[TOOL_RESULT: EXTERNAL | tool=web_search | agent=res-abc123 | session=sess-1 | success=true]
+Search results for "rust async runtime comparison":
+1. Tokio is the most widely used async runtime...
+2. async-std provides a simpler API...
+```
+
+### Error Handling
+
+If a tool call fails, the error is injected as:
+
+```
+[TOOL_RESULT: EXTERNAL | tool=web_search | agent=res-abc123 | session=sess-1 | success=false]
+Error: connection timeout
+```
+
+The agent should acknowledge the failure and either retry with different parameters or proceed with available information.
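
The provenance header shown in both examples has a regular shape, so success and failure can be told apart mechanically. The sketch below assumes the header is always the first line and that fields are separated by ` | `; `parse_tool_result` is a hypothetical helper, not part of the Tool Broker.

```python
def parse_tool_result(block: str) -> tuple[dict[str, str], str]:
    """Split an injected tool result into (header fields, body text)."""
    header, _, body = block.partition("\n")
    header = header.strip()
    prefix = "[TOOL_RESULT: "
    if not (header.startswith(prefix) and header.endswith("]")):
        raise ValueError(f"not a tool result header: {header!r}")
    # First token is the trust label (e.g. EXTERNAL); the rest are key=value pairs.
    trust, *pairs = header[len(prefix):-1].split(" | ")
    fields = {"trust": trust}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed header field: {pair!r}")
        fields[key] = value
    return fields, body


fields, body = parse_tool_result(
    "[TOOL_RESULT: EXTERNAL | tool=web_search | agent=res-abc123 | session=sess-1 | success=false]\n"
    "Error: connection timeout"
)
failed = fields["success"] == "false"
print(fields["tool"], failed, body)  # web_search True Error: connection timeout
```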
+
+## Confidence Signaling
+
+The researcher reports confidence as one of three string signals, which map to the `ResultQuality` proto enum:
+
+| Agent Signal | Proto Enum | Meaning |
+|-------------|-----------|---------|
+| `"VERIFIED"` | `RESULT_QUALITY_VERIFIED` | Confirmed by tool output |
+| `"INFERRED"` | `RESULT_QUALITY_INFERRED` | Reasonable conclusion |
+| `"UNCERTAIN"` | `RESULT_QUALITY_UNCERTAIN` | Limited evidence |
+
+## Return Schema
+
+The agent's final output maps to `SubagentResult`:
+
+| Agent JSON Field | Proto Field | Type |
+|-----------------|------------|------|
+| `summary` | `summary` | string (3 sentences max) |
+| `findings` | `artifacts` | repeated string |
+| `confidence` | `result_quality` | ResultQuality enum |
+| `memory_candidates` | `new_memory_candidates` | repeated MemoryCandidate |
+| (implicit) | `status` | ResultStatus (SUCCESS/PARTIAL/FAILED) |
+| (implicit) | `source` | ResultSource (WEB/MODEL_KNOWLEDGE, see Mapping Rules) |
+| (on error) | `failure_reason` | optional string |
+
+### Mapping Rules
+
+- If `done: true` with findings: `status = SUCCESS`
+- If `done: true` with partial findings and `confidence = "UNCERTAIN"`: `status = PARTIAL`
+- If the agent hits max iterations without completing: `status = PARTIAL`
+- If all tool calls fail and no findings: `status = FAILED`
+- `source` is `RESULT_SOURCE_WEB` if any web_search was used, otherwise `RESULT_SOURCE_MODEL_KNOWLEDGE`
+- `memory_candidates` are facts the orchestrator should consider persisting via `WriteMemory`
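
The mapping rules can be sketched as one function over the agent's JSON output. A plain dict stands in for the generated `SubagentResult` class, and `to_subagent_result` is a hypothetical name. Where the rules overlap or leave a case open, the precedence below is an assumption: failure is checked first, `UNCERTAIN` is treated as the marker for partial findings, and an unknown confidence string falls back to `UNCERTAIN`.

```python
from typing import Any

QUALITY = {
    "VERIFIED": "RESULT_QUALITY_VERIFIED",
    "INFERRED": "RESULT_QUALITY_INFERRED",
    "UNCERTAIN": "RESULT_QUALITY_UNCERTAIN",
}


def to_subagent_result(
    agent_json: dict[str, Any],
    tools_used: set[str],
    *,
    hit_max_iterations: bool = False,
    all_tools_failed: bool = False,
) -> dict[str, Any]:
    """Map the agent's final JSON to a SubagentResult-shaped dict."""
    findings = list(agent_json.get("findings") or [])
    confidence = agent_json.get("confidence", "UNCERTAIN")
    done = agent_json.get("done") is True

    if all_tools_failed and not findings:
        status = "FAILED"
    elif hit_max_iterations or not done:
        status = "PARTIAL"
    elif findings and confidence != "UNCERTAIN":
        status = "SUCCESS"
    else:
        # Assumption: done with UNCERTAIN or empty findings counts as partial.
        status = "PARTIAL"

    used_web = "web_search" in tools_used
    return {
        "status": status,
        "summary": agent_json.get("summary", ""),
        "artifacts": findings,
        "result_quality": QUALITY.get(confidence, QUALITY["UNCERTAIN"]),
        "source": "RESULT_SOURCE_WEB" if used_web else "RESULT_SOURCE_MODEL_KNOWLEDGE",
        "new_memory_candidates": list(agent_json.get("memory_candidates") or []),
    }


result = to_subagent_result(
    {"done": True, "summary": "Tokio dominates.", "findings": ["Tokio: most popular"],
     "confidence": "VERIFIED", "memory_candidates": []},
    tools_used={"memory_read", "web_search"},
)
print(result["status"], result["source"])  # SUCCESS RESULT_SOURCE_WEB
```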
+
+## Termination Conditions
+
+The researcher agent loop terminates when:
+
+1. **Explicit done**: Agent outputs `{"done": true, ...}` — normal completion
+2. **Max iterations**: Exceeds 10 tool calls — forced partial result
+3. **Timeout**: Exceeds the configured timeout (default: 120s) — forced partial result
+4. **Context overflow**: Cannot fit another iteration — triggers compaction or terminates
+5. **All tools failed**: No usable tool results after 3 consecutive failures — report failure
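
The five exit paths can be expressed as a single check run once per loop iteration. This is a sketch under assumptions: `LoopState` and `termination_reason` are hypothetical names, the evaluation order is a guess, and context overflow only terminates the loop when compaction could not free enough room.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TOOL_CALLS = 10           # "Exceeds 10 tool calls"
DEFAULT_TIMEOUT_S = 120.0     # default timeout from the list above
MAX_CONSECUTIVE_FAILURES = 3  # "3 consecutive failures"


@dataclass
class LoopState:
    """Hypothetical per-task bookkeeping kept by the orchestrator."""
    done: bool = False
    tool_calls: int = 0
    elapsed_s: float = 0.0
    consecutive_failures: int = 0
    fits_after_compaction: bool = True


def termination_reason(
    state: LoopState, timeout_s: float = DEFAULT_TIMEOUT_S
) -> Optional[str]:
    """Return the name of the exit path that applies, or None to keep looping."""
    if state.done:
        return "explicit_done"
    if state.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        return "all_tools_failed"
    if state.tool_calls > MAX_TOOL_CALLS:
        return "max_iterations"
    if state.elapsed_s > timeout_s:
        return "timeout"
    if not state.fits_after_compaction:
        return "context_overflow"
    return None


print(termination_reason(LoopState(tool_calls=4, elapsed_s=30.0)))   # None
print(termination_reason(LoopState(tool_calls=11)))                  # max_iterations
print(termination_reason(LoopState(consecutive_failures=3)))         # all_tools_failed
```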
+
+## Agent Lineage
+
+When the orchestrator dispatches a researcher, the `AgentLineage` is extended:
+
+```
+agents: [
+  {agent_id: "orch-main", agent_type: 1, spawn_depth: 0},
+  {agent_id: "res-<uuid>", agent_type: 2, spawn_depth: 1}
+]
+```
+
+This lineage is passed in every `ExecuteTool` call so the Tool Broker can verify:
+- The orchestrator is allowed to spawn researchers (`can_spawn: ["researcher"]`)
+- The spawn depth (1) is within limits (`max_spawn_depth: 3`)
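
The two checks above can be illustrated with a small lineage helper. The sketch assumes agent type 1 is the orchestrator (as in the `orch-main` entry) and encodes `can_spawn` as a numeric table; `LineageEntry`, `CAN_SPAWN`, and `extend_lineage` are hypothetical names, and in the real system the Tool Broker performs this verification.

```python
import uuid
from dataclasses import dataclass

AGENT_TYPE_ORCHESTRATOR = 1  # assumption, inferred from the lineage example
AGENT_TYPE_RESEARCHER = 2
MAX_SPAWN_DEPTH = 3
# Assumption: numeric form of can_spawn: ["researcher"] for the orchestrator.
CAN_SPAWN = {AGENT_TYPE_ORCHESTRATOR: {AGENT_TYPE_RESEARCHER}}


@dataclass(frozen=True)
class LineageEntry:
    agent_id: str
    agent_type: int
    spawn_depth: int


def extend_lineage(
    lineage: list[LineageEntry], agent_type: int, id_prefix: str
) -> list[LineageEntry]:
    """Append a child entry after checking spawn permission and depth."""
    parent = lineage[-1]
    if agent_type not in CAN_SPAWN.get(parent.agent_type, set()):
        raise PermissionError(f"agent type {parent.agent_type} cannot spawn {agent_type}")
    depth = parent.spawn_depth + 1
    if depth > MAX_SPAWN_DEPTH:
        raise PermissionError(f"spawn depth {depth} exceeds {MAX_SPAWN_DEPTH}")
    child = LineageEntry(f"{id_prefix}-{uuid.uuid4().hex[:8]}", agent_type, depth)
    return [*lineage, child]


root = [LineageEntry("orch-main", AGENT_TYPE_ORCHESTRATOR, 0)]
lineage = extend_lineage(root, AGENT_TYPE_RESEARCHER, "res")
print(len(lineage), lineage[-1].spawn_depth)  # 2 1
```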
+
+## Example Flow
+
+```
+Orchestrator → SubagentRequest(task="What are the main async runtimes in Rust?")
+
+Researcher:
+  1. Parse task
+  2. Call: {"tool": "memory_read", "parameters": {"query": "rust async runtimes"}}
+     → Result: No relevant memories found
+  3. Call: {"tool": "web_search", "parameters": {"query": "rust async runtime comparison 2025"}}
+     → Result: [article summaries about Tokio, async-std, smol]
+  4. Call: {"tool": "web_search", "parameters": {"query": "tokio vs async-std benchmark"}}
+     → Result: [benchmark data]
+  5. Return: {"done": true, "summary": "The main Rust async runtimes are Tokio, async-std, and smol. Tokio dominates in production usage. async-std offers simpler APIs while smol focuses on minimalism.", "findings": ["Tokio: most popular, used by Axum/Hyper", "async-std: simpler API, compatible interface", "smol: minimal footprint, composable"], "confidence": "VERIFIED", "memory_candidates": [{"content": "Main Rust async runtimes: Tokio (dominant), async-std (simple), smol (minimal)", "confidence": 0.95}]}
+
+→ SubagentResult(status=SUCCESS, summary="...", artifacts=[...], result_quality=VERIFIED, source=WEB)
+```