# Agent Orchestration Implementation Plan

**Version:** 1.8
**Date:** January 15, 2026
**Status:** Phase 8 Complete - Production Hardening ✅ ALL PHASES COMPLETE

### Implementation Progress

| Phase | Status | Completion Date |
|-------|--------|-----------------|
| Phase 1: Foundation | ✅ Complete | Jan 2026 |
| Phase 2: CLI Integration | ✅ Complete | Jan 2026 |
| Phase 3: Memory Layer | ✅ Complete | Jan 2026 |
| Phase 4: Control Loop + Risk Gate | ✅ Complete | Jan 2026 |
| Phase 5: Human Interrupt + Approvals | ✅ Complete | Jan 2026 |
| Phase 6: Budget Controls | ✅ Complete | Jan 2026 |
| Phase 7: Merge Gate + Observability | ✅ Complete | Jan 2026 |
| Phase 8: Production Hardening | ✅ Complete | Jan 2026 |

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Project Documentation Structure](#project-documentation-structure)
3. [Codebase File Structure](#codebase-file-structure)
4. [Phase Dependency Graph](#phase-dependency-graph)
5. [Phase 1: Foundation](#phase-1-foundation)
6. [Phase 2: CLI Integration](#phase-2-cli-integration)
7. [Phase 3: Memory Layer](#phase-3-memory-layer)
8. [Phase 4: Control Loop + Risk Gate](#phase-4-control-loop--risk-gate)
9. [Phase 5: Human Interrupt + Approvals](#phase-5-human-interrupt--approvals)
10. [Phase 6: Budget Controls](#phase-6-budget-controls)
11. [Phase 7: Merge Gate + Observability](#phase-7-merge-gate--observability)
12. [Phase 8: Production Hardening](#phase-8-production-hardening)
13. [Risk Register](#risk-register)
14. [Definition of Done (Global)](#definition-of-done-global)

---

## Executive Summary

This plan implements a multi-agent orchestration system that coordinates CLI-based coding agents (Claude Code, Gemini CLI, Codex CLI) and API-based agents (Claude Agent SDK, OpenAI Agents SDK) through a unified Python orchestrator.

### Core Deliverables

| Deliverable | Description |
|-------------|-------------|
| **Adapter Layer** | Unified interface for all agent types (CLI + API) |
| **Persistence Layer** | SQLite-backed state management with recovery |
| **Project Journal** | Context handoff protocol between agents |
| **Memory Layer** | Two-tier memory (Operational + Knowledge) with Write Gate |
| **Control Loop** | Health monitoring, stuck detection, auto-prompting |
| **Risk Gate** | Four-tier autonomy (LOW/MEDIUM/HIGH/CRITICAL) |
| **Human Interrupt** | CLI blocking (V1) + Webhook/async (V2) |
| **Budget Controls** | Per-agent, per-MCP-server, per-tool limits |
| **Merge Gate** | Protected branch controls with readiness checks |
| **Observability** | Token Audit + AI Observer integration |

---

## Project Documentation Structure

### Required Documentation Files

```
Agent_Orchestration/
├── README.md                    # Project overview, quick start
├── docs/
│   ├── RESEARCH.md              # Technical research (exists)
│   ├── IMPLEMENTATION_PLAN.md   # This document
│   ├── ARCHITECTURE.md          # System architecture deep dive
│   ├── ADAPTER_REFERENCE.md     # How to implement new adapters
│   ├── CONFIGURATION.md         # Environment variables, settings
│   ├── RISK_POLICY.md           # Risk classification rules
│   ├── OPERATIONS.md            # Runbooks, troubleshooting
│   └── API.md                   # Internal API documentation
└── examples/
    ├── basic_orchestration.py   # Minimal working example
    ├── multi_agent_workflow.py  # Claude + Gemini + Codex
    └── custom_adapter.py        # How to add new agents
```

### Documentation Creation Schedule

| Document | Created In | Required For |
|----------|------------|--------------|
| README.md | Phase 1 | Project initialization |
| ARCHITECTURE.md | Phase 1 | Team onboarding |
| ADAPTER_REFERENCE.md | Phase 1 | CLI adapter work |
| CONFIGURATION.md | Phase 1 | Environment setup |
| RISK_POLICY.md | Phase 3 | Risk gate implementation |
| OPERATIONS.md | Phase 7 | Production deployment |
| API.md | Phase 7 | External integrations |
| examples/*.py | Each phase | Developer guidance |

---

## Codebase File Structure

```
Agent_Orchestration/
├── src/
│   └── agent_orchestrator/
│       ├── __init__.py
│       ├── main.py                      # Entry point
│       ├── config.py                    # Configuration loading
│       │
│       ├── adapters/                    # Adapter Layer
│       │   ├── __init__.py
│       │   ├── base.py                  # BaseAdapter, LLMAdapter, CLIAgentAdapter
│       │   ├── claude_sdk.py            # Claude Agent SDK adapter
│       │   ├── openai_agents.py         # OpenAI Agents SDK adapter
│       │   ├── claude_code_cli.py       # Claude Code CLI adapter
│       │   ├── gemini_cli.py            # Gemini CLI adapter
│       │   └── codex_cli.py             # Codex CLI adapter
│       │
│       ├── persistence/                 # Persistence Layer
│       │   ├── __init__.py
│       │   ├── database.py              # OrchestratorDB class
│       │   ├── schema.sql               # SQLite schema
│       │   └── models.py                # Dataclasses for DB entities
│       │
│       ├── journal/                     # Project Journal Protocol
│       │   ├── __init__.py
│       │   ├── project_journal.py       # ProjectJournal class
│       │   └── status_packet.py         # StatusPacket, TaskArtifacts
│       │
│       ├── memory/                      # Memory Layer
│       │   ├── __init__.py
│       │   ├── models.py                # MemoryItem, MemoryPatch, MemoryTier
│       │   ├── operational.py           # OperationalMemory (SQLite + files)
│       │   ├── knowledge.py             # KnowledgeMemory (RAG retrieval)
│       │   ├── working.py               # WorkingMemory (per-task scratch)
│       │   ├── write_gate.py            # MemoryWriteGate (validation)
│       │   ├── librarian.py             # MemoryLibrarian agent
│       │   └── maintenance.py           # Scheduled cleanup jobs
│       │
│       ├── control/                     # Control Loop
│       │   ├── __init__.py
│       │   ├── loop.py                  # AgentControlLoop
│       │   ├── health.py                # AgentHealthCheck, stuck detection
│       │   └── actions.py               # ControlAction enum, action execution
│       │
│       ├── risk/                        # Risk Gate
│       │   ├── __init__.py
│       │   ├── policy.py                # RiskPolicy, RiskLevel
│       │   ├── blocklist.py             # CRITICAL_BLOCKLIST patterns
│       │   └── autonomy_gate.py         # AutonomyGate class
│       │
│       ├── interrupt/                   # Human Interrupt Interface
│       │   ├── __init__.py
│       │   ├── cli_handler.py           # V1: CLIInterruptHandler
│       │   └── async_handler.py         # V2: AsyncInterruptHandler
│       │
│       ├── budget/                      # Budget Controls
│       │   ├── __init__.py
│       │   ├── agent_budget.py          # AgentBudget, enforcement
│       │   ├── mcp_budget.py            # MCPServerBudget, ToolBudget
│       │   └── registry.py              # MCPRegistry allowlist
│       │
│       ├── merge/                       # Merge Gate
│       │   ├── __init__.py
│       │   ├── readiness.py             # MergeReadiness checks
│       │   └── gate.py                  # MergeGate class
│       │
│       ├── workspace/                   # DIY Workspace Manager
│       │   ├── __init__.py
│       │   └── manager.py               # DIYAgentWorkspace (tmux+worktrees)
│       │
│       ├── routing/                     # Task Router
│       │   ├── __init__.py
│       │   ├── router.py                # TaskRouter
│       │   └── task_types.py            # TaskType enum
│       │
│       ├── secrets/                     # Secret Handling
│       │   ├── __init__.py
│       │   ├── redactor.py              # SecretRedactor
│       │   └── guard.py                 # SecretFileGuard
│       │
│       ├── observability/               # Observability Integration
│       │   ├── __init__.py
│       │   ├── token_audit.py           # Token Audit integration
│       │   ├── ai_observer.py           # AI Observer integration
│       │   ├── alerts.py                # Alert Manager
│       │   └── audit.py                 # Audit Trail system
│       │
│       └── reliability/                 # Production Hardening
│           ├── __init__.py
│           ├── rate_limiter.py          # Rate limiting with backoff
│           ├── retry.py                 # Retry logic, circuit breaker
│           └── shutdown.py              # Graceful shutdown handling
│
├── ops/                                 # Operational Memory (human-readable)
│   ├── project_state.json               # Current project state (authoritative)
│   ├── decisions/                       # Architecture Decision Records
│   │   ├── ADR-0001-memory-architecture.md
│   │   └── ADR-0002-risk-gate-levels.md
│   ├── runbooks/                        # Operational runbooks
│   │   ├── agent-recovery.md
│   │   └── workspace-setup.md
│   ├── patterns/                        # Known patterns and fixes
│   │   └── error-fixes.md
│   └── prompts/                         # Agent-specific prompts
│       ├── claude-code.md
│       ├── gemini-docs.md
│       └── codex-ci.md
│
├── tests/
│   ├── __init__.py
│   ├── conftest.py                      # Pytest fixtures
│   ├── unit/
│   │   ├── test_adapters.py
│   │   ├── test_persistence.py
│   │   ├── test_journal.py
│   │   ├── test_risk_policy.py
│   │   ├── test_health_check.py
│   │   └── test_secret_redactor.py
│   ├── integration/
│   │   ├── test_control_loop.py
│   │   ├── test_autonomy_gate.py
│   │   └── test_merge_gate.py
│   └── e2e/
│       └── test_multi_agent_workflow.py
│
├── scripts/
│   ├── setup_dev.sh                     # Development environment setup
│   ├── create_worktrees.sh              # Git worktree initialization
│   └── run_orchestrator.py              # CLI entry point
│
├── data/
│   └── .gitkeep                         # SQLite DB location
│
├── project_state.json                   # Project Journal (generated)
├── agent_journal.md                     # Agent log (generated)
│
├── pyproject.toml                       # Project configuration
├── requirements.txt                     # Dependencies
├── requirements-dev.txt                 # Dev dependencies
├── .env.example                         # Environment template
└── .gitignore
```

---

## Phase Dependency Graph

```
Phase 1: Foundation ─────────────────────────────────────────────────┐
    │                                                                │
    ├── Adapter interfaces (BaseAdapter, LLMAdapter, CLIAgentAdapter)│
    ├── SQLite schema + OrchestratorDB                               │
    ├── ProjectJournal + StatusPacket                                │
    ├── SecretRedactor                                               │
    └── Config loading + .env                                        │
         │                                                           │
         ▼                                                           │
Phase 2: CLI Integration ────────────────────────────────────────────┤
    │                                                                │
    ├── ClaudeCodeCLIAdapter                                         │
    ├── GeminiCLIAdapter                                             │
    ├── CodexCLIAdapter                                              │
    ├── DIYAgentWorkspace (tmux + worktrees)                         │
    ├── TaskRouter + TaskType classification                         │
    └── State injection/capture for headless mode                    │
         │                                                           │
         ▼                                                           │
Phase 3: Memory Layer ───────────────────────────────────────────────┤
    │                                                                │
    ├── OperationalMemory (SQLite + /ops/ files)                     │
    ├── KnowledgeMemory (RAG index over /ops/)                       │
    ├── WorkingMemory (per-task scratch)                             │
    ├── MemoryWriteGate (validation + approval)                      │
    ├── MemoryLibrarian agent (cleanup, dedup, promote)              │
    └── Maintenance scheduler (daily/weekly jobs)                    │
         │                                                           │
         ▼                                                           │
Phase 4: Control Loop + Risk Gate ───────────────────────────────────┤
    │                                                                │
    ├── AgentHealthCheck + stuck heuristics                          │
    ├── AgentControlLoop (uses Memory for context)                   │
    ├── RiskPolicy + RiskLevel                                       │
    ├── CRITICAL_BLOCKLIST                                           │
    └── AutonomyGate                                                 │
         │                                                           │
         ▼                                                           │
Phase 5: Human Interrupt + Approvals ────────────────────────────────┤
    │                                                                │
    ├── CLIInterruptHandler (V1)                                     │
    ├── AsyncInterruptHandler (V2)                                   │
    └── Approval queue + timeout                                     │
         │                                                           │
         ▼                                                           │
Phase 6: Budget Controls ────────────────────────────────────────────┤
    │                                                                │
    ├── AgentBudget enforcement                                      │
    ├── MCPServerBudget + ToolBudget                                 │
    ├── MCPRegistry allowlist                                        │
    └── Token Audit MCP integration                                  │
         │                                                           │
         ▼                                                           │
Phase 7: Merge Gate + Observability ─────────────────────────────────┤
    │                                                                │
    ├── MergeReadiness checks                                        │
    ├── MergeGate with lock                                          │
    ├── AI Observer integration                                      │
    └── Alerting (stuck agents, cost thresholds)                     │
         │                                                           │
         ▼                                                           │
Phase 8: Production Hardening ───────────────────────────────────────┘
    │
    ├── Rate limiting + backoff
    ├── Fallback models
    ├── Complete audit trail
    ├── Load testing
    └── Operations documentation
```

---

## Phase 1: Foundation

**Goal:** Establish core abstractions, persistence, and project journal protocol.

**Dependencies:** None (starting phase)

### Tasks

#### 1.1 Project Initialization

| Task | Description | Output Files |
|------|-------------|--------------|
| 1.1.1 | Create project structure with `pyproject.toml` | `pyproject.toml`, folder structure |
| 1.1.2 | Set up dependencies in requirements files | `requirements.txt`, `requirements-dev.txt` |
| 1.1.3 | Create `.env.example` with all required variables | `.env.example` |
| 1.1.4 | Implement `config.py` for environment loading | `src/.../config.py` |
| 1.1.5 | Write README.md with quick start | `README.md` |

#### 1.2 Persistence Layer

| Task | Description | Output Files |
|------|-------------|--------------|
| 1.2.1 | Write SQLite schema (7 tables) | `src/.../persistence/schema.sql` |
| 1.2.2 | Implement `OrchestratorDB` class | `src/.../persistence/database.py` |
| 1.2.3 | Create dataclass models for DB entities | `src/.../persistence/models.py` |
| 1.2.4 | Write unit tests for persistence | `tests/unit/test_persistence.py` |

**SQLite Tables:**
- `agents` - Agent registry
- `tasks` - Task queue
- `runs` - Execution history
- `health_samples` - Health metrics
- `approvals` - Approval queue
- `usage_daily` - Daily usage aggregates
- `mcp_tool_usage` - MCP tool tracking

#### 1.3 Adapter Layer Interface

| Task | Description | Output Files |
|------|-------------|--------------|
| 1.3.1 | Define `RiskLevel` enum | `src/.../adapters/base.py` |
| 1.3.2 | Define `TaskArtifacts` dataclass | `src/.../journal/status_packet.py` |
| 1.3.3 | Define `StatusPacket` dataclass | `src/.../journal/status_packet.py` |
| 1.3.4 | Define `AgentResponse` dataclass | `src/.../adapters/base.py` |
| 1.3.5 | Define `UsageStats` dataclass | `src/.../adapters/base.py` |
| 1.3.6 | Implement `BaseAdapter` ABC | `src/.../adapters/base.py` |
| 1.3.7 | Implement `LLMAdapter` base class | `src/.../adapters/base.py` |
| 1.3.8 | Implement `CLIAgentAdapter` base class | `src/.../adapters/base.py` |
| 1.3.9 | Write unit tests for adapter interfaces | `tests/unit/test_adapters.py` |
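
As a rough illustration of the adapter contract in tasks 1.3.4 to 1.3.6, the sketch below shows an abstract `BaseAdapter` with a mock subclass. The field names on `AgentResponse` and `UsageStats`, and the `EchoAdapter` class, are assumptions made for this example; the plan does not specify them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class UsageStats:
    # Field names are assumptions for illustration.
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class AgentResponse:
    # Field names are assumptions for illustration.
    content: str
    success: bool = True
    usage: UsageStats = field(default_factory=UsageStats)


class BaseAdapter(ABC):
    """Common interface that every agent adapter implements."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id

    @abstractmethod
    def execute(self, task: str, context: dict) -> AgentResponse:
        """Run one task and return the agent's response."""


class EchoAdapter(BaseAdapter):
    """Hypothetical mock adapter used only to exercise the contract in tests."""

    def execute(self, task: str, context: dict) -> AgentResponse:
        usage = UsageStats(input_tokens=len(task.split()), output_tokens=0)
        return AgentResponse(content=f"[{self.agent_id}] {task}", usage=usage)
```

A mock like this lets the `execute()` contract be tested without calling a real SDK or CLI.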

#### 1.4 API Adapters

| Task | Description | Output Files |
|------|-------------|--------------|
| 1.4.1 | Implement `ClaudeSDKAdapter` | `src/.../adapters/claude_sdk.py` |
| 1.4.2 | Implement `OpenAIAgentsAdapter` | `src/.../adapters/openai_agents.py` |
| 1.4.3 | Write integration tests for SDK adapters | `tests/integration/test_sdk_adapters.py` |

#### 1.5 Project Journal Protocol

| Task | Description | Output Files |
|------|-------------|--------------|
| 1.5.1 | Define `project_state.json` schema | `docs/ARCHITECTURE.md` (schema section) |
| 1.5.2 | Implement `ProjectJournal` class | `src/.../journal/project_journal.py` |
| 1.5.3 | Implement `read_state()` method | `src/.../journal/project_journal.py` |
| 1.5.4 | Implement `write_status()` method | `src/.../journal/project_journal.py` |
| 1.5.5 | Implement `_append_journal()` method | `src/.../journal/project_journal.py` |
| 1.5.6 | Implement `record_decision()` method | `src/.../journal/project_journal.py` |
| 1.5.7 | Write unit tests for journal | `tests/unit/test_journal.py` |
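
The journal methods listed above might fit together as in this minimal sketch. The file names `project_state.json` and `agent_journal.md` come from the codebase layout; the state keys (`last_agent`, `last_summary`, `updated_at`) and the simplified `write_status()` signature are assumptions, since the real method presumably takes a `StatusPacket`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class ProjectJournal:
    """Sketch: keeps a JSON state file and an append-only markdown log in sync."""

    def __init__(self, root: str) -> None:
        self.state_path = Path(root) / "project_state.json"
        self.journal_path = Path(root) / "agent_journal.md"

    def read_state(self) -> dict:
        """Return the current state, or an empty dict if none exists yet."""
        if not self.state_path.exists():
            return {}
        return json.loads(self.state_path.read_text(encoding="utf-8"))

    def write_status(self, agent_id: str, summary: str) -> None:
        """Update the JSON state and append a matching line to the markdown log."""
        state = self.read_state()
        state["last_agent"] = agent_id          # assumed key
        state["last_summary"] = summary         # assumed key
        state["updated_at"] = datetime.now(timezone.utc).isoformat()
        # Write to a temp file first so a crash cannot leave a half-written state.
        tmp_path = self.state_path.parent / (self.state_path.name + ".tmp")
        tmp_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
        tmp_path.replace(self.state_path)
        self._append_journal(f"- {state['updated_at']} [{agent_id}] {summary}")

    def _append_journal(self, line: str) -> None:
        with self.journal_path.open("a", encoding="utf-8") as handle:
            handle.write(line + "\n")
```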

#### 1.6 Secret Handling

| Task | Description | Output Files |
|------|-------------|--------------|
| 1.6.1 | Implement `SecretRedactor` class | `src/.../secrets/redactor.py` |
| 1.6.2 | Define `REDACTION_PATTERNS` | `src/.../secrets/redactor.py` |
| 1.6.3 | Implement `wrap_logger()` decorator | `src/.../secrets/redactor.py` |
| 1.6.4 | Implement `SecretFileGuard` class | `src/.../secrets/guard.py` |
| 1.6.5 | Define `NEVER_STORE_PATTERNS` | `src/.../secrets/guard.py` |
| 1.6.6 | Write unit tests for secret handling | `tests/unit/test_secret_redactor.py` |
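
A pattern-based redactor along the lines of tasks 1.6.1 and 1.6.2 could be sketched as follows. The three regular expressions are illustrative examples only and are not the project's actual `REDACTION_PATTERNS`; a real list would need to cover every credential format in use.

```python
import re

# Illustrative patterns only (assumptions), not the project's real list.
REDACTION_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),                 # "sk-" style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key IDs
    re.compile(r"(?i)\b(password|api[_-]?key|token)\b\s*[=:]\s*\S+"),  # key=value pairs
]


class SecretRedactor:
    """Replaces every match of the configured patterns with a placeholder."""

    def __init__(self, patterns=None, placeholder: str = "[REDACTED]") -> None:
        self.patterns = list(patterns if patterns is not None else REDACTION_PATTERNS)
        self.placeholder = placeholder

    def redact(self, text: str) -> str:
        for pattern in self.patterns:
            text = pattern.sub(self.placeholder, text)
        return text
```

For the key=value pattern this sketch redacts the whole assignment, not just the value, which is the more conservative choice for log output.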

### Phase 1 Acceptance Criteria

- [x] Project can be installed via `pip install -e .`
- [x] SQLite database initializes with all 7 tables on first run
- [ ] `BaseAdapter.execute()` contract is testable with mock
- [ ] `ProjectJournal.read_state()` returns valid JSON structure
- [ ] `ProjectJournal.write_status()` updates both JSON and markdown
- [ ] `SecretRedactor.redact()` removes all defined patterns
- [x] All unit tests pass (`pytest tests/unit/`)
- [ ] `ClaudeSDKAdapter` can execute a simple task (manual test)
- [ ] `OpenAIAgentsAdapter` can execute a simple task (manual test)

### Phase 1 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| Working adapter interfaces | Unit tests pass |
| SQLite persistence | Database creates on startup |
| Project journal | JSON + markdown files created |
| Secret redaction | Unit tests for all patterns |
| Documentation | README.md, ARCHITECTURE.md |

---

## Phase 2: CLI Integration

**Goal:** Integrate CLI-based agents (Claude Code, Gemini CLI) with workspace isolation.

**Dependencies:** Phase 1 complete

### Tasks

#### 2.1 Claude Code CLI Adapter

| Task | Description | Output Files |
|------|-------------|--------------|
| 2.1.1 | Implement `ClaudeCodeCLIAdapter` class | `src/.../adapters/claude_code_cli.py` |
| 2.1.2 | Implement `execute()` with `--output-format json` | `src/.../adapters/claude_code_cli.py` |
| 2.1.3 | Implement state injection into prompt | `src/.../adapters/claude_code_cli.py` |
| 2.1.4 | Implement usage extraction from JSON response | `src/.../adapters/claude_code_cli.py` |
| 2.1.5 | Implement `stream()` with `stream-json` | `src/.../adapters/claude_code_cli.py` |
| 2.1.6 | Handle headless mode non-persistence | `src/.../adapters/claude_code_cli.py` |
| 2.1.7 | Write integration tests | `tests/integration/test_claude_code_cli.py` |

**Critical Implementation Note:**
```python
# Each invocation is a fresh session - state MUST be injected
def execute(self, task: str, context: dict) -> AgentResponse:
    state = self.journal.read_state()  # Always read current state
    full_prompt = self._inject_state(task, state)  # Inject into prompt
    # ... run claude -p full_prompt --output-format json
    # ... capture StatusPacket from response
```
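
One possible shape for the state-injection step is sketched below. The prompt layout and the helper names `inject_state` and `build_command` are assumptions for illustration; the `-p` and `--output-format json` flags are the ones named in this plan.

```python
import json


def inject_state(task: str, state: dict) -> str:
    """Prepend the current project state to the task so a fresh session has context."""
    state_block = json.dumps(state, indent=2, sort_keys=True)
    return (
        "## Current project state\n"
        f"{state_block}\n\n"
        "## Task\n"
        f"{task}\n"
    )


def build_command(full_prompt: str) -> list[str]:
    """Build the headless invocation as an argument list (no shell quoting needed)."""
    return ["claude", "-p", full_prompt, "--output-format", "json"]
```

Passing the command as a list to `subprocess.run` avoids shell interpolation of the prompt text.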

#### 2.2 Gemini CLI Adapter

| Task | Description | Output Files |
|------|-------------|--------------|
| 2.2.1 | Implement `GeminiCLIAdapter` class | `src/.../adapters/gemini_cli.py` |
| 2.2.2 | Implement `execute()` method | `src/.../adapters/gemini_cli.py` |
| 2.2.3 | Implement MCP server configuration | `src/.../adapters/gemini_cli.py` |
| 2.2.4 | Handle 1M token context window | `src/.../adapters/gemini_cli.py` |
| 2.2.5 | Implement usage tracking | `src/.../adapters/gemini_cli.py` |
| 2.2.6 | Write integration tests | `tests/integration/test_gemini_cli.py` |

#### 2.3 DIY Workspace Manager

| Task | Description | Output Files |
|------|-------------|--------------|
| 2.3.1 | Implement `DIYAgentWorkspace` class | `src/.../workspace/manager.py` |
| 2.3.2 | Implement `create_workspace()` (worktree + tmux) | `src/.../workspace/manager.py` |
| 2.3.3 | Implement `run_in_workspace()` | `src/.../workspace/manager.py` |
| 2.3.4 | Implement `switch_to()` | `src/.../workspace/manager.py` |
| 2.3.5 | Implement `cleanup()` | `src/.../workspace/manager.py` |
| 2.3.6 | Implement `list_workspaces()` | `src/.../workspace/manager.py` |
| 2.3.7 | Write shell script for initial setup | `scripts/create_worktrees.sh` |
| 2.3.8 | Write integration tests | `tests/integration/test_workspace.py` |
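
The worktree-plus-tmux setup behind `create_workspace()` might reduce to two commands per agent, as in this sketch. The `.worktrees/` directory, the `agent/<id>` branch naming, the `agent-<id>` session naming, and the `workspace_commands` helper are all assumptions; the function only builds the argument lists and does not run them.

```python
from pathlib import Path


def workspace_commands(repo_root: str, agent_id: str, base_ref: str = "HEAD") -> list[list[str]]:
    """Return the git and tmux commands that would isolate one agent."""
    worktree = Path(repo_root) / ".worktrees" / agent_id   # assumed location
    branch = f"agent/{agent_id}"                            # assumed naming scheme
    return [
        # Create a new branch checked out in its own working directory.
        ["git", "-C", repo_root, "worktree", "add", "-b", branch, str(worktree), base_ref],
        # Start a detached tmux session whose working directory is that worktree.
        ["tmux", "new-session", "-d", "-s", f"agent-{agent_id}", "-c", str(worktree)],
    ]
```

Each list can be handed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the logic unit-testable without git or tmux installed.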

#### 2.4 Task Router (Initial)

| Task | Description | Output Files |
|------|-------------|--------------|
| 2.4.1 | Define `TaskType` enum | `src/.../routing/task_types.py` |
| 2.4.2 | Define `TASK_ROUTING` table | `src/.../routing/router.py` |
| 2.4.3 | Implement `TaskRouter` class | `src/.../routing/router.py` |
| 2.4.4 | Implement `route()` method | `src/.../routing/router.py` |
| 2.4.5 | Implement `_has_quota()` check | `src/.../routing/router.py` |
| 2.4.6 | Write unit tests | `tests/unit/test_router.py` |
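
A table-driven router with a quota fallback, in the spirit of tasks 2.4.2 to 2.4.5, could look like the sketch below. The `TaskType` members, the agent identifiers, and the preference order in `TASK_ROUTING` are assumptions loosely based on the manual tests listed for this phase.

```python
from enum import Enum
from typing import Callable, Optional


class TaskType(Enum):
    # Members are assumptions for illustration.
    REFACTOR = "refactor"
    LARGE_CONTEXT_ANALYSIS = "large_context_analysis"
    CI_FIX = "ci_fix"


# Preferred agents per task type, in priority order (assumed mapping).
TASK_ROUTING: dict[TaskType, list[str]] = {
    TaskType.REFACTOR: ["claude-code", "codex"],
    TaskType.LARGE_CONTEXT_ANALYSIS: ["gemini", "claude-code"],
    TaskType.CI_FIX: ["codex", "claude-code"],
}


class TaskRouter:
    def __init__(self, has_quota: Callable[[str], bool]) -> None:
        # The quota check is injected so tests can stub it.
        self._has_quota = has_quota

    def route(self, task_type: TaskType) -> Optional[str]:
        """Return the first preferred agent that still has quota, else None."""
        for agent_id in TASK_ROUTING.get(task_type, []):
            if self._has_quota(agent_id):
                return agent_id
        return None
```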

### Phase 2 Acceptance Criteria

- [ ] `ClaudeCodeCLIAdapter.execute()` runs Claude Code headlessly
- [ ] State is properly injected into Claude Code prompts
- [ ] `GeminiCLIAdapter.execute()` runs Gemini CLI
- [ ] Git worktrees created successfully per agent
- [ ] tmux sessions created and agent runs in correct workspace
- [ ] `TaskRouter.route()` returns correct agent based on task type
- [ ] All CLI adapters produce `StatusPacket` at end of run
- [ ] Integration tests pass for both CLI adapters

### Phase 2 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| Claude Code integration | Manual test: run multi-file refactor |
| Gemini CLI integration | Manual test: run large context analysis |
| Workspace isolation | Each agent in separate worktree/tmux |
| Task routing | Unit tests for routing logic |

---

## Phase 3: Memory Layer

**Goal:** Implement two-tier memory architecture with controlled write access and automated maintenance.

**Dependencies:** Phase 2 complete

### Memory Architecture Overview

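The memory system combines the two persistent tiers (Operational and Knowledge) with a per-task Working layer. Together, these three layers help prevent drift, bad assumptions, and security leaks: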

| Layer | Purpose | Storage | Write Access |
|-------|---------|---------|--------------|
| **Operational** | System consistency across runs | SQLite + `/ops/` files | Through Write Gate |
| **Knowledge** | Agent "learning" from past work | RAG index over `/ops/` | Read-only (agents propose) |
| **Working** | Per-task scratch space | In-memory + temp files | Agent-writable, discarded |

### Memory Hierarchy Tiers

| Tier | Name | Contents | Retention |
|------|------|----------|-----------|
| 0 | Immutable Audit | Task runs, logs, approvals | Never deleted |
| 1 | Authoritative | ADRs, policies, project_state.json | Changes require approval |
| 2 | Working Knowledge | Runbooks, prompts, patterns | Librarian can modify |
| 3 | Ephemeral Cache | Raw tool outputs, scratch | Auto-expire after summarization |
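
The tier table above maps naturally onto an integer enum. In this sketch the member names and the rule strings returned by `write_rule` are assumptions chosen to mirror the table's Name and Retention columns.

```python
from enum import IntEnum


class MemoryTier(IntEnum):
    # Numeric values follow the Tier column; member names are assumptions.
    IMMUTABLE_AUDIT = 0
    AUTHORITATIVE = 1
    WORKING_KNOWLEDGE = 2
    EPHEMERAL_CACHE = 3


def write_rule(tier: MemoryTier) -> str:
    """Return a short label for how items in this tier may change."""
    return {
        MemoryTier.IMMUTABLE_AUDIT: "append_only",
        MemoryTier.AUTHORITATIVE: "requires_approval",
        MemoryTier.WORKING_KNOWLEDGE: "librarian_may_modify",
        MemoryTier.EPHEMERAL_CACHE: "auto_expire",
    }[tier]
```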

### Tasks

#### 3.1 Memory Models

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.1.1 | Define `MemoryTier` enum | `src/.../memory/models.py` |
| 3.1.2 | Define `MemoryItemType` enum | `src/.../memory/models.py` |
| 3.1.3 | Define `MemoryItem` dataclass | `src/.../memory/models.py` |
| 3.1.4 | Define `MemoryPatch` dataclass | `src/.../memory/models.py` |
| 3.1.5 | Define `PatchOperation` enum (ADD/UPDATE/DELETE/TAG) | `src/.../memory/models.py` |
| 3.1.6 | Add `memory_items` table to SQLite schema | `src/.../persistence/schema.sql` |
| 3.1.7 | Write unit tests for memory models | `tests/unit/test_memory_models.py` |

**MemoryItem Schema:**
```python
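from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# MemoryTier and MemoryItemType are the enums defined in tasks 3.1.1 and 3.1.2.
@dataclass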
class MemoryItem:
    id: str
    tier: MemoryTier
    item_type: MemoryItemType  # runbook, adr, pattern, prompt, issue_fix
    title: str
    content: str
    tags: list[str]
    source_links: list[str]  # commits, tickets, logs
    created_by: str  # agent_id or "human"
    created_at: datetime
    updated_at: datetime
    superseded_by: Optional[str]  # ID of newer item
    embedding_ref: Optional[str]  # for RAG index
```

#### 3.2 Operational Memory

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.2.1 | Implement `OperationalMemory` class | `src/.../memory/operational.py` |
| 3.2.2 | Implement `read_project_state()` | `src/.../memory/operational.py` |
| 3.2.3 | Implement `get_active_decisions()` | `src/.../memory/operational.py` |
| 3.2.4 | Implement `get_constraints()` | `src/.../memory/operational.py` |
| 3.2.5 | Implement `record_task_outcome()` | `src/.../memory/operational.py` |
| 3.2.6 | Implement `get_task_history()` | `src/.../memory/operational.py` |
| 3.2.7 | Implement `get_agent_capabilities()` | `src/.../memory/operational.py` |
| 3.2.8 | Write unit tests | `tests/unit/test_operational_memory.py` |

#### 3.3 Knowledge Memory (RAG)

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.3.1 | Implement `KnowledgeMemory` class | `src/.../memory/knowledge.py` |
| 3.3.2 | Implement `index_ops_directory()` | `src/.../memory/knowledge.py` |
| 3.3.3 | Implement `retrieve()` method (top 5 limit) | `src/.../memory/knowledge.py` |
| 3.3.4 | Implement `get_relevant_runbooks()` | `src/.../memory/knowledge.py` |
| 3.3.5 | Implement `get_relevant_adrs()` | `src/.../memory/knowledge.py` |
| 3.3.6 | Implement `get_known_issues()` | `src/.../memory/knowledge.py` |
| 3.3.7 | Support local embeddings (sentence-transformers) | `src/.../memory/knowledge.py` |
| 3.3.8 | Support pgvector (optional) | `src/.../memory/knowledge.py` |
| 3.3.9 | Write integration tests | `tests/integration/test_knowledge_memory.py` |

**Retrieval Policy:**
- Maximum 5 items retrieved per query
- Always include: current `project_state.json` summary
- Include relevant ADRs for task type
- Include runbook snippets only when needed (e.g., deploy tasks)
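
The retrieval policy above can be sketched as a filter-then-rank step. This example assumes similarity scores have already been computed, that the state summary is returned alongside the five retrieved items and does not count against the limit, and that `"deploy"` is the only task type that needs runbooks; the `Candidate` type and `select_context` helper are hypothetical.

```python
from dataclasses import dataclass

MAX_RETRIEVED = 5                 # limit from the retrieval policy
RUNBOOK_TASK_TYPES = {"deploy"}   # assumption: task types that need runbooks


@dataclass
class Candidate:
    title: str
    item_type: str   # e.g. "adr", "runbook", "pattern"
    score: float     # similarity score from the index (higher is better)


def select_context(state_summary: str, candidates: list[Candidate], task_type: str):
    """Return the always-included state summary plus at most five ranked items."""
    allow_runbooks = task_type in RUNBOOK_TASK_TYPES
    eligible = [c for c in candidates if c.item_type != "runbook" or allow_runbooks]
    ranked = sorted(eligible, key=lambda c: c.score, reverse=True)
    return state_summary, ranked[:MAX_RETRIEVED]
```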

#### 3.4 Working Memory

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.4.1 | Implement `WorkingMemory` class | `src/.../memory/working.py` |
| 3.4.2 | Implement `create_session()` (task-scoped) | `src/.../memory/working.py` |
| 3.4.3 | Implement `add_note()` | `src/.../memory/working.py` |
| 3.4.4 | Implement `add_tool_output()` | `src/.../memory/working.py` |
| 3.4.5 | Implement `get_context()` | `src/.../memory/working.py` |
| 3.4.6 | Implement `summarize()` (for promotion) | `src/.../memory/working.py` |
| 3.4.7 | Implement `discard()` | `src/.../memory/working.py` |
| 3.4.8 | Write unit tests | `tests/unit/test_working_memory.py` |

#### 3.5 Memory Write Gate

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.5.1 | Implement `MemoryWriteGate` class | `src/.../memory/write_gate.py` |
| 3.5.2 | Implement `validate_patch()` method | `src/.../memory/write_gate.py` |
| 3.5.3 | Implement `_check_no_secrets()` | `src/.../memory/write_gate.py` |
| 3.5.4 | Implement `_check_schema()` | `src/.../memory/write_gate.py` |
| 3.5.5 | Implement `_check_evidence()` | `src/.../memory/write_gate.py` |
| 3.5.6 | Implement `_check_risk_level()` | `src/.../memory/write_gate.py` |
| 3.5.7 | Implement `apply_patch()` (with approval routing) | `src/.../memory/write_gate.py` |
| 3.5.8 | Define patch risk levels (LOW auto-apply, MEDIUM review, HIGH approval) | `src/.../memory/write_gate.py` |
| 3.5.9 | Write unit tests | `tests/unit/test_write_gate.py` |

**Patch Risk Classification:**
- **LOW:** Tag fixes, formatting, metadata updates → auto-apply
- **MEDIUM:** Content merges, new runbook entries → apply with review
- **HIGH:** ADR changes, policy updates, deletions → require explicit approval
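
The classification rules above could be expressed as a small function. The sketch treats any change to an ADR or policy item as HIGH, even a tag fix, which is a deliberately conservative assumption; the `item_type` strings, the `metadata_only` flag, and the `ROUTING` labels are also assumptions.

```python
from enum import Enum


class PatchOperation(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    TAG = "tag"


class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def classify_patch(operation: PatchOperation, item_type: str, metadata_only: bool = False) -> RiskLevel:
    """Map a proposed memory patch to a risk level."""
    # Deletions and anything touching ADRs or policies need explicit approval.
    if operation is PatchOperation.DELETE or item_type in {"adr", "policy"}:
        return RiskLevel.HIGH
    # Tag fixes and metadata-only updates are safe to auto-apply.
    if operation is PatchOperation.TAG or metadata_only:
        return RiskLevel.LOW
    # Remaining content changes (merges, new runbook entries) get a review.
    return RiskLevel.MEDIUM


# How each level is handled (labels are assumptions).
ROUTING = {
    RiskLevel.LOW: "auto_apply",
    RiskLevel.MEDIUM: "apply_with_review",
    RiskLevel.HIGH: "require_approval",
}
```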

#### 3.6 Memory Librarian Agent

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.6.1 | Implement `MemoryLibrarian` class | `src/.../memory/librarian.py` |
| 3.6.2 | Implement `summarize_task_logs()` | `src/.../memory/librarian.py` |
| 3.6.3 | Implement `deduplicate_entries()` | `src/.../memory/librarian.py` |
| 3.6.4 | Implement `merge_runbooks()` | `src/.../memory/librarian.py` |
| 3.6.5 | Implement `promote_to_golden()` | `src/.../memory/librarian.py` |
| 3.6.6 | Implement `demote_stale()` | `src/.../memory/librarian.py` |
| 3.6.7 | Implement `normalize_tags()` | `src/.../memory/librarian.py` |
| 3.6.8 | Implement `scan_for_secrets()` | `src/.../memory/librarian.py` |
| 3.6.9 | Implement `detect_conflicts()` | `src/.../memory/librarian.py` |
| 3.6.10 | Implement `generate_patch_set()` | `src/.../memory/librarian.py` |
| 3.6.11 | Write integration tests | `tests/integration/test_librarian.py` |

**Librarian Restrictions:**
- Cannot rewrite ADR decisions without approval
- Cannot delete task history
- Cannot change risk gate rules or budgets
- All changes submitted as patches through Write Gate

#### 3.7 Maintenance Scheduler

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.7.1 | Implement `MemoryMaintenance` class | `src/.../memory/maintenance.py` |
| 3.7.2 | Implement `run_daily()` job | `src/.../memory/maintenance.py` |
| 3.7.3 | Implement `run_weekly()` job | `src/.../memory/maintenance.py` |
| 3.7.4 | Implement `run_monthly()` job | `src/.../memory/maintenance.py` |
| 3.7.5 | Generate `memory_report_YYYY-MM-DD.md` | `src/.../memory/maintenance.py` |
| 3.7.6 | Generate `patchset_YYYY-MM-DD.json` | `src/.../memory/maintenance.py` |
| 3.7.7 | Integrate with approval queue | `src/.../memory/maintenance.py` |
| 3.7.8 | Implement `rebuild_embeddings_index()` | `src/.../memory/maintenance.py` |
| 3.7.9 | Write integration tests | `tests/integration/test_maintenance.py` |

**Maintenance Cadence:**
- **Daily:** Summarize yesterday's runs, archive raw logs (Tier 3 → Tier 2)
- **Weekly:** Dedupe/merge runbooks, tag cleanup, conflict detection
- **Monthly:** Review superseded items, prune old caches, rebuild index
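
A scheduler entry point could decide which of these jobs to run from the calendar date alone. The sketch below assumes weekly jobs run on Mondays and monthly jobs on the first of the month; neither choice is specified in this plan, and `jobs_due` and `report_filename` are hypothetical helpers.

```python
from datetime import date


def jobs_due(today: date) -> list[str]:
    """Return the maintenance jobs that should run on the given date."""
    jobs = ["daily"]
    if today.weekday() == 0:      # Monday (assumed weekly slot)
        jobs.append("weekly")
    if today.day == 1:            # first of the month (assumed monthly slot)
        jobs.append("monthly")
    return jobs


def report_filename(today: date) -> str:
    """Build the report name in the memory_report_YYYY-MM-DD.md format."""
    return f"memory_report_{today.isoformat()}.md"
```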

#### 3.8 /ops/ Directory Structure

| Task | Description | Output Files |
|------|-------------|--------------|
| 3.8.1 | Create `/ops/` directory structure | `ops/` folder |
| 3.8.2 | Create initial `project_state.json` | `ops/project_state.json` |
| 3.8.3 | Create ADR template | `ops/decisions/ADR-TEMPLATE.md` |
| 3.8.4 | Create ADR-0001: Memory Architecture | `ops/decisions/ADR-0001-memory-architecture.md` |
| 3.8.5 | Create agent prompt templates | `ops/prompts/*.md` |
| 3.8.6 | Create initial runbooks | `ops/runbooks/*.md` |
| 3.8.7 | Create error-fixes pattern file | `ops/patterns/error-fixes.md` |

### Phase 3 Acceptance Criteria

- [ ] `OperationalMemory.read_project_state()` returns authoritative state
- [ ] `KnowledgeMemory.retrieve()` returns max 5 relevant items
- [ ] `WorkingMemory` properly scopes to task and discards on completion
- [ ] `MemoryWriteGate.validate_patch()` blocks patches with secrets
- [ ] `MemoryWriteGate` routes HIGH-risk patches to approval queue
- [ ] `MemoryLibrarian` generates valid patch sets
- [ ] Maintenance jobs run without errors
- [ ] `/ops/` directory contains all required files
- [ ] All unit and integration tests pass

### Phase 3 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| Operational Memory | State reads from SQLite + files |
| Knowledge Memory | RAG retrieval returns relevant items |
| Working Memory | Per-task scratch works correctly |
| Write Gate | Patches validated and routed |
| Librarian | Generates cleanup patches |
| Maintenance | Scheduled jobs execute |
| /ops/ structure | All files in place |

---

## Phase 4: Control Loop + Risk Gate

**Goal:** Implement health monitoring, stuck detection, and risk-based autonomy.

**Dependencies:** Phase 3 complete

### Tasks

#### 4.1 Health Check System

| Task | Description | Output Files |
|------|-------------|--------------|
| 4.1.1 | Define `AgentHealthCheck` dataclass | `src/.../control/health.py` |
| 4.1.2 | Implement `is_stuck()` method | `src/.../control/health.py` |
| 4.1.3 | Implement idle-but-burning-tokens detection | `src/.../control/health.py` |
| 4.1.4 | Implement repeated-error-loop detection | `src/.../control/health.py` |
| 4.1.5 | Implement edit-oscillation detection | `src/.../control/health.py` |
| 4.1.6 | Implement awaiting-human-decision detection | `src/.../control/health.py` |
| 4.1.7 | Write unit tests for all stuck heuristics | `tests/unit/test_health_check.py` |
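
One way the four stuck heuristics might combine inside `AgentHealthCheck` is sketched below. The field names, the `stuck_reason()` helper, and every threshold are assumptions chosen for illustration; the real dataclass may track different signals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentHealthCheck:
    # All fields and thresholds below are hypothetical.
    seconds_since_output: float = 0.0
    tokens_since_output: int = 0
    recent_errors: List[str] = field(default_factory=list)  # oldest first
    recent_edits: List[str] = field(default_factory=list)   # content hashes, oldest first
    awaiting_human: bool = False

    def stuck_reason(self) -> Optional[str]:
        """Return the first matching stuck heuristic, or None if healthy."""
        if self.awaiting_human:
            return "awaiting_human_decision"
        if self.seconds_since_output > 300 and self.tokens_since_output > 10_000:
            return "idle_but_burning_tokens"
        # Same error three times in a row.
        if len(self.recent_errors) >= 3 and len(set(self.recent_errors[-3:])) == 1:
            return "repeated_error_loop"
        # A -> B -> A -> B pattern in the last four edits.
        last = self.recent_edits[-4:]
        if len(last) == 4 and last[0] == last[2] and last[1] == last[3] and last[0] != last[1]:
            return "edit_oscillation"
        return None

    def is_stuck(self) -> bool:
        return self.stuck_reason() is not None
```

Returning a reason string rather than a bare boolean lets the control loop choose between an auto-prompt and an escalation.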

#### 4.2 Control Loop

| Task | Description | Output Files |
|------|-------------|--------------|
| 4.2.1 | Define `ControlAction` enum | `src/.../control/actions.py` |
| 4.2.2 | Implement `AgentControlLoop` class | `src/.../control/loop.py` |
| 4.2.3 | Implement `run()` async loop | `src/.../control/loop.py` |
| 4.2.4 | Implement `_sample_health()` method | `src/.../control/loop.py` |
| 4.2.5 | Implement `_evaluate_agent()` method | `src/.../control/loop.py` |
| 4.2.6 | Implement `_execute_action()` method | `src/.../control/loop.py` |
| 4.2.7 | Implement `_auto_prompt_agent()` | `src/.../control/loop.py` |
| 4.2.8 | Implement `_generate_unstick_prompt()` | `src/.../control/loop.py` |
| 4.2.9 | Write integration tests | `tests/integration/test_control_loop.py` |

#### 4.3 Risk Policy

| Task | Description | Output Files |
|------|-------------|--------------|
| 4.3.1 | Define `RiskLevel` enum (if not in Phase 1) | `src/.../risk/policy.py` |
| 4.3.2 | Define `HIGH_RISK_FILE_PATTERNS` | `src/.../risk/policy.py` |
| 4.3.3 | Define `MEDIUM_RISK_FILE_PATTERNS` | `src/.../risk/policy.py` |
| 4.3.4 | Define `HIGH_RISK_COMMAND_PATTERNS` | `src/.../risk/policy.py` |
| 4.3.5 | Define `MEDIUM_RISK_COMMAND_PATTERNS` | `src/.../risk/policy.py` |
| 4.3.6 | Implement `RiskPolicy.classify_file()` | `src/.../risk/policy.py` |
| 4.3.7 | Implement `RiskPolicy.classify_command()` | `src/.../risk/policy.py` |
| 4.3.8 | Write unit tests for classification | `tests/unit/test_risk_policy.py` |
| 4.3.9 | Write `RISK_POLICY.md` documentation | `docs/RISK_POLICY.md` |
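
File classification against pattern lists could look roughly like the sketch below. The glob patterns are placeholders invented for illustration, and glob matching via `fnmatchcase` is an assumption; the actual `HIGH_RISK_FILE_PATTERNS` and `MEDIUM_RISK_FILE_PATTERNS` are defined in `src/.../risk/policy.py` and may use regular expressions instead.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, for illustration only.
HIGH_RISK_FILE_PATTERNS = [".env*", "*.pem", ".github/workflows/*", "migrations/*"]
MEDIUM_RISK_FILE_PATTERNS = ["pyproject.toml", "requirements*.txt", "Dockerfile"]


def classify_file(path: str) -> str:
    """Return 'HIGH', 'MEDIUM', or 'LOW' for a repository-relative path.

    The highest matching tier wins, so HIGH patterns are checked first.
    Note that `*` in fnmatch patterns also matches path separators.
    """
    if any(fnmatchcase(path, p) for p in HIGH_RISK_FILE_PATTERNS):
        return "HIGH"
    if any(fnmatchcase(path, p) for p in MEDIUM_RISK_FILE_PATTERNS):
        return "MEDIUM"
    return "LOW"
```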

#### 4.4 CRITICAL Blocklist

| Task | Description | Output Files |
|------|-------------|--------------|
| 4.4.1 | Define `CRITICAL_BLOCKLIST` patterns | `src/.../risk/blocklist.py` |
| 4.4.2 | Add force push patterns | `src/.../risk/blocklist.py` |
| 4.4.3 | Add exfiltration patterns | `src/.../risk/blocklist.py` |
| 4.4.4 | Add destructive infrastructure patterns | `src/.../risk/blocklist.py` |
| 4.4.5 | Add credential exposure patterns | `src/.../risk/blocklist.py` |
| 4.4.6 | Write unit tests for all blocklist patterns | `tests/unit/test_blocklist.py` |
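
As a rough illustration of the four blocklist categories, the sketch below pairs one regular expression with each. The specific patterns and the `match_blocklist` helper are hypothetical; a production blocklist would need far broader coverage than these examples.

```python
import re
from typing import Optional

# One illustrative (hypothetical) pattern per category.
CRITICAL_BLOCKLIST = [
    (re.compile(r"\bgit\s+push\b.*(\s--force\b|\s-f\b)"), "force push"),
    (re.compile(r"\bcurl\b.*\s(-d|--data)\s+@"), "possible exfiltration"),
    (re.compile(r"\bterraform\s+destroy\b"), "destructive infrastructure"),
    (re.compile(r"\bcat\s+\S*\.env\b"), "credential exposure"),
]


def match_blocklist(command: str) -> Optional[str]:
    """Return the category of the first matching pattern, or None."""
    for pattern, category in CRITICAL_BLOCKLIST:
        if pattern.search(command):
            return category
    return None
```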

#### 4.5 Autonomy Gate

| Task | Description | Output Files |
|------|-------------|--------------|
| 4.5.1 | Implement `AutonomyGate` class | `src/.../risk/autonomy_gate.py` |
| 4.5.2 | Implement `check_action()` method | `src/.../risk/autonomy_gate.py` |
| 4.5.3 | Implement CRITICAL auto-reject logic | `src/.../risk/autonomy_gate.py` |
| 4.5.4 | Implement LOW auto-allow logic | `src/.../risk/autonomy_gate.py` |
| 4.5.5 | Implement MEDIUM conditional logic | `src/.../risk/autonomy_gate.py` |
| 4.5.6 | Implement HIGH approval-required logic | `src/.../risk/autonomy_gate.py` |
| 4.5.7 | Implement `_log_blocked()` for CRITICAL | `src/.../risk/autonomy_gate.py` |
| 4.5.8 | Wire to interrupt handler (placeholder) | `src/.../risk/autonomy_gate.py` |
| 4.5.9 | Write integration tests | `tests/integration/test_autonomy_gate.py` |
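
The four-tier decision in `check_action()` might be structured as in the sketch below. The free-function form, the `is_command` flag, and the `request_approval` callback are simplifications for illustration. The MEDIUM rule shown here (file edits pass, commands need approval) is an assumption; the plan only says MEDIUM is conditional.

```python
from enum import Enum
from typing import Callable, Tuple


class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def check_action(
    level: RiskLevel,
    is_command: bool,
    request_approval: Callable[[], bool],
) -> Tuple[bool, str]:
    """Return an (allowed, reason) tuple for a classified action."""
    if level is RiskLevel.CRITICAL:
        return False, "critical: auto-rejected"
    if level is RiskLevel.LOW:
        return True, "low: auto-allowed"
    if level is RiskLevel.MEDIUM and not is_command:
        # Assumed rule: MEDIUM file edits proceed without a prompt.
        return True, "medium: file edit allowed"
    approved = request_approval()  # MEDIUM commands and all HIGH actions
    verdict = "approved" if approved else "rejected"
    return approved, f"{level.value}: {verdict} by human"
```

CRITICAL actions never reach the approval callback, so a human cannot approve them by accident.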

### Phase 4 Acceptance Criteria

- [ ] `AgentHealthCheck.is_stuck()` correctly identifies all stuck scenarios
- [ ] Control loop samples health every N seconds (configurable)
- [ ] Auto-prompt fires for recoverable stuck states
- [ ] Escalation fires for unrecoverable stuck states
- [ ] `RiskPolicy.classify_file()` correctly categorizes test paths
- [ ] `RiskPolicy.classify_command()` correctly categorizes test commands
- [ ] All CRITICAL blocklist patterns auto-reject
- [ ] `AutonomyGate.check_action()` returns correct (allowed, reason) tuples
- [ ] All unit and integration tests pass

### Phase 4 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| Health monitoring | Stuck detection tests pass |
| Control loop | Integration test with mock agent |
| Risk classification | All file/command patterns tested |
| CRITICAL blocklist | All dangerous patterns blocked |
| Autonomy gate | Four-tier logic verified |
| Documentation | RISK_POLICY.md complete |

---

## Phase 5: Human Interrupt + Approvals

**Goal:** Implement user approval interfaces for risky operations.

**Dependencies:** Phase 4 complete

### Tasks

#### 5.1 CLI Interrupt Handler (V1)

| Task | Description | Output Files |
|------|-------------|--------------|
| 5.1.1 | Implement `CLIInterruptHandler` class | `src/.../interrupt/cli_handler.py` |
| 5.1.2 | Implement `request_approval()` with CLI prompt | `src/.../interrupt/cli_handler.py` |
| 5.1.3 | Handle y/n/s (skip) responses | `src/.../interrupt/cli_handler.py` |
| 5.1.4 | Implement `_record_decision()` | `src/.../interrupt/cli_handler.py` |
| 5.1.5 | Handle `EOFError`/`KeyboardInterrupt` | `src/.../interrupt/cli_handler.py` |
| 5.1.6 | Write unit tests | `tests/unit/test_cli_interrupt.py` |
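
A minimal sketch of the y/n/s prompt follows. The standalone function form, the injectable `read` parameter (added so the prompt can be tested without a terminal), and the choice to treat `EOFError`/`KeyboardInterrupt` as a rejection are all assumptions.

```python
from typing import Callable


def request_approval(summary: str, read: Callable[[str], str] = input) -> str:
    """Prompt until the user answers y, n, or s.

    Returns 'approve', 'reject', or 'skip'.
    """
    choices = {"y": "approve", "n": "reject", "s": "skip"}
    while True:
        try:
            answer = read(f"{summary}\nApprove? [y/n/s]: ").strip().lower()
        except (EOFError, KeyboardInterrupt):
            # Assumed policy: no usable input means the action is rejected.
            return "reject"
        if answer in choices:
            return choices[answer]
        # Any other input falls through and the prompt repeats.
```

Failing closed on a closed or interrupted input stream keeps an unattended session from approving anything.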

#### 5.2 Async Interrupt Handler (V2)

| Task | Description | Output Files |
|------|-------------|--------------|
| 5.2.1 | Define `WebhookConfig` dataclass | `src/.../interrupt/async_handler.py` |
| 5.2.2 | Implement `AsyncInterruptHandler` class | `src/.../interrupt/async_handler.py` |
| 5.2.3 | Implement `request_approval()` with timeout | `src/.../interrupt/async_handler.py` |
| 5.2.4 | Implement `_send_notification()` | `src/.../interrupt/async_handler.py` |
| 5.2.5 | Implement Slack message formatting | `src/.../interrupt/async_handler.py` |
| 5.2.6 | Implement `handle_approval_response()` | `src/.../interrupt/async_handler.py` |
| 5.2.7 | Write integration tests | `tests/integration/test_async_interrupt.py` |
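
The timeout behavior of the async handler can be sketched with `asyncio.wait_for` and one future per pending request. The class name `PendingApprovals` and the `request_id` parameter are hypothetical, and notification delivery is omitted; only the wait-and-resolve mechanics are shown.

```python
import asyncio
from typing import Dict


class PendingApprovals:
    """Tracks approval requests that are waiting for a human response."""

    def __init__(self) -> None:
        self._pending: Dict[str, asyncio.Future] = {}

    async def request_approval(self, request_id: str, timeout: float) -> bool:
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            # A real handler would send the webhook notification here.
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return False  # a timeout counts as a rejection
        finally:
            self._pending.pop(request_id, None)

    def handle_approval_response(self, request_id: str, approved: bool) -> None:
        future = self._pending.get(request_id)
        if future is not None and not future.done():
            future.set_result(approved)
```

Responses that arrive after the timeout find no pending entry and are ignored, which matches the rule that a timeout results in rejection.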

#### 5.3 Integration with Autonomy Gate

| Task | Description | Output Files |
|------|-------------|--------------|
| 5.3.1 | Wire `CLIInterruptHandler` to `AutonomyGate` | `src/.../risk/autonomy_gate.py` |
| 5.3.2 | Wire `AsyncInterruptHandler` to `AutonomyGate` | `src/.../risk/autonomy_gate.py` |
| 5.3.3 | Add configuration for handler selection | `src/.../config.py` |
| 5.3.4 | Write end-to-end test | `tests/e2e/test_approval_flow.py` |

### Phase 5 Acceptance Criteria

- [ ] CLI prompt appears for MEDIUM commands and HIGH actions
- [ ] User can approve (y), reject (n), or skip (s)
- [ ] Decisions are recorded in `approvals` table
- [ ] Webhook notification sends correctly (mock test)
- [ ] Async handler waits for response with timeout
- [ ] Timeout results in rejection
- [ ] `AutonomyGate` correctly dispatches to configured handler
- [ ] All tests pass

### Phase 5 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| CLI approval prompt | Manual test with real terminal |
| Webhook notifications | Integration test with mock server |
| Approval persistence | Database records all decisions |
| Timeout handling | Test that timeout rejects |

---

## Phase 6: Budget Controls

**Goal:** Implement cost controls for agents, MCP servers, and tools.

**Dependencies:** Phase 5 complete

### Tasks

#### 6.1 Agent Budget

| Task | Description | Output Files |
|------|-------------|--------------|
| 6.1.1 | Define `AgentBudget` dataclass | `src/.../budget/agent_budget.py` |
| 6.1.2 | Define `BUDGETS` configuration dict | `src/.../budget/agent_budget.py` |
| 6.1.3 | Implement budget enforcement in `TaskRouter` | `src/.../routing/router.py` |
| 6.1.4 | Implement `_has_quota()` with daily reset | `src/.../routing/router.py` |
| 6.1.5 | Write unit tests | `tests/unit/test_agent_budget.py` |
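
A quota check with a daily reset might work as sketched below. The fields of `AgentBudget`, the token-based limit, and the method names are assumptions; the plan places `_has_quota()` in the router, whereas this sketch keeps the logic on the dataclass for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentBudget:
    # Hypothetical fields: a per-day token limit and a running counter.
    daily_token_limit: int
    tokens_used: int = 0
    usage_date: date = date.min

    def _roll_over(self, today: date) -> None:
        """Reset the counter the first time it is touched on a new day."""
        if today != self.usage_date:
            self.usage_date = today
            self.tokens_used = 0

    def has_quota(self, today: date, requested: int = 0) -> bool:
        self._roll_over(today)
        return self.tokens_used + requested <= self.daily_token_limit

    def record(self, today: date, tokens: int) -> None:
        self._roll_over(today)
        self.tokens_used += tokens
```

Passing `today` explicitly, rather than calling `date.today()` inside the method, keeps the reset logic easy to unit test.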

#### 6.2 MCP Budget

| Task | Description | Output Files |
|------|-------------|--------------|
| 6.2.1 | Define `MCPServerBudget` dataclass | `src/.../budget/mcp_budget.py` |
| 6.2.2 | Define `ToolBudget` dataclass | `src/.../budget/mcp_budget.py` |
| 6.2.3 | Define `MCP_BUDGETS` configuration | `src/.../budget/mcp_budget.py` |
| 6.2.4 | Define `TOOL_BUDGETS` configuration | `src/.../budget/mcp_budget.py` |
| 6.2.5 | Implement `MCPUsageTracker` class | `src/.../budget/mcp_budget.py` |
| 6.2.6 | Implement `record_tool_call()` | `src/.../budget/mcp_budget.py` |
| 6.2.7 | Implement `check_budget()` | `src/.../budget/mcp_budget.py` |
| 6.2.8 | Write unit tests | `tests/unit/test_mcp_budget.py` |

#### 6.3 MCP Registry

| Task | Description | Output Files |
|------|-------------|--------------|
| 6.3.1 | Implement `MCPRegistry` class | `src/.../budget/registry.py` |
| 6.3.2 | Define `ALLOWED_SERVERS` set | `src/.../budget/registry.py` |
| 6.3.3 | Define `BLOCKED_SERVERS` set | `src/.../budget/registry.py` |
| 6.3.4 | Define `APPROVAL_REQUIRED_TOOLS` set | `src/.../budget/registry.py` |
| 6.3.5 | Implement `is_allowed()` method | `src/.../budget/registry.py` |
| 6.3.6 | Write unit tests | `tests/unit/test_mcp_registry.py` |
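
The three sets could feed `is_allowed()` roughly as follows. The server and tool names are placeholders, and both the `server.tool` key format and the `(allowed, reason)` return shape are assumptions made for this sketch.

```python
from typing import Tuple

# Placeholder entries; the real sets live in src/.../budget/registry.py.
ALLOWED_SERVERS = {"filesystem", "git"}
BLOCKED_SERVERS = {"shell-exec"}
APPROVAL_REQUIRED_TOOLS = {"git.push"}


class MCPRegistry:
    def is_allowed(self, server: str, tool: str) -> Tuple[bool, str]:
        """Decide whether a tool call may proceed without human approval."""
        if server in BLOCKED_SERVERS:
            return False, "server blocked"
        if server not in ALLOWED_SERVERS:
            # Default-deny: anything not explicitly allowlisted is refused.
            return False, "server not allowlisted"
        if f"{server}.{tool}" in APPROVAL_REQUIRED_TOOLS:
            return False, "approval required"
        return True, "allowed"
```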

#### 6.4 Token Audit Integration

| Task | Description | Output Files |
|------|-------------|--------------|
| 6.4.1 | Implement `TokenAuditClient` class | `src/.../observability/token_audit.py` |
| 6.4.2 | Implement MCP query for usage | `src/.../observability/token_audit.py` |
| 6.4.3 | Implement daily summary fetch | `src/.../observability/token_audit.py` |
| 6.4.4 | Integrate with budget enforcement | `src/.../budget/agent_budget.py` |
| 6.4.5 | Write integration tests | `tests/integration/test_token_audit.py` |

### Phase 6 Acceptance Criteria

- [ ] Agent budget prevents task routing when daily limit exceeded
- [ ] MCP server budget prevents tool calls when limit exceeded
- [ ] Per-tool budget enforced correctly
- [ ] MCPRegistry blocks non-allowlisted servers
- [ ] MCPRegistry requires approval for sensitive tools
- [ ] Token Audit MCP query returns usage data
- [ ] All tests pass

### Phase 6 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| Agent budget enforcement | Routing stops at limit |
| MCP budget tracking | Tool calls logged and limited |
| Tool allowlist | Blocked servers rejected |
| Token Audit integration | Usage data queryable |

---

## Phase 7: Merge Gate + Observability

**Goal:** Implement protected branch controls and deploy observability stack.

**Dependencies:** Phase 6 complete

### Tasks

#### 7.1 Merge Readiness

| Task | Description | Output Files |
|------|-------------|--------------|
| 7.1.1 | Define `MergeReadiness` dataclass | `src/.../merge/readiness.py` |
| 7.1.2 | Implement `is_ready()` method | `src/.../merge/readiness.py` |
| 7.1.3 | Implement `tests_pass` check | `src/.../merge/readiness.py` |
| 7.1.4 | Implement `no_critical_risks` check | `src/.../merge/readiness.py` |
| 7.1.5 | Implement `no_pending_approvals` check | `src/.../merge/readiness.py` |
| 7.1.6 | Implement `status_packet_complete` check | `src/.../merge/readiness.py` |
| 7.1.7 | Implement `no_merge_conflicts` check | `src/.../merge/readiness.py` |
| 7.1.8 | Write unit tests | `tests/unit/test_merge_readiness.py` |
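
The five readiness checks map naturally onto boolean fields. In the sketch below the field names follow the checks listed above, while the fail-closed defaults and the `failed_checks()` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class MergeReadiness:
    # Defaults are False so that an unpopulated check blocks the merge.
    tests_pass: bool = False
    no_critical_risks: bool = False
    no_pending_approvals: bool = False
    status_packet_complete: bool = False
    no_merge_conflicts: bool = False

    def failed_checks(self) -> List[str]:
        """Names of the checks that currently block a merge."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_ready(self) -> bool:
        return not self.failed_checks()
```

Exposing the failed check names gives the merge gate something concrete to report when it refuses a request.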

#### 7.2 Merge Gate

| Task | Description | Output Files |
|------|-------------|--------------|
| 7.2.1 | Implement `MergeGate` class | `src/.../merge/gate.py` |
| 7.2.2 | Define protected branches list | `src/.../merge/gate.py` |
| 7.2.3 | Implement `request_merge()` method | `src/.../merge/gate.py` |
| 7.2.4 | Implement `_check_readiness()` | `src/.../merge/gate.py` |
| 7.2.5 | Implement `_check_conflicts()` | `src/.../merge/gate.py` |
| 7.2.6 | Implement `_execute_merge()` | `src/.../merge/gate.py` |
| 7.2.7 | Implement merge lock (`asyncio.Lock`) | `src/.../merge/gate.py` |
| 7.2.8 | Write integration tests | `tests/integration/test_merge_gate.py` |
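
The merge lock serializes merges even when several agents request one concurrently. The sketch below shows only that mechanism: the `active`/`max_active` counters exist purely to demonstrate the guarantee, and the `asyncio.sleep` call stands in for the real readiness checks and merge execution.

```python
import asyncio


class MergeGate:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.active = 0      # merges currently inside the critical section
        self.max_active = 0  # highest concurrency ever observed

    async def request_merge(self, branch: str) -> str:
        async with self._lock:  # only one merge may hold the lock
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            await asyncio.sleep(0.01)  # placeholder for checks + merge
            self.active -= 1
        return f"merged {branch}"
```

Note that `asyncio.Lock` only coordinates coroutines within one event loop; separate orchestrator processes would need a different mechanism, such as a file or database lock.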

#### 7.3 AI Observer Integration

| Task | Description | Output Files |
|------|-------------|--------------|
| 7.3.1 | Implement `AIObserverClient` class | `src/.../observability/ai_observer.py` |
| 7.3.2 | Implement metrics push | `src/.../observability/ai_observer.py` |
| 7.3.3 | Implement alert configuration | `src/.../observability/ai_observer.py` |
| 7.3.4 | Write deployment script | `scripts/deploy_ai_observer.sh` |
| 7.3.5 | Write integration tests | `tests/integration/test_ai_observer.py` |

#### 7.4 Alerting

| Task | Description | Output Files |
|------|-------------|--------------|
| 7.4.1 | Implement stuck agent alert | `src/.../observability/alerts.py` |
| 7.4.2 | Implement cost threshold alert | `src/.../observability/alerts.py` |
| 7.4.3 | Implement error spike alert | `src/.../observability/alerts.py` |
| 7.4.4 | Configure alert destinations | `src/.../config.py` |
| 7.4.5 | Write unit tests | `tests/unit/test_alerts.py` |

### Phase 7 Acceptance Criteria

- [ ] Merge blocked when tests fail
- [ ] Merge blocked when pending approvals exist
- [ ] Merge blocked when critical risks detected
- [ ] Merge blocked when conflicts exist
- [ ] Only one merge executes at a time (lock works)
- [ ] AI Observer dashboard accessible
- [ ] Stuck agent alert fires correctly
- [ ] Cost threshold alert fires correctly
- [ ] All tests pass

### Phase 7 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| Merge gate | Protected branches enforced |
| AI Observer | Dashboard shows metrics |
| Alerting | Test alerts fire correctly |

---

## Phase 8: Production Hardening

**Goal:** Prepare system for production use with reliability and documentation.

**Dependencies:** Phase 7 complete

### Tasks

#### 8.1 Reliability

| Task | Description | Output Files |
|------|-------------|--------------|
| 8.1.1 | Implement rate limiting with exponential backoff | `src/.../adapters/base.py` |
| 8.1.2 | Implement retry logic for transient failures | `src/.../adapters/base.py` |
| 8.1.3 | Implement fallback model configuration | `src/.../routing/router.py` |
| 8.1.4 | Implement graceful shutdown | `src/.../main.py` |
| 8.1.5 | Write stress tests | `tests/load/test_concurrent_agents.py` |
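
Retrying transient failures with exponential backoff can be sketched as below. `TransientError`, the default delays, the 10% jitter, and the injectable `sleep` parameter are all illustrative assumptions rather than the adapter's actual interface.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # small jitter
    raise ValueError("max_attempts must be at least 1")
```

Only `TransientError` is retried; any other exception propagates immediately, so programming errors are not masked by the retry loop.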

#### 8.2 Audit Trail

| Task | Description | Output Files |
|------|-------------|--------------|
| 8.2.1 | Implement comprehensive event logging | `src/.../observability/audit.py` |
| 8.2.2 | Implement log rotation | `src/.../observability/audit.py` |
| 8.2.3 | Implement audit query API | `src/.../observability/audit.py` |
| 8.2.4 | Write audit export script | `scripts/export_audit.py` |

#### 8.3 Load Testing

| Task | Description | Output Files |
|------|-------------|--------------|
| 8.3.1 | Create load test scenarios | `tests/load/scenarios.py` |
| 8.3.2 | Test 3 concurrent agents | `tests/load/test_concurrent_agents.py` |
| 8.3.3 | Test 5 concurrent agents | `tests/load/test_concurrent_agents.py` |
| 8.3.4 | Test burst requests | `tests/load/test_burst.py` |
| 8.3.5 | Document performance baselines | `docs/OPERATIONS.md` |

#### 8.4 Documentation

| Task | Description | Output Files |
|------|-------------|--------------|
| 8.4.1 | Write `OPERATIONS.md` runbook | `docs/OPERATIONS.md` |
| 8.4.2 | Document common troubleshooting | `docs/OPERATIONS.md` |
| 8.4.3 | Document scaling considerations | `docs/OPERATIONS.md` |
| 8.4.4 | Write `API.md` for internal APIs | `docs/API.md` |
| 8.4.5 | Create example workflows | `examples/*.py` |

### Phase 8 Acceptance Criteria

- [ ] Rate limiting prevents API abuse
- [ ] Transient failures retry automatically
- [ ] Fallback model kicks in when primary fails
- [ ] Graceful shutdown completes in-flight work
- [ ] 5 concurrent agents run without issues
- [ ] Audit trail captures all significant events
- [ ] OPERATIONS.md covers all common scenarios
- [ ] Example workflows run successfully

### Phase 8 Deliverables

| Deliverable | Verification |
|-------------|--------------|
| Rate limiting | Load test passes |
| Fallback models | Failure injection test |
| Audit trail | Events queryable |
| Documentation | Team review complete |

---

## Risk Register

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Claude Code headless mode changes | Medium | High | Abstract behind adapter, test regularly |
| API rate limits hit unexpectedly | High | Medium | Implement backoff, budget alerts |
| Stuck agents burn tokens | High | Medium | Health monitoring, auto-terminate |
| Secret leakage in logs | Medium | Critical | SecretRedactor, never-store patterns |
| Merge conflicts between agents | Medium | Medium | Workspace isolation, merge gate |
| AGPL license violation | Low | High | Use DIY workspace, avoid Claude Squad |
| Token Audit repo unavailable | Low | Low | Fallback to SDK usage tracking |
| AI Observer deployment issues | Medium | Low | Graceful degradation, SQLite fallback |

---

## Definition of Done (Global)

A phase is complete when ALL of the following are true:

1. **Code Complete:** All tasks implemented per specification
2. **Tests Pass:** Unit and integration tests for all new code
3. **Documentation:** Relevant docs updated (README, architecture, etc.)
4. **Secret Safety:** SecretRedactor covers any new patterns
5. **Review:** Code reviewed (self-review acceptable for solo work)
6. **Manual Test:** At least one manual end-to-end verification
7. **Commit:** All changes committed with clear messages

---

## Quick Reference: Key Files

| Component | Primary File |
|-----------|-------------|
| Entry point | `src/agent_orchestrator/main.py` |
| Configuration | `src/agent_orchestrator/config.py` |
| Adapter base | `src/agent_orchestrator/adapters/base.py` |
| Claude Code | `src/agent_orchestrator/adapters/claude_code_cli.py` |
| Database | `src/agent_orchestrator/persistence/database.py` |
| Project Journal | `src/agent_orchestrator/journal/project_journal.py` |
| Control Loop | `src/agent_orchestrator/control/loop.py` |
| Risk Gate | `src/agent_orchestrator/risk/autonomy_gate.py` |
| Merge Gate | `src/agent_orchestrator/merge/gate.py` |
| Secrets | `src/agent_orchestrator/secrets/redactor.py` |
| Rate Limiter | `src/agent_orchestrator/reliability/rate_limiter.py` |
| Retry/Circuit Breaker | `src/agent_orchestrator/reliability/retry.py` |
| Graceful Shutdown | `src/agent_orchestrator/reliability/shutdown.py` |
| Audit Trail | `src/agent_orchestrator/observability/audit.py` |

---

*Document generated: January 15, 2026*
*Status: All 8 phases complete. System ready for production deployment.*
