# Session Tracing Diagram

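These diagrams describe the LangSmith-style session tracing system: how a trace is structured, which span kinds and statuses exist, how the tracer API is called, how traces are stored, and how a session is replayed.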

## Trace Structure

```mermaid
graph TB
    subgraph "Trace: task-123"
        ROOT[Root Span<br/>TASK: implement-feature]

        ROOT --> A1[Agent Span<br/>claude-code]
        ROOT --> A2[Agent Span<br/>gemini-cli]

        A1 --> L1[LLM Call<br/>Opus 4]
        A1 --> L2[LLM Call<br/>Opus 4]
        A1 --> T1[Tool Call<br/>read_file]
        A1 --> T2[Tool Call<br/>write_file]

        A2 --> L3[LLM Call<br/>Gemini Pro]
        A2 --> T3[Tool Call<br/>search]

        T2 --> AP[Approval<br/>HIGH risk]
    end

    subgraph "Span Metadata"
        META[Each Span Has:]
        META --> M1[span_id]
        META --> M2[trace_id]
        META --> M3[parent_id]
        META --> M4[start_time]
        META --> M5[end_time]
        META --> M6[status]
        META --> M7[metrics]
    end
```
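
The span tree and metadata fields above can be sketched as a small data structure. This is a minimal illustration, not the project's actual `Span` class: the class name, the `child` helper, and the defaults are assumptions; only the field names come from the diagram.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class Span:
    """Hypothetical span record; field names follow the Span Metadata box."""

    name: str
    kind: str
    trace_id: str
    parent_id: Optional[str] = None  # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    end_time: Optional[datetime] = None  # set when the span finishes
    status: str = "RUNNING"
    metrics: Dict[str, Any] = field(default_factory=dict)

    def child(self, name: str, kind: str) -> "Span":
        """Create a span in the same trace whose parent is this span."""
        return Span(name=name, kind=kind, trace_id=self.trace_id, parent_id=self.span_id)


# Rebuild one branch of the diagram: task -> agent -> tool call -> approval.
root = Span("implement-feature", "TASK", trace_id="task-123")
agent = root.child("claude-code", "AGENT")
write_file = agent.child("write_file", "TOOL_CALL")
approval = write_file.child("HIGH risk", "APPROVAL")

for span in (root, agent, write_file, approval):
    print(span.kind, span.name, "parent:", span.parent_id)
```

Because every span carries both `trace_id` and `parent_id`, the tree can be rebuilt from a flat list of spans, which is what the storage layout later in this document relies on.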

## Span Kinds

```mermaid
graph LR
    subgraph "SpanKind Enum"
        SK[SpanKind]
        SK --> TASK[TASK]
        SK --> AGENT[AGENT]
        SK --> LLM[LLM_CALL]
        SK --> TOOL[TOOL_CALL]
        SK --> APPROVAL[APPROVAL]
        SK --> WORKFLOW[WORKFLOW]
        SK --> STEP[STEP]
        SK --> HANDOFF[HANDOFF]
        SK --> VOTE[VOTE]
    end

    subgraph "SpanStatus Enum"
        SS[SpanStatus]
        SS --> RUNNING[RUNNING]
        SS --> SUCCESS[SUCCESS]
        SS --> ERROR[ERROR]
        SS --> TIMEOUT[TIMEOUT]
        SS --> CANCELLED[CANCELLED]
    end
```
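
As a rough sketch, the two enums could be declared as below. The member names are taken from the diagram; the string values, the `str` mixin, and the `is_terminal` helper are assumptions made for illustration and may differ from the real module.

```python
from enum import Enum


class SpanKind(str, Enum):
    """Assumed declaration; member names mirror the SpanKind diagram."""

    TASK = "TASK"
    AGENT = "AGENT"
    LLM_CALL = "LLM_CALL"
    TOOL_CALL = "TOOL_CALL"
    APPROVAL = "APPROVAL"
    WORKFLOW = "WORKFLOW"
    STEP = "STEP"
    HANDOFF = "HANDOFF"
    VOTE = "VOTE"


class SpanStatus(str, Enum):
    """Assumed declaration; member names mirror the SpanStatus diagram."""

    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"
    TIMEOUT = "TIMEOUT"
    CANCELLED = "CANCELLED"


def is_terminal(status: SpanStatus) -> bool:
    """Hypothetical helper: every status except RUNNING ends a span."""
    return status is not SpanStatus.RUNNING


# The str mixin lets values round-trip through a text column such as SPANS.kind.
kind = SpanKind("LLM_CALL")
print(kind.name, is_terminal(SpanStatus.RUNNING), is_terminal(SpanStatus.TIMEOUT))
```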

## Tracer API

```mermaid
sequenceDiagram
    participant App
    participant Tracer
    participant Storage
    participant Replay

    App->>Tracer: start_trace(task_id, agent_id)
    activate Tracer
    Tracer->>Tracer: Create root span

    App->>Tracer: span("llm_call", LLM_CALL)
    activate Tracer
    Note over Tracer: Context manager

    Tracer->>Tracer: Create child span
    Note over App: Execute LLM call
    Tracer->>Tracer: Record metrics
    Tracer-->>App: Span complete
    deactivate Tracer

    App->>Tracer: span("tool_call", TOOL_CALL)
    activate Tracer
    Note over App: Execute tool
    Tracer-->>App: Span complete
    deactivate Tracer

    App->>Tracer: end_trace()
    Tracer->>Storage: save_trace(trace)
    Storage-->>Tracer: Saved
    Tracer-->>App: Trace complete
    deactivate Tracer

    Note over Replay: Later...

    Replay->>Storage: get_trace(trace_id)
    Storage-->>Replay: Trace data
    Replay->>Replay: generate_timeline()
    Replay->>Replay: extract_decision_points()
    Replay-->>App: Timeline + Decisions
```
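
The nesting behaviour in the sequence above (a root span opened by `start_trace`, child spans opened and closed by a context manager) can be illustrated with a toy tracer. `MiniTracer` is a hypothetical stand-in written for this document: it is synchronous, keeps spans as plain dicts, generates its own trace ID, and skips storage entirely, so it should not be read as the real `Tracer`.

```python
import time
import uuid
from contextlib import contextmanager


class MiniTracer:
    """Toy stand-in for the Tracer participant; not the real implementation."""

    def __init__(self):
        self.spans = []   # every span, in creation order
        self._stack = []  # currently open spans, innermost last
        self.trace_id = None

    def _open(self, name, kind):
        span = {
            "span_id": uuid.uuid4().hex,
            "trace_id": self.trace_id,
            # The innermost open span becomes the parent of the new one.
            "parent_id": self._stack[-1]["span_id"] if self._stack else None,
            "name": name,
            "kind": kind,
            "status": "RUNNING",
            "start": time.monotonic(),
            "end": None,
        }
        self.spans.append(span)
        self._stack.append(span)
        return span

    def _close(self, status):
        span = self._stack.pop()
        span["end"] = time.monotonic()
        span["status"] = status

    def start_trace(self, task_id, agent_id):
        self.trace_id = uuid.uuid4().hex
        root = self._open(task_id, "TASK")  # root span
        root["agent_id"] = agent_id
        return root

    @contextmanager
    def span(self, name, kind):
        span = self._open(name, kind)
        status = "SUCCESS"
        try:
            yield span
        except Exception:
            status = "ERROR"  # record the failure, then let it propagate
            raise
        finally:
            self._close(status)

    def end_trace(self):
        self._close("SUCCESS")  # closes the root span
        return self.spans


tracer = MiniTracer()
tracer.start_trace("task-123", "claude-code")
with tracer.span("llm_call", "LLM_CALL"):
    pass  # the LLM call would run here
try:
    with tracer.span("tool_call", "TOOL_CALL"):
        raise RuntimeError("tool failed")
except RuntimeError:
    pass
spans = tracer.end_trace()
print([(s["name"], s["status"]) for s in spans])
```

Closing the span in `finally` means a span never stays `RUNNING` after its block exits, whether the body succeeded or raised.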

## Timeline View

```text
Trace: task-123 | Duration: 45.2s | Cost: $0.42

├─ [00:00.0] TASK implement-feature ─────────────────────────────────────
│  ├─ [00:00.1] AGENT claude-code ────────────────────────────────────
│  │  ├─ [00:00.2] LLM_CALL opus-4 ──────────── 5.2s | 2,500 tokens
│  │  ├─ [00:05.4] TOOL_CALL read_file ──────── 0.1s
│  │  ├─ [00:05.5] LLM_CALL opus-4 ──────────── 8.3s | 4,200 tokens
│  │  ├─ [00:13.8] TOOL_CALL write_file ─────── 0.2s
│  │  │  └─ [00:13.9] APPROVAL high_risk ────── 12.0s | APPROVED
│  │  └─ [00:25.9] ✓ SUCCESS
│  │
│  ├─ [00:26.0] AGENT gemini-cli ─────────────────────────────────────
│  │  ├─ [00:26.1] LLM_CALL gemini-pro ──────── 3.5s | 1,800 tokens
│  │  ├─ [00:29.6] TOOL_CALL search ─────────── 1.2s
│  │  └─ [00:30.8] ✓ SUCCESS
│  │
│  └─ [00:45.2] ✓ SUCCESS
```
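
A view like the one above can be produced by walking the span tree depth-first and ordering siblings by start offset. The sketch below assumes each span is a dict with `span_id`, `parent_id`, `kind`, `name`, `offset_s`, and `duration_s` keys; those key names and the `render_timeline` function are illustrative, not the output format of `generate_timeline()`.

```python
from collections import defaultdict


def render_timeline(spans):
    """Return one indented line per span, siblings ordered by start offset."""
    children = defaultdict(list)
    for s in spans:
        children[s["parent_id"]].append(s)

    lines = []

    def walk(parent_id, depth):
        for s in sorted(children[parent_id], key=lambda item: item["offset_s"]):
            minutes, seconds = divmod(s["offset_s"], 60)
            stamp = f"[{int(minutes):02d}:{seconds:04.1f}]"
            indent = "│  " * depth
            lines.append(
                f"{indent}├─ {stamp} {s['kind']} {s['name']} ({s['duration_s']:.1f}s)"
            )
            walk(s["span_id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines


# Sample data loosely based on the first agent in the timeline above.
spans = [
    {"span_id": "s1", "parent_id": None, "kind": "TASK",
     "name": "implement-feature", "offset_s": 0.0, "duration_s": 45.2},
    {"span_id": "s2", "parent_id": "s1", "kind": "AGENT",
     "name": "claude-code", "offset_s": 0.1, "duration_s": 25.8},
    {"span_id": "s4", "parent_id": "s2", "kind": "TOOL_CALL",
     "name": "read_file", "offset_s": 5.4, "duration_s": 0.1},
    {"span_id": "s3", "parent_id": "s2", "kind": "LLM_CALL",
     "name": "opus-4", "offset_s": 0.2, "duration_s": 5.2},
]

lines = render_timeline(spans)
print("\n".join(lines))
```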

## Trace Storage

```mermaid
erDiagram
    TRACES {
        string trace_id PK
        string task_id
        string agent_id
        datetime started_at
        datetime ended_at
        string status
        float total_cost_usd
        int total_tokens
        float total_latency_ms
    }

    SPANS {
        string span_id PK
        string trace_id FK
        string parent_id FK
        string name
        string kind
        datetime start_time
        datetime end_time
        string status
        string error
        json metadata
    }

    SPAN_METRICS {
        string span_id FK
        int input_tokens
        int output_tokens
        float cost_usd
        float latency_ms
        string model
    }

    SPAN_CONTENT {
        string span_id FK
        text input_text
        text output_text
        json tool_calls
        json tool_results
    }

    TRACES ||--o{ SPANS : contains
    SPANS ||--o| SPAN_METRICS : has
    SPANS ||--o| SPAN_CONTENT : has
    SPANS ||--o{ SPANS : parent_of
```
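
One way to realise this layout in SQLite is sketched below using the standard `sqlite3` module. The column types, the choice to store JSON and timestamps as text, and making `span_id` the primary key of `span_metrics` and `span_content` (to enforce the zero-or-one relationship) are assumptions; the actual `SQLiteTraceStorage` schema may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE traces (
    trace_id TEXT PRIMARY KEY,
    task_id TEXT, agent_id TEXT,
    started_at TEXT, ended_at TEXT, status TEXT,
    total_cost_usd REAL, total_tokens INTEGER, total_latency_ms REAL
);
CREATE TABLE spans (
    span_id TEXT PRIMARY KEY,
    trace_id TEXT NOT NULL REFERENCES traces(trace_id),
    parent_id TEXT REFERENCES spans(span_id),  -- NULL for the root span
    name TEXT, kind TEXT,
    start_time TEXT, end_time TEXT,
    status TEXT, error TEXT,
    metadata TEXT                              -- JSON stored as text
);
CREATE TABLE span_metrics (
    span_id TEXT PRIMARY KEY REFERENCES spans(span_id),
    input_tokens INTEGER, output_tokens INTEGER,
    cost_usd REAL, latency_ms REAL, model TEXT
);
CREATE TABLE span_content (
    span_id TEXT PRIMARY KEY REFERENCES spans(span_id),
    input_text TEXT, output_text TEXT,
    tool_calls TEXT, tool_results TEXT         -- JSON stored as text
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO traces (trace_id, task_id, agent_id, status) VALUES (?, ?, ?, ?)",
    ("tr-1", "task-123", "claude-code", "SUCCESS"),
)
conn.executemany(
    "INSERT INTO spans (span_id, trace_id, parent_id, name, kind, status) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("s1", "tr-1", None, "implement-feature", "TASK", "SUCCESS"),
        ("s2", "tr-1", "s1", "llm_call", "LLM_CALL", "SUCCESS"),
    ],
)
conn.execute(
    "INSERT INTO span_metrics VALUES (?, ?, ?, ?, ?, ?)",
    ("s2", 1500, 500, 0.03, 5200.0, "opus-4"),
)

# Spans without metrics still appear thanks to the LEFT JOIN.
row = conn.execute(
    "SELECT COUNT(*), SUM(m.input_tokens + m.output_tokens) "
    "FROM spans s LEFT JOIN span_metrics m USING (span_id) "
    "WHERE s.trace_id = ?",
    ("tr-1",),
).fetchone()
print(row)
```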

## Session Replay

```mermaid
flowchart TD
    subgraph "Replay Features"
        T[Trace] --> TL[Generate Timeline]
        T --> DP[Extract Decision Points]
        T --> EG[Build Execution Graph]
        T --> M[Aggregate Metrics]
    end

    subgraph "Decision Points"
        DP --> D1[Approvals]
        DP --> D2[Risk Classifications]
        DP --> D3[Agent Selections]
        DP --> D4[Error Recoveries]
    end

    subgraph "Execution Graph"
        EG --> N1[Node: Span]
        EG --> E1[Edge: Dependency]
        EG --> C1[Critical Path]
    end

    subgraph "Metrics"
        M --> M1[Total Cost]
        M --> M2[Total Tokens]
        M --> M3[Total Latency]
        M --> M4[Error Rate]
    end
```
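
Two of the replay features above, decision-point extraction and metric aggregation, can be approximated with plain functions over a list of span dicts. Which span kinds count as decision points is an assumption here (`APPROVAL`, `HANDOFF`, `VOTE`), as are the dict keys and function signatures; the real `SessionReplay` methods take a trace identifier and may classify decisions differently.

```python
# Assumption: these span kinds mark points where a choice was made.
DECISION_KINDS = {"APPROVAL", "HANDOFF", "VOTE"}


def extract_decision_points(spans):
    """Keep only spans whose kind is treated as a decision."""
    return [s for s in spans if s["kind"] in DECISION_KINDS]


def aggregate_metrics(spans):
    """Sum per-span metrics; spans without a metric contribute zero."""
    errors = sum(1 for s in spans if s["status"] == "ERROR")
    return {
        "total_cost_usd": sum(s.get("cost_usd", 0.0) for s in spans),
        "total_tokens": sum(
            s.get("input_tokens", 0) + s.get("output_tokens", 0) for s in spans
        ),
        "total_latency_ms": sum(s.get("latency_ms", 0.0) for s in spans),
        "error_rate": errors / len(spans) if spans else 0.0,
    }


# Leaf spans only, so summed latency does not double-count nested spans.
spans = [
    {"kind": "LLM_CALL", "status": "SUCCESS", "input_tokens": 1500,
     "output_tokens": 500, "cost_usd": 0.03, "latency_ms": 5200.0},
    {"kind": "TOOL_CALL", "status": "ERROR", "latency_ms": 200.0},
    {"kind": "APPROVAL", "status": "SUCCESS", "latency_ms": 12000.0,
     "decision": "APPROVED"},
    {"kind": "LLM_CALL", "status": "SUCCESS", "input_tokens": 3000,
     "output_tokens": 1200, "cost_usd": 0.08, "latency_ms": 8300.0},
]

decisions = extract_decision_points(spans)
metrics = aggregate_metrics(spans)
print(len(decisions), metrics)
```

Summing latency across all spans would count a parent and its children twice, so a real aggregation would likely use either leaf spans or the root span's own duration.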

## Example Usage

```python
import asyncio

from agent_orchestrator.tracing import (
    SessionReplay,
    SpanKind,
    SQLiteTraceStorage,
    Tracer,
)


# Placeholder helpers for illustration; swap in your own LLM client and tool runner.
async def call_llm(prompt: str) -> str:
    return f"response to: {prompt}"


async def execute_tool(name: str, args: dict) -> dict:
    return {"tool": name, "ok": True}


async def main() -> None:
    tracer = Tracer()
    prompt = "Implement the feature"
    args = {"path": "feature.py", "content": "print('hello')"}

    # Start a trace
    trace = tracer.start_trace(task_id="task-123", agent_id="claude-code")

    # Create spans using async context managers (`async with` requires a coroutine)
    async with tracer.span("process_request", SpanKind.TASK) as task_span:

        async with tracer.span("llm_call", SpanKind.LLM_CALL) as llm_span:
            response = await call_llm(prompt)
            llm_span.record_metrics(
                input_tokens=1500,
                output_tokens=500,
                cost_usd=0.03,
                model="opus-4",
            )
            llm_span.record_content(input=prompt, output=response)

        async with tracer.span("tool_call", SpanKind.TOOL_CALL) as tool_span:
            result = await execute_tool("write_file", args)
            tool_span.record_content(
                tool_calls=[{"name": "write_file", "args": args}],
                tool_results=[result],
            )

    # End trace
    trace = tracer.end_trace()

    # Replay later
    storage = SQLiteTraceStorage("traces.db")
    replay = SessionReplay(storage)

    timeline = replay.generate_timeline("task-123")
    decisions = replay.extract_decision_points("task-123")


asyncio.run(main())
```
