# Deployment Runbook

## Overview

This runbook covers deploying the Agent Orchestrator to production environments.

## Prerequisites

- Python 3.11+ installed on target server
- tmux installed (for CLI agent isolation)
- Git installed
- Access to production database
- CLI agents authenticated

## Deployment Steps

### 1. Pre-Deployment Checks

```bash
# Check current version
python -m agent_orchestrator --version

# Check system health
curl http://localhost:8080/api/health

# Check for running tasks (quote the URL so the shell does not interpret `?`)
curl 'http://localhost:8080/api/tasks?status=running'

# Before continuing, wait for running tasks to complete, or cancel them
```
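
The last comment above leaves the waiting to the operator. A minimal sketch of a polling helper follows. `wait_for_zero` and `running_tasks` are hypothetical names, and the commented usage assumes that the tasks endpoint returns a JSON array and that `jq` is installed, neither of which this runbook confirms.

```bash
# wait_for_zero MAX_TRIES DELAY CMD...: rerun CMD until it prints 0.
# Returns 0 once CMD prints 0, 1 if MAX_TRIES is exhausted, 2 if CMD itself fails.
# (Hypothetical helper, not part of the orchestrator CLI.)
wait_for_zero() {
  local tries=$1 delay=$2 count
  shift 2
  while [ "$tries" -gt 0 ]; do
    count=$("$@") || return 2
    [ "$count" = "0" ] && return 0
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Assumed usage: the endpoint returns a JSON array and jq is available.
# running_tasks() { curl -fsS 'http://localhost:8080/api/tasks?status=running' | jq 'length'; }
# wait_for_zero 30 10 running_tasks || echo "tasks still running; cancel them or wait longer"
```

Passing the probe as a command keeps the loop independent of the response format, so only `running_tasks` needs adjusting if the API returns something other than an array.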

### 2. Backup Current State

```bash
# Backup database
cp orchestrator.db orchestrator.db.bak.$(date +%Y%m%d_%H%M%S)

# Backup configuration
cp config.yaml config.yaml.bak

# Backup ops directory
tar -czf ops_backup_$(date +%Y%m%d).tar.gz ops/
```
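
A backup that was never checked can fail silently, for example on a full disk. The sketch below wraps the copy in a hypothetical `backup_file` helper that confirms the copy is byte-identical and prints its path, so one timestamp can be shared by the database and configuration backups.

```bash
# backup_file SRC STAMP: copy SRC to SRC.bak.STAMP, verify the copy,
# and print the backup path. Returns non-zero if the copy or the check fails.
# (Hypothetical helper, not part of the orchestrator CLI.)
backup_file() {
  local src=$1 stamp=$2 dest
  dest="${src}.bak.${stamp}"
  cp -p "$src" "$dest" && cmp -s "$src" "$dest" && printf '%s\n' "$dest"
}

# Example: use a single timestamp for every file in this deployment.
# stamp=$(date +%Y%m%d_%H%M%S)
# backup_file orchestrator.db "$stamp"
# backup_file config.yaml "$stamp"
```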

### 3. Stop Current Instance

```bash
# Graceful shutdown
python -m agent_orchestrator stop --graceful --timeout 60

# Verify all agents stopped
tmux list-sessions | grep agent_

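# Kill any remaining agent sessions
tmux list-sessions -F '#{session_name}' | grep agent_ | while read -r session; do
  tmux kill-session -t "$session"
done

# Last resort only: `tmux kill-server` ends every tmux session on the host, not just agents
# tmux kill-server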
```

### 4. Update Code

```bash
# Pull latest changes
git fetch origin
git checkout main
git pull origin main

# Or deploy specific version
git checkout v1.2.3
```
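
Rollback needs the revision that was running before the update. One way to capture it is to save the current commit hash before running the commands above. The sketch uses a hypothetical `record_revision` helper and a hypothetical `.previous_version` file; neither is part of the orchestrator.

```bash
# record_revision REPO_DIR OUT_FILE: save the commit currently checked out
# in REPO_DIR to OUT_FILE. (Hypothetical helper.)
record_revision() {
  local repo=$1 out=$2
  git -C "$repo" rev-parse HEAD > "$out"
}

# Example, run from the repository root before `git fetch`:
# record_revision . .previous_version
# Later, during rollback:
# git checkout "$(cat .previous_version)"
```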

### 5. Update Dependencies

```bash
# Activate virtual environment
source venv/bin/activate

# Update dependencies
pip install -e ".[prod]"

# Verify installation
python -c "import agent_orchestrator; print(agent_orchestrator.__version__)"
```

### 6. Run Migrations

```bash
# Check pending migrations
python -m agent_orchestrator db status

# Apply migrations
python -m agent_orchestrator db upgrade

# Verify migration
python -m agent_orchestrator db verify
```

### 7. Validate Configuration

```bash
# Validate config
python -m agent_orchestrator config validate

# Show effective config
python -m agent_orchestrator config show
```

### 8. Start New Instance

```bash
# Start orchestrator
python -m agent_orchestrator start --daemon

# Start API server
python -m agent_orchestrator.api &

# Verify health
curl http://localhost:8080/api/health
```
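
A service may need a few seconds before its health endpoint responds, so a single `curl` issued right after startup can fail spuriously. A small retry wrapper handles that. The `retry` function below is a hypothetical helper, and the attempt count and delay in the usage comment are illustrative values, not tuned for this service.

```bash
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
# (Hypothetical helper, not part of the orchestrator CLI.)
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example: up to 10 attempts, 3 seconds apart; --fail makes curl exit non-zero on HTTP errors.
# retry 10 3 curl --fail --silent --show-error --max-time 5 http://localhost:8080/api/health
```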

### 9. Post-Deployment Verification

```bash
# Check agent status
curl http://localhost:8080/api/agents

# Submit test task
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"description": "Test task: echo hello"}' \
  http://localhost:8080/api/tasks

# Monitor logs
tail -f logs/orchestrator.log
```

### 10. Enable Monitoring

```bash
# Verify alerting works
python -m agent_orchestrator alert test

# Check metrics endpoint
curl http://localhost:8080/api/stats
```

## Rollback Procedure

If deployment fails:

```bash
# Stop new instance
python -m agent_orchestrator stop

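# Restore previous version
git checkout <previous-version>

# Reinstall dependencies for the restored version (with the virtual environment active)
pip install -e ".[prod]"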

# Restore database
cp orchestrator.db.bak.YYYYMMDD_HHMMSS orchestrator.db

# Restart
python -m agent_orchestrator start --daemon

# Verify health
curl http://localhost:8080/api/health
```
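
The `YYYYMMDD_HHMMSS` placeholder above stands for the timestamp of the backup taken in step 2. When several backups exist, the newest one can be picked automatically, because timestamps in that format sort lexicographically. The `latest_backup` function is a hypothetical helper sketched for illustration.

```bash
# latest_backup PREFIX: print the newest file named PREFIX.<timestamp>.
# Returns non-zero if no such file exists. (Hypothetical helper.)
latest_backup() {
  local prefix=$1 f newest=""
  for f in "$prefix".*; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
    if [ -z "$newest" ] || [[ "$f" > "$newest" ]]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example: restore the most recent database backup.
# cp "$(latest_backup orchestrator.db.bak)" orchestrator.db
```

Confirm that the selected file predates the failed deployment before copying it over the live database.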

## Blue-Green Deployment

For zero-downtime deployments:

```bash
# 1. Deploy to green environment
./deploy.sh green

# 2. Verify green health
curl http://green:8080/api/health

# 3. Switch traffic to green
./switch-traffic.sh green

# 4. Monitor for issues
./monitor.sh --duration 300

# 5. If OK, decommission blue
./stop.sh blue

# 6. If issues, rollback to blue
./switch-traffic.sh blue
```

## Troubleshooting

### Deployment Fails to Start

1. Check logs: `tail -f logs/orchestrator.log`
2. Verify config: `python -m agent_orchestrator config validate`
3. Check database: `python -m agent_orchestrator db status`
4. Verify ports: `netstat -tlnp | grep 8080` (or `ss -tlnp | grep 8080` where `netstat` is not installed)

### Agents Not Connecting

1. Check agent authentication
2. Verify tmux sessions: `tmux list-sessions | grep agent_`
3. Check agent-specific logs
4. Re-authenticate agents

### Database Migration Fails

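1. Check migration status: `python -m agent_orchestrator db status`
2. Review migration script
3. Restore from backup: `cp orchestrator.db.bak.YYYYMMDD_HHMMSS orchestrator.db`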
4. Fix migration and retry

## Checklist

- [ ] Pre-deployment checks passed
- [ ] Database backed up
- [ ] Current instance stopped gracefully
- [ ] Code updated to target version
- [ ] Dependencies updated
- [ ] Migrations applied
- [ ] Configuration validated
- [ ] New instance started
- [ ] Health check passed
- [ ] Test task completed
- [ ] Monitoring verified
