# Setup Guide

Installation and configuration for the Research Development Framework.

---

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Quick Start](#quick-start)
3. [Full Installation](#full-installation)
4. [Configuration](#configuration)
5. [Verification](#verification)
6. [Optional Features](#optional-features)
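7. [Troubleshooting](#troubleshooting)
8. [Security Notes](#security-notes)
9. [Next Steps](#next-steps)
10. [Quick Reference](#quick-reference)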

---

## Prerequisites

| Component | Version | Purpose |
|-----------|---------|---------|
| Python | 3.10+ (3.11 recommended) | Core framework |
| PostgreSQL | 14+ with pgvector (16+ recommended) | Document storage + search |
| pgvector | 0.5+ | Vector similarity search |

---

## Quick Start

**For users with PostgreSQL already installed:**

```bash
cd /var/www/html/research/Research_development

# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords')"

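# 3. Create database and user (skip any statement whose object already exists)
sudo -u postgres psql -c "CREATE DATABASE research_dev_db;"
sudo -u postgres psql -c "CREATE USER research_dev_user WITH PASSWORD 'your_password';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE research_dev_db TO research_dev_user;"
# Ownership lets the user create tables in the public schema (needed on PostgreSQL 15+)
sudo -u postgres psql -c "ALTER DATABASE research_dev_db OWNER TO research_dev_user;"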
sudo -u postgres psql -d research_dev_db -c "CREATE EXTENSION IF NOT EXISTS vector;"
sudo -u postgres psql -d research_dev_db -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"

# 4. Initialize schema
PGPASSWORD='your_password' psql -h localhost -U research_dev_user -d research_dev_db -f database/schema.sql

# 5. Make rdf CLI executable
chmod +x rdf

# 6. Test installation
./rdf --help

# 7. Add documents and start!
cp your-documents/*.pdf NEW_DOCS/
./rdf ingest NEW_DOCS/
./rdf search "your query"
```

---

## Full Installation

### Step 1: PostgreSQL & pgvector

#### Ubuntu/Debian

```bash
# Install PostgreSQL 16
sudo apt update
sudo apt install postgresql-16 postgresql-contrib-16

# Install pgvector extension
sudo apt install postgresql-16-pgvector

# Start PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
```

#### macOS

```bash
brew install postgresql@16
brew install pgvector
brew services start postgresql@16
```

#### Verify PostgreSQL

```bash
pg_isready
# or, on Linux with systemd
sudo systemctl status postgresql
```

### Step 2: Create Database

```bash
# Connect as postgres superuser
sudo -u postgres psql
```

```sql
-- Create database
CREATE DATABASE research_dev_db;

-- Create user
CREATE USER research_dev_user WITH PASSWORD 'your_secure_password';

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE research_dev_db TO research_dev_user;
ALTER DATABASE research_dev_db OWNER TO research_dev_user;

-- Connect to the database
\c research_dev_db

-- Enable extensions
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Grant schema permissions (the table and sequence grants cover only objects that already exist)
GRANT ALL ON SCHEMA public TO research_dev_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO research_dev_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO research_dev_user;

\q
```

**Initialize schema:**

```bash
cd /var/www/html/research/Research_development
PGPASSWORD='your_secure_password' psql -h localhost -U research_dev_user -d research_dev_db -f database/schema.sql
```

### Step 3: Python Dependencies

```bash
cd /var/www/html/research/Research_development

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords')"
```

### Step 4: Make CLI Executable

```bash
chmod +x rdf
```

---

## Configuration

### Configuration Files

| File | Purpose | Required |
|------|---------|----------|
| `.env.db` | Database credentials | Yes |
| `.env` | API keys | Optional |
| `config/project.yaml` | Project settings | Yes |

### .env.db (Database)

```bash
cat .env.db
```

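The file should contain these keys, with `DB_PASSWORD` matching the password set when the database user was created: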
```
DB_HOST=localhost
DB_PORT=5432
DB_NAME=research_dev_db
DB_USER=research_dev_user
DB_PASSWORD=your_secure_password
```
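
`config/project.yaml` refers to a single `${DATABASE_URL}`, while `.env.db` stores the connection as separate fields. How the framework combines them is not documented here, so the following is only a minimal sketch of one way to derive a URL from the fields above. The helper names `parse_env` and `build_database_url` are hypothetical and not part of the framework.

```python
from urllib.parse import quote


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def build_database_url(env: dict[str, str]) -> str:
    """Assemble a postgresql:// URL; user and password are percent-encoded."""
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )


# Sample content with a password that needs escaping (assumed values).
sample = """\
DB_HOST=localhost
DB_PORT=5432
DB_NAME=research_dev_db
DB_USER=research_dev_user
DB_PASSWORD=p@ss/word
"""
print(build_database_url(parse_env(sample)))
```

Percent-encoding matters when the password contains characters such as `@` or `/`, which would otherwise be read as URL delimiters.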

### .env (API Keys - Optional)

```bash
# For semantic search (embeddings)
OPENAI_API_KEY=sk-your-key-here

# For web search (if enabled)
TAVILY_API_KEY=tvly-your-key-here
```

### config/project.yaml

Project configuration:

```yaml
# Database (uses .env.db by default)
database:
  url: ${DATABASE_URL}

# Embeddings configuration
embeddings:
  provider: "openai"              # "openai" or "local"
  model: "text-embedding-3-small"
  dimension: 1536

# Agent safety defaults
agent_safety:
  allow_web_search: false         # Library only by default
  require_approved_web_sources: true
  validation_required_before_polish: true

# Workflow defaults
defaults:
  autonomy_essay: "full"
  autonomy_book: "supervised"
  strict_library_only: true

# Search settings
search:
  default_mode: "hybrid"
  max_results: 100
  default_limit: 10
```
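
The `embeddings.dimension` value has to agree with the model, because a pgvector column declared with a fixed size rejects vectors of any other length. The sketch below checks an already-parsed `embeddings` section against the default output sizes of the OpenAI embedding models. The function name `check_embedding_config` is hypothetical, and whether the framework performs a similar check itself is not stated in this guide.

```python
# Default output sizes of OpenAI embedding models.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_embedding_config(embeddings: dict) -> list[str]:
    """Return a list of problems found in the `embeddings` section."""
    problems = []
    provider = embeddings.get("provider")
    if provider not in ("openai", "local"):
        problems.append(f"unknown provider: {provider!r}")
    model = embeddings.get("model")
    expected = KNOWN_DIMENSIONS.get(model)
    # Only models with a known default size can be checked.
    if provider == "openai" and expected is not None:
        if embeddings.get("dimension") != expected:
            problems.append(
                f"{model} defaults to {expected} dimensions, "
                f"but dimension is {embeddings.get('dimension')}"
            )
    return problems


config = {"provider": "openai", "model": "text-embedding-3-small", "dimension": 1536}
print(check_embedding_config(config))  # an empty list means no problems were found
```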

---

## Verification

### Check Installation

```bash
# Verify rdf CLI
./rdf --help
./rdf commands

# Check database connection
./rdf health --format json
```

### Test Search

```bash
# Add a test document
cp /path/to/sample.pdf NEW_DOCS/
./rdf ingest NEW_DOCS/

# Search
./rdf search "test query" --format json
```

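### Expected Output

A successful search prints JSON of the following shape. Here `[...]` stands for the list of matches, and the count in `message` depends on your library.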

```json
{
  "status": "success",
  "code": "SUCCESS",
  "message": "Found 5 results",
  "data": {
    "results": [...]
  }
}
```
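
When the check is scripted, the JSON can be inspected programmatically instead of by eye. The sketch below reads only the fields shown in the example above (`status`, `message`, `data.results`). The function name `summarize_response` is hypothetical, and the shape of error responses is an assumption, since only the success case is documented here.

```python
import json


def summarize_response(raw: str) -> tuple[bool, str, int]:
    """Return (ok, message, result_count) for one JSON response."""
    payload = json.loads(raw)
    ok = payload.get("status") == "success"
    # Tolerate a missing or null "data" section.
    results = (payload.get("data") or {}).get("results") or []
    return ok, payload.get("message", ""), len(results)


# Stand-in for captured output of: ./rdf search "test query" --format json
raw = (
    '{"status": "success", "code": "SUCCESS", '
    '"message": "Found 2 results", "data": {"results": [{}, {}]}}'
)
print(summarize_response(raw))
```

In practice `raw` would come from the command's standard output, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.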

---

## Optional Features

### Semantic Search (OpenAI Embeddings)

Enables meaning-based search, which matches documents by embedding similarity rather than by exact keywords.

```bash
# Add API key
echo "OPENAI_API_KEY=sk-your-key" >> .env

# Generate embeddings for existing documents
python3 pipeline/generate_embeddings.py

# Search with semantic mode
./rdf search "your query" --mode semantic
```

**Cost:** ~$0.02 per 1M tokens with `text-embedding-3-small`. Each document is embedded once, so this is a one-time cost per document.
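
As a rough worked example of that rate, the sketch below estimates the one-time embedding cost for a library. The document count and average token length are made-up figures, and the function name `embedding_cost_usd` is hypothetical. Actual token counts depend on the tokenizer and on how the pipeline chunks documents.

```python
def embedding_cost_usd(total_tokens: int, usd_per_million: float = 0.02) -> float:
    """Estimated one-time cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * usd_per_million


# Hypothetical library: 200 documents averaging 50,000 tokens each.
tokens = 200 * 50_000
print(f"{tokens:,} tokens -> ${embedding_cost_usd(tokens):.2f}")
# prints: 10,000,000 tokens -> $0.20
```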

### Web Search (Tavily)

Fills research gaps from web sources. Web search is off by default (`agent_safety.allow_web_search: false`) and requires explicit approval.

```bash
# Add API key
echo "TAVILY_API_KEY=tvly-your-key" >> .env

# Enable in config/project.yaml by setting:
#   agent_safety.allow_web_search: true

# Use with research
./rdf research "topic" --allow-web
```

### OCR Repair

For documents with poor text extraction:

```bash
# Install OCR tools (Python packages, then system binaries on Ubuntu/Debian)
pip install pytesseract easyocr Pillow pdf2image
sudo apt install tesseract-ocr tesseract-ocr-eng poppler-utils

# Re-OCR a document
./rdf ingest document.pdf --ocr-profile standard
```

### Advanced RAG Features

```bash
# Cross-encoder re-ranking
pip install sentence-transformers

# Knowledge graph
pip install networkx
```

---

## Troubleshooting

### Database Issues

**"Connection refused"**
```bash
sudo systemctl start postgresql
sudo systemctl status postgresql
```
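
If the service reports as running but the error persists, it can help to confirm that something is listening on the configured host and port. The following is a minimal sketch using only the standard library. The function name `port_is_open` is hypothetical, and the check covers TCP only, so it says nothing about authentication or Unix-socket connections.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Host and port mirror the DB_HOST / DB_PORT values in .env.db.
    print("PostgreSQL reachable:", port_is_open("localhost", 5432))
```

A `False` result with the service running usually points at a different port or at `listen_addresses` in `postgresql.conf`.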

**"Role does not exist"**
```bash
sudo -u postgres createuser -P research_dev_user
```

**"Database does not exist"**
```bash
sudo -u postgres createdb -O research_dev_user research_dev_db
```

**"Extension vector does not exist"**
```bash
sudo apt install postgresql-16-pgvector
sudo -u postgres psql -d research_dev_db -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

### Python Issues

**"No module named 'xxx'"**
```bash
pip install -r requirements.txt
```
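
If the error persists after reinstalling, the packages may have been installed into a different interpreter than the one running `rdf`. The sketch below, run with that same interpreter, lists which top-level modules cannot be found. The function name `missing_modules` is hypothetical, and the module names passed in are illustrative only; the real list should come from `requirements.txt`.

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Illustrative names only. Import names can differ from package names:
# the package "Pillow", for instance, is imported as "PIL".
print(missing_modules(["nltk", "PIL", "pytesseract"]))
```

An empty list means every listed module is visible to the current interpreter.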

**"NLTK punkt not found"**
```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
```

### CLI Issues

**"Permission denied: ./rdf"**
```bash
chmod +x rdf
```

**"Command not found"**
```bash
# Use explicit path
./rdf --help
# Or add to PATH
export PATH="$PATH:/var/www/html/research/Research_development"
```

---

## Security Notes

### Protecting Credentials

```bash
# Set restrictive permissions
chmod 600 .env .env.db

# Add to .gitignore
echo ".env" >> .gitignore
echo ".env.db" >> .gitignore
```
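
The permission settings can also be verified from a script, for example as part of a deployment check. This is a minimal sketch for POSIX systems. The function name `is_owner_only` is hypothetical, and the file names are the ones used in this guide.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """True if neither group nor others have any permission bits on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Check the credential files in the current directory, if present.
for name in (".env", ".env.db"):
    if os.path.exists(name):
        print(name, "ok" if is_owner_only(name) else "too permissive; run chmod 600")
```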

### API Key Safety

- Never commit API keys to version control
- Set usage limits in the OpenAI dashboard
- Monitor usage at https://platform.openai.com/usage
- Use `--strict` flag to avoid web searches

---

## Next Steps

After setup:

1. **[CLI User Guide](CLI_USER_GUIDE.md)** - Learn all commands
2. **[Feature Map](FEATURE_MAP.md)** - Complete specification
3. **[Canonical Workflows](CANONICAL_WORKFLOWS.md)** - Standard workflows
4. **[Ingestion Guide](INGESTION_GUIDE.md)** - Import documents

---

## Quick Reference

### Start Working

```bash
cd /var/www/html/research/Research_development

# Ensure database is running
sudo systemctl start postgresql

# Add documents
cp ~/documents/*.pdf NEW_DOCS/
./rdf ingest NEW_DOCS/

# Search
./rdf search "your topic"

# Write
./rdf essay "your topic" --pages 10
```

### Check Status

```bash
./rdf status
./rdf health --format json
./rdf queue list
```
