# Local LLM Setup Guide

This guide explains how to run local models that generate embeddings for the Research Development Framework (RDF).

> **Note:** Claude Code serves as the reasoning engine. Local LLMs are primarily used for **embeddings** (optional alternative to OpenAI embeddings).

---

## Table of Contents

1. [Overview](#overview)
2. [Hardware Requirements](#hardware-requirements)
3. [Ollama (Recommended)](#ollama-recommended)
4. [Local Embeddings Setup](#local-embeddings-setup)
5. [Configuration](#configuration)
6. [Embedding Dimension Reference](#embedding-dimension-reference)
7. [Troubleshooting](#troubleshooting)
8. [Quick Start](#quick-start)
9. [See Also](#see-also)

---

## Overview

Local LLMs can provide embeddings without cloud API costs:

| Benefit | Description |
|---------|-------------|
| **Privacy** | Documents never leave your machine |
| **Cost** | No per-request API fees after initial setup |
| **Offline** | Works without internet |

### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Claude Code (Reasoning)       Local/Cloud Embeddings          │
│   ┌─────────────────┐          ┌─────────────────┐              │
│   │                 │          │                 │              │
│   │  Planning       │          │  OpenAI API     │              │
│   │  Judgment       │          │  OR             │              │
│   │  Synthesis      │          │  Ollama (local) │              │
│   │                 │          │                 │              │
│   └─────────────────┘          └─────────────────┘              │
│          │                            │                          │
│          ▼                            ▼                          │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                    RDF CLI                               │   │
│   │   rdf ingest → embeddings → rdf search                   │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

---

## Hardware Requirements

### For Local Embeddings

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| RAM | 4 GB | 8+ GB |
| Storage | 5 GB free | 10+ GB free |
| GPU | None (CPU works) | Optional |

Embedding models are smaller than chat models and run efficiently on CPU.

### Recommended Embedding Models

| Model | Parameters | Quality | Speed |
|-------|------------|---------|-------|
| `nomic-embed-text` | ~137M | Excellent | Fast |
| `mxbai-embed-large` | ~335M | Very Good | Fast |
| `all-minilm` | ~23M | Good | Very Fast |

---

## Ollama (Recommended)

Ollama is the easiest way to run local embedding models.

### Installation

#### Linux

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
```

#### macOS

```bash
brew install ollama
ollama serve
```

#### Windows

Download the installer from https://ollama.com/download and run it.

### Download Embedding Model

```bash
# Recommended embedding model
ollama pull nomic-embed-text

# Or smaller/faster option
ollama pull all-minilm

# List models
ollama list
```

### Start Server

```bash
# Start Ollama (listens on port 11434 by default; skip if it already runs as a service)
ollama serve

# Verify
curl http://localhost:11434/api/tags
```

---

## Local Embeddings Setup

### Option 1: Ollama Embeddings

```bash
# 1. Pull embedding model
ollama pull nomic-embed-text

# 2. Test embeddings API
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Hello world"}'
```
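
The same request can be issued from Python using only the standard library. This is a minimal sketch rather than RDF's actual client: the helper names `build_request`, `parse_embedding`, and `embed` are hypothetical, and it assumes the endpoint answers with a JSON object that contains an `embedding` array.

```python
import json
import urllib.request

OLLAMA_ENDPOINT = "http://localhost:11434"  # same default as the curl test


def build_request(text, model="nomic-embed-text", endpoint=OLLAMA_ENDPOINT):
    """Build the POST request that mirrors the curl command (hypothetical helper)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding(body):
    """Extract the vector from a response body; assumes an 'embedding' key."""
    data = json.loads(body)
    if "embedding" not in data:
        raise ValueError(f"unexpected response: {data}")
    return [float(x) for x in data["embedding"]]


def embed(text, **kwargs):
    """Send the request to a running Ollama server and return the vector."""
    with urllib.request.urlopen(build_request(text, **kwargs), timeout=30) as resp:
        return parse_embedding(resp.read())
```

With the server running, `len(embed("Hello world"))` should match the model's dimension (768 for `nomic-embed-text`).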

### Option 2: Sentence Transformers (Python)

```bash
# Install
pip install sentence-transformers

# Test
python -c "from sentence_transformers import SentenceTransformer; m = SentenceTransformer('all-MiniLM-L6-v2'); print(m.encode('test').shape)"
```

---

## Configuration

### Using OpenAI Embeddings (Default)

In `.env`:
```bash
OPENAI_API_KEY=sk-your-key
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
```

### Using Local Embeddings (Ollama)

In `.env`:
```bash
# Leave OPENAI_API_KEY unset or empty
EMBEDDING_PROVIDER=local
LOCAL_EMBEDDING_MODEL=nomic-embed-text
LOCAL_EMBEDDING_ENDPOINT=http://localhost:11434
```

In `config/project.yaml`:
```yaml
embeddings:
  provider: "local"  # or "openai"
  model: "nomic-embed-text"
  endpoint: "http://localhost:11434"
  dimension: 768  # Model-specific
```
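
To make the relationship between these settings concrete, here is a rough sketch of how a loader might resolve them from the environment. It is an assumption for illustration only: `resolve_embedding_config` is a hypothetical function, the fallback defaults are taken from the examples above, and the way RDF itself prioritizes `.env` against `config/project.yaml` is not covered here.

```python
import os


def resolve_embedding_config(env=None):
    """Pick provider settings from environment-style variables (hypothetical)."""
    env = os.environ if env is None else env
    provider = env.get("EMBEDDING_PROVIDER", "openai").strip().lower()

    if provider == "local":
        # Local provider: no API key needed, only a model name and an endpoint.
        endpoint = env.get("LOCAL_EMBEDDING_ENDPOINT", "http://localhost:11434")
        return {
            "provider": "local",
            "model": env.get("LOCAL_EMBEDDING_MODEL", "nomic-embed-text"),
            "endpoint": endpoint.rstrip("/"),
        }

    # Any other value is treated as OpenAI in this sketch, which needs a key.
    if not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required unless EMBEDDING_PROVIDER=local")
    return {
        "provider": "openai",
        "model": env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        "dimensions": int(env.get("EMBEDDING_DIMENSIONS", "1536")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try without touching your real environment, for example `resolve_embedding_config({"EMBEDDING_PROVIDER": "local"})`.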

### Verify Configuration

```bash
# Change to the project root (adjust the path to your installation)
cd /var/www/html/research/Research_development

# Check health
./rdf health --format json
```

---

## Embedding Dimension Reference

| Model | Provider | Dimensions |
|-------|----------|------------|
| text-embedding-3-small | OpenAI | 1536 |
| text-embedding-3-large | OpenAI | 3072 |
| nomic-embed-text | Ollama | 768 |
| mxbai-embed-large | Ollama | 1024 |
| all-minilm | Ollama | 384 |

**Important:** If you change embedding models, you must regenerate the embeddings for all existing documents, because vectors produced by different models are not comparable:

```bash
python3 pipeline/generate_embeddings.py --force
```
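
The following sketch illustrates why regeneration is needed. It is a plain cosine similarity function, not RDF's search implementation, and the name `cosine_similarity` is illustrative. Vectors of different lengths cannot be compared at all; vectors of equal length from different models can be compared numerically, but the score is meaningless because the models use unrelated vector spaces.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        # e.g. a 768-dim stored vector against a 1536-dim query vector
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```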

---

## Troubleshooting

### "Connection refused" Error

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve
```
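
A script can run the same check before starting a long job. This is a small sketch with a hypothetical helper name, `ollama_is_up`; it treats any network or HTTP error as "not reachable" and assumes the default endpoint used throughout this guide.

```python
import urllib.request


def ollama_is_up(endpoint="http://localhost:11434", timeout=2.0):
    """Return True if GET /api/tags answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, connection refused, and timeouts all derive from OSError.
        return False
```

If `ollama_is_up()` returns `False`, start the server with `ollama serve` and try again.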

### "Model not found" Error

```bash
# List available models
ollama list

# Pull the model
ollama pull nomic-embed-text
```
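
The model list can also be checked programmatically from the `/api/tags` response. The sketch below only parses a response body; it assumes the body has a `models` array whose entries carry a `name` field such as `nomic-embed-text:latest`, and the helper names are hypothetical.

```python
import json


def installed_models(tags_json):
    """Return the model names found in an /api/tags response body."""
    data = json.loads(tags_json)
    return [entry["name"] for entry in data.get("models", [])]


def has_model(tags_json, wanted):
    """True if `wanted` is installed, treating a missing tag as ':latest'."""
    def normalize(name):
        return name if ":" in name else f"{name}:latest"

    return normalize(wanted) in {normalize(n) for n in installed_models(tags_json)}
```

For example, `has_model(body, "nomic-embed-text")` is true when the response lists `nomic-embed-text:latest`.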

### Slow Embedding Generation

1. Embedding models are lightweight; if generation is slow, check CPU and memory usage first
2. Use GPU acceleration if a supported GPU is available
3. Use batch processing in `generate_embeddings.py`
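
Batch processing here means sending chunks in groups rather than one at a time. The helper below is a generic sketch of that idea; `batched` is a hypothetical name, and how `generate_embeddings.py` actually batches its work is not documented in this guide.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

A caller would loop with `for group in batched(chunks, 32):` and embed each group in turn, which keeps memory use bounded and makes progress easy to report.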

### Dimension Mismatch

If you see errors about vector dimensions:

```sql
-- Check current embedding dimensions
SELECT array_length(embedding, 1) FROM chunks LIMIT 1;
```

Ensure the model's output dimension matches the dimension defined in the database schema and in `config/project.yaml`. If you change models, regenerate all embeddings.
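
A quick pre-flight check can catch a mismatch before anything is written to the database. This sketch hard-codes the values from the Embedding Dimension Reference table; `EXPECTED_DIMENSIONS` and `check_dimension` are illustrative names, not part of RDF.

```python
# Values copied from the Embedding Dimension Reference table in this guide.
EXPECTED_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
}


def check_dimension(model, vector):
    """Return the expected dimension, or raise if `vector` has the wrong length."""
    if model not in EXPECTED_DIMENSIONS:
        raise KeyError(f"no known dimension for model {model!r}")
    expected = EXPECTED_DIMENSIONS[model]
    if len(vector) != expected:
        raise ValueError(
            f"{model} should produce {expected} dimensions, got {len(vector)}"
        )
    return expected
```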

---

## Quick Start

```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull embedding model
ollama pull nomic-embed-text

# 3. Start server (skip if Ollama is already running as a service)
ollama serve &

# 4. Configure (edit config/project.yaml)
# embeddings:
#   provider: "local"
#   model: "nomic-embed-text"

# 5. Generate embeddings
python3 pipeline/generate_embeddings.py
```

---

## See Also

- [Setup Guide](SETUP.md) - Main installation
- [CLI User Guide](CLI_USER_GUIDE.md) - Command reference
- [Ingestion Guide](INGESTION_GUIDE.md) - Document import
