# MAINT-002: Comprehensive Database Query Optimization

## Executive Summary

A site-wide analysis of the AEI Scheduler codebase found **systemic database performance issues** in virtually every controller. The anti-patterns have accumulated over years of development; the table below summarizes how often each one occurs.

### Issue Statistics

| Anti-Pattern | Occurrences | Files Affected | Severity |
|--------------|-------------|----------------|----------|
| `SELECT * FROM` | **18,323** | 476 | High |
| `FIND_IN_SET()` | **688** | 77 | Critical |
| N+1 Query Patterns | **328+** | 328 | Critical |
| `DATE_FORMAT()` in WHERE | **100+** | 50+ | Critical |
| `LIKE '%...'` (leading wildcard) | **29** | 20 | Medium |

---

## Critical Issues

### 1. DATE_FORMAT() Preventing Index Usage (Critical)

**Problem:** Using `DATE_FORMAT(column, 'format') = 'value'` forces MySQL to evaluate the function on every row, preventing index usage.

**Locations (Primary - Active Files):**
- `controllers/admin.php` lines 513, 520, 528, 543, 1301, 1391, 1414, 1435, 1517
- `libraries/Job.php` lines 14, 17, 31
- `controllers/ajax.php` multiple locations
- `controllers/dayview.php` multiple locations
- `controllers/proposal.php` multiple locations

**Example Bad Query:**
```sql
SELECT * FROM jobs WHERE DATE_FORMAT(job_date,'%Y-%m') = '2026-01'
```

**Optimized Query:**
```sql
SELECT * FROM jobs WHERE job_date >= '2026-01-01' AND job_date < '2026-02-01'
```

**Impact:** Full table scans on jobs table (potentially 100k+ rows) instead of index seeks.
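
The range bounds in the optimized query can be computed rather than hand-written. The following is a minimal sketch, not code from the codebase: `monthRange()` is a hypothetical helper name, and it assumes `job_date` is a `DATE` or `DATETIME` column. It returns a half-open `[start, end)` pair so December rolls over to the next year without a special case in the caller.

```php
<?php
// Hypothetical helper: half-open [start, end) bounds for one calendar month.
// Pure arithmetic, so the result does not depend on the server timezone.
function monthRange($year, $month)
{
    $year  = (int) $year;
    $month = (int) $month;

    $nextMonth = $month % 12 + 1;                    // 12 wraps to 1
    $nextYear  = $year + ($month === 12 ? 1 : 0);    // December rolls the year

    return array(
        sprintf('%04d-%02d-01', $year, $month),
        sprintf('%04d-%02d-01', $nextYear, $nextMonth),
    );
}

list($start, $end) = monthRange(2026, 1);

// Bind the bounds as parameters instead of interpolating them.
// With CodeIgniter query bindings: $this->db->query($sql, array($start, $end))
$where = "job_date >= ? AND job_date < ?";
```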

---

### 2. FIND_IN_SET() on Comma-Separated Values (Critical)

**Problem:** Storing multiple IDs in comma-separated strings and using `FIND_IN_SET()` to search them prevents any index usage.

**Occurrences:** 688 across 77 files

**Primary Offenders:**
- `controllers/admin.php` - 18 occurrences
- `controllers/assign.php` - 12 occurrences
- `controllers/serviceplanschedule.php` - 18 occurrences
- `controllers/all_iiq.php` - 9 occurrences

**Example Bad Pattern:**
```php
// jobs.installer_id contains "5,12,23,45"
$sql = "SELECT * FROM jobs WHERE FIND_IN_SET($user_id, installer_id)";
```

**Root Cause:** Database design stores multiple installer IDs as comma-separated strings instead of using a junction table.

**Affected Columns:**
- `jobs.installer_id`
- `jobs.contractor_id`
- `jobs.designer_id`
- `jobs.conduit_id`

**Solution:**
- Short-term: Accept performance limitation, add caching
- Long-term: Normalize database with junction tables:
  ```sql
  CREATE TABLE job_installers (
      job_id INT,
      installer_id INT,
      PRIMARY KEY (job_id, installer_id),
      INDEX (installer_id)
  );
  ```
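
Populating the junction table means splitting each comma-separated string into one row per ID. The sketch below illustrates that step under stated assumptions: `parseIdList()` and `buildJunctionRows()` are hypothetical names, `jobs.id` is assumed to be the primary key, and the sample rows are made up. A real migration would read from the `jobs` table and batch the inserts.

```php
<?php
// Split "5,12,23,45" into unique positive integers.
// Empty entries and stray whitespace are dropped.
function parseIdList($csv)
{
    $ids = array();
    foreach (explode(',', (string) $csv) as $part) {
        $id = (int) trim($part);
        if ($id > 0) {
            $ids[$id] = true; // array keys de-duplicate
        }
    }
    return array_keys($ids);
}

// Build (job_id, installer_id) rows for the proposed job_installers table.
function buildJunctionRows(array $jobs)
{
    $rows = array();
    foreach ($jobs as $job) {
        foreach (parseIdList($job['installer_id']) as $installerId) {
            $rows[] = array(
                'job_id'       => (int) $job['id'],
                'installer_id' => $installerId,
            );
        }
    }
    return $rows;
}

// Sample data only.
$rows = buildJunctionRows(array(
    array('id' => 101, 'installer_id' => '5,12,23,45'),
    array('id' => 102, 'installer_id' => '12, ,12'),
));

// After migration, an indexed join can replace FIND_IN_SET():
$sql = "SELECT j.id, j.job_date FROM jobs j "
     . "JOIN job_installers ji ON ji.job_id = j.id "
     . "WHERE ji.installer_id = ?";
```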

---

### 3. N+1 Query Pattern (Critical)

**Problem:** Executing individual queries inside loops instead of fetching all data upfront.

**Occurrences:** 328+ files with queries inside foreach/while loops

**Worst Offenders in admin.php:**

| Function | Line | Issue |
|----------|------|-------|
| `day()` | 1309-1315 | Calls `getuser()` per installer in loop |
| `day()` | 1323-1326 | Calls `getuser()` per contractor in loop |
| `jobschedule()` | 696-748 | Multiple helper calls per job |

**Helper Functions Making Individual Queries:**
```php
// Line 8050 - Called per job
function getCustomerLastname($id) {
    $j = $this->db->query("SELECT last_name FROM customers WHERE id=$id");
}

// Line 8087 - Called per job
function getUserIntial($id) {
    $j = $this->db->query("SELECT first_name, last_name FROM users WHERE id=$id");
}

// Line 8108 - Called per job, plus secondary query to colors table
function getUserColor($id) {
    $j = $this->db->query("SELECT custom_hex,color_id FROM users WHERE id=$id");
    // Then another query if color_id exists
}

// Line 8062 - Called per job, plus secondary query to neigbhour table
function getCustomerNeighborhood($id) {
    $j = $this->db->query("SELECT neighborhood,other_neighborhood FROM customers WHERE id=$id");
    // Then another query to neigbhour table
}

// Line 8877 - Called multiple times per job
function getuser($id) {
    $j = $this->db->query("SELECT users.* FROM users WHERE id=$id");
}
```

**Impact Example:**
- Day view with 10 jobs
- Each job has 2 installers, 1 contractor
- Per job: 3 getuser() + 1 getCustomerLastname() + 1 getUserIntial() + 1 getUserColor() + 1 getCustomerNeighborhood() = 7 queries, plus up to 2 secondary lookups inside getUserColor() and getCustomerNeighborhood()
- Total: 10 jobs x 7+ queries = **70+ extra queries** for a single page load
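
One way to remove the per-job lookups is to collect every referenced user ID first and fetch them in a single `IN (...)` query. The sketch below is illustrative only: the three function names are hypothetical, the sample job rows are made up, and the column list is taken from the helper functions shown above.

```php
<?php
// Collect every distinct user ID referenced by a set of job rows.
// installer_id and contractor_id hold comma-separated ID strings.
function collectUserIds(array $jobs)
{
    $ids = array();
    foreach ($jobs as $job) {
        foreach (array('installer_id', 'contractor_id') as $column) {
            foreach (explode(',', (string) $job[$column]) as $part) {
                $id = (int) trim($part);
                if ($id > 0) {
                    $ids[$id] = true; // array keys de-duplicate
                }
            }
        }
    }
    return array_keys($ids);
}

// Build one parameterized IN query for all collected IDs.
// Returns array($sql, $bindings), or null when there is nothing to fetch.
function buildUserBatchQuery(array $ids)
{
    if (count($ids) === 0) {
        return null;
    }
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $sql = "SELECT id, first_name, last_name, custom_hex, color_id "
         . "FROM users WHERE id IN ($placeholders)";
    return array($sql, $ids);
}

// Index fetched rows by ID so each helper becomes an array lookup.
function indexById(array $rows)
{
    $byId = array();
    foreach ($rows as $row) {
        $byId[(int) $row['id']] = $row;
    }
    return $byId;
}

// Sample rows only; real rows would come from the day's jobs query.
$jobs = array(
    array('id' => 101, 'installer_id' => '5,12',  'contractor_id' => '7'),
    array('id' => 102, 'installer_id' => '12,23', 'contractor_id' => '7'),
);
$userIds = collectUserIds($jobs); // four distinct IDs
list($sql, $bindings) = buildUserBatchQuery($userIds);
// In CodeIgniter: $rows = $this->db->query($sql, $bindings)->result_array();
```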

---

### 4. SELECT * FROM Everywhere (High)

**Problem:** Fetching all columns when only a few are needed wastes memory and bandwidth.

**Occurrences:** 18,323 across 476 files

**Examples:**
```php
// Fetches ALL columns when only needing 'limit'
$daily_limit = $this->db->get("job_limit")->row_array();

// Fetches ALL user columns when only needing name
$j = $this->db->query("SELECT * FROM users WHERE id=$id");
```

---

### 5. Repeated Identical Queries (High)

**Problem:** The same query is executed multiple times within a single request.

**Example in admin.php day() function:**
```php
// This EXACT query appears 6 times:
$this->db->where("type","installer");
$this->db->order_by("first_name","desc");
$installeruser = $this->db->get("users")->result_array();

// Lines: 1337, 1351, 1470, 1484, 1553, 1567
```

**Impact:** 6 identical queries fetching all installers, when one query would suffice.
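
A small per-request memoizer is one way to collapse these repeats. This is a sketch under assumptions, not existing code: `RequestCache` and `remember()` are hypothetical names, and the closure stands in for the real installer query so the example runs without a database.

```php
<?php
// Hypothetical per-request memoizer: runs each loader at most once per key.
class RequestCache
{
    private $values = array();

    public function remember($key, $loader)
    {
        if (!array_key_exists($key, $this->values)) {
            $this->values[$key] = call_user_func($loader);
        }
        return $this->values[$key];
    }
}

$cache = new RequestCache();
$queryCount = 0;

// Stand-in for the installer query. In the controller the closure would
// wrap the where()/order_by()/get("users") calls shown above.
$loadInstallers = function () use (&$queryCount) {
    $queryCount++;
    return array(array('id' => 5, 'type' => 'installer')); // sample row
};

// Six call sites, one execution of the loader.
for ($i = 0; $i < 6; $i++) {
    $installeruser = $cache->remember('installers', $loadInstallers);
}
```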

---

### 6. Multiple Queries for Same Data Set (Medium)

**Problem:** Separate queries for each job type instead of one combined query.

**In admin.php day() function (lines 1281-1519):**
```php
// 5 separate queries for different job types:
WHERE job_type_id='2'  // PV jobs
WHERE job_type_id='1'  // SWH jobs
WHERE job_type_id='3'  // SAF jobs
WHERE job_type_id='6'  // Prelag jobs
WHERE job_type_id='7'  // PM jobs
```

**Solution:** Single query with IN clause:
```sql
SELECT * FROM jobs
WHERE job_type_id IN (1,2,3,6,7)
AND job_date = '2026-01-07'
```
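
After the single query, the rows still have to be split by type for the view. A minimal sketch of that grouping step follows; `groupByJobType()` is a hypothetical name and the job rows are sample data.

```php
<?php
// Group job rows by job_type_id so the view can still render
// each job type separately after one combined query.
function groupByJobType(array $jobs)
{
    $groups = array();
    foreach ($jobs as $job) {
        $groups[(int) $job['job_type_id']][] = $job;
    }
    return $groups;
}

// Sample rows only; real rows would come from the IN (...) query.
$jobs = array(
    array('id' => 1, 'job_type_id' => '2'),
    array('id' => 2, 'job_type_id' => '1'),
    array('id' => 3, 'job_type_id' => '2'),
);

$byType = groupByJobType($jobs);

// A type with no jobs on that day is simply absent, so default to an empty list.
$pvJobs = isset($byType[2]) ? $byType[2] : array(); // PV jobs
$pmJobs = isset($byType[7]) ? $byType[7] : array(); // PM jobs
```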

---

### 7. Secondary Database Connection Overhead (Low-Medium)

**Problem:** A second database connection is created on every jobschedule request.

**Location:** admin.php line 548
```php
$this->db2 = $this->load->database('otherdb', true);
$userdetails = $this->db2->query("select id from emp_users where email='...'");
```

**Solution:** Lazy-load only when needed, or cache the employee ID in session.
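
The session-caching option could look roughly like the sketch below. It is an assumption-laden illustration: `getEmployeeId()` and the `emp_user_id` key are hypothetical, a plain array stands in for the session, and the closure stands in for the `otherdb` connection and query. In the controller, the session reads and writes would presumably go through CodeIgniter's session library instead.

```php
<?php
// Return the employee ID, opening the secondary connection only when
// the value is not already stored in the session.
function getEmployeeId(array &$session, $email, $lookup)
{
    $key = 'emp_user_id'; // hypothetical session key
    if (!isset($session[$key])) {
        // A null result is not cached, so a failed lookup is retried next time.
        $session[$key] = call_user_func($lookup, $email);
    }
    return $session[$key];
}

$session = array();
$connections = 0;

// Stand-in for load->database('otherdb', true) plus the emp_users query.
$lookup = function ($email) use (&$connections) {
    $connections++;
    return 42; // sample ID
};

$first  = getEmployeeId($session, 'user@example.com', $lookup);
$second = getEmployeeId($session, 'user@example.com', $lookup);
```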

---

## Recommended Fix Strategy

### Phase 1: Quick Wins (1-2 days implementation)

1. **Replace DATE_FORMAT() with date ranges**
   - High impact, low risk
   - Create helper function for date range queries
   - Apply to active controller files first

2. **Cache repeated queries**
   - Fetch installer list once, reuse
   - Fetch user list once, create lookup array

3. **Consolidate job type queries**
   - Single IN clause instead of 5 separate queries

### Phase 2: N+1 Query Elimination (3-5 days implementation)

1. **Create lookup arrays at request start**
   ```php
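   // Load once at start of day() function.
   // Loading whole tables is only reasonable while they are small; for a large
   // customers table, restrict the query to the IDs referenced by the day's jobs.
   $this->users_cache = $this->db->query("SELECT * FROM users")->result_array();
   $this->customers_cache = $this->db->query("SELECT id, last_name, neighborhood, other_neighborhood FROM customers")->result_array();

   // Index by ID for O(1) lookup (array_column() requires PHP 5.5+)
   $this->users_by_id = array_column($this->users_cache, null, 'id');
   $this->customers_by_id = array_column($this->customers_cache, null, 'id');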
   ```

2. **Modify helper functions to use cache**
   ```php
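   function getuser($id) {
       if (isset($this->users_by_id[$id])) {
           return $this->users_by_id[$id];
       }
       // Fallback to a single bound query only on a cache miss.
       // Assumes callers expect one row as an array; match the original return type.
       $row = $this->db->query("SELECT users.* FROM users WHERE id = ?", array($id))->row_array();
       $this->users_by_id[$id] = $row;
       return $row;
   }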
   ```

3. **Add JOINs to main queries**
   - Join customer and user data in initial job queries
   - Reduce need for follow-up queries

### Phase 3: SELECT Column Optimization (Ongoing)

1. **Identify most-used queries**
2. **Replace SELECT * with specific columns**
3. **Focus on high-traffic pages first**

### Phase 4: Database Design (Future - Major Effort)

1. **Normalize comma-separated ID fields**
   - Create junction tables for installer_id, contractor_id, designer_id, and conduit_id
   - Update all FIND_IN_SET queries to use JOINs
   - This is a breaking change requiring careful migration

2. **Add missing indexes**
   - Index on `jobs.job_date` (the Phase 1 date-range rewrite only helps once this index exists; confirm with `EXPLAIN`)
   - Index on customer_id, user_id foreign keys
   - Composite indexes for common WHERE combinations

---

## Testing Requirements

### Performance Testing
- [ ] Measure page load times before changes
- [ ] Enable MySQL slow query log
- [ ] Test with production data volume
- [ ] Measure page load times after each phase

### Functional Testing
- [ ] All job types display correctly
- [ ] Different user roles (admin, installer, contractor, sales)
- [ ] Date navigation works correctly
- [ ] Job creation/editing still functions

### Regression Testing
- [ ] Verify no data display issues
- [ ] Check calendar views for all months
- [ ] Test edge cases (days with many jobs, empty days)

---

## Files Requiring Changes (Priority Order)

### Tier 1 - Highest Traffic (Fix First)
1. `system/application/controllers/admin.php` - Main calendar/scheduler
2. `system/application/controllers/ajax.php` - AJAX operations
3. `system/application/libraries/Job.php` - Job utility functions

### Tier 2 - High Traffic
4. `system/application/controllers/dayview.php`
5. `system/application/controllers/proposal.php`
6. `system/application/controllers/acproposal.php`

### Tier 3 - Medium Traffic
7. Schedule-related controllers (premjobschedule, acschedule, etc.)
8. Report controllers

### Tier 4 - Support Files
9. Helper files in `system/application/helpers/`
10. Library files in `system/application/libraries/`

---

## Implementation Notes

### Creating the Query Helper Class

Consider creating a centralized query optimization class:

```php
// system/application/libraries/QueryOptimizer.php
class QueryOptimizer {
    private $CI;
    private $cache = array();

    public function __construct() {
        $this->CI =& get_instance();
    }

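    // Date range helper. Formatting the parts as integers zero-pads the
    // literals and blocks SQL injection through them; $column must come
    // from trusted code, never from user input.
    public function dateRange($column, $year, $month, $day = null) {
        if ($day) {
            // Equality assumes a DATE column; use a one-day range for DATETIME.
            return sprintf("%s = '%04d-%02d-%02d'", $column, $year, $month, $day);
        }
        $start = sprintf('%04d-%02d-01', $year, $month);
        $end = date('Y-m-d', strtotime("$start +1 month"));
        return "$column >= '$start' AND $column < '$end'";
    }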

    // Cached user lookup
    public function getUser($id) {
        if (!isset($this->cache['users'])) {
            $this->cache['users'] = array();
            $result = $this->CI->db->get('users')->result_array();
            foreach ($result as $row) {
                $this->cache['users'][$row['id']] = $row;
            }
        }
        return isset($this->cache['users'][$id]) ? $this->cache['users'][$id] : null;
    }
}
```

---

## Estimated Impact

| Metric | Current (Est.) | After Optimization (Target) |
|--------|----------------|-------------------|
| Queries per day view | 70-100+ | 10-15 |
| Queries per jobschedule | 50-80+ | 8-12 |
| Page load (day view) | 2-5 seconds | < 500ms |
| Page load (jobschedule) | 1-3 seconds | < 300ms |

---

## Related Documentation

- See `CHANGES.md` for implementation progress
- See `backups/` for original file backups before modification
