# AEI System Operational Audit

Automated health check for the AEI Hawaii Scheduler system across three servers:
AWS (production), local WSL2, and aei-webserv2 (upload.aeihawaii.com).

## Quick Start

```bash
# Full audit (all servers)
python3 /var/www/html/AEI_REMOTE/audit/system_audit.py

# Remote server only
python3 /var/www/html/AEI_REMOTE/audit/system_audit.py --remote-only

# Local server only (includes aei-webserv2)
python3 /var/www/html/AEI_REMOTE/audit/system_audit.py --local-only

# JSON output (for automation)
python3 /var/www/html/AEI_REMOTE/audit/system_audit.py --json

# Only show warnings and failures
python3 /var/www/html/AEI_REMOTE/audit/system_audit.py --warn-only
```

**Prerequisites:** `pip3 install requests` and an SSH key at `/root/.ssh/aei_production.pem`

**Exit codes:** 0 = all pass/warn, 1 = any FAIL
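
Since the exit code is the only machine-readable failure signal besides `--json`, a wrapper can branch on it. The sketch below is illustrative only: `audit_result` is a hypothetical helper, and treating any code other than 0 or 1 as "the audit itself did not run cleanly" is an assumption, not documented behaviour.

```bash
# Hypothetical helper: run a command and translate its exit code into a label.
# Usage: audit_result python3 /var/www/html/AEI_REMOTE/audit/system_audit.py --warn-only
audit_result() {
  local rc
  if "$@" >/dev/null 2>&1; then rc=0; else rc=$?; fi
  case $rc in
    0) echo "OK" ;;      # every check was PASS or WARN
    1) echo "FAIL" ;;    # at least one check reported FAIL
    *) echo "ERROR" ;;   # assumption: the audit crashed or could not start
  esac
}
```

The command's own output is discarded here; drop the redirection if the report should still be shown.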

---

## What Each Check Means

### Phase 1 — Remote Services (1-5)

Core processes on the AWS server. If any are down, the scheduler is degraded or offline.

| # | Check | Why | On Failure |
|---|-------|-----|------------|
| 1 | Apache httpd | Web server for scheduler + photo API | `sudo service httpd start` |
| 2 | MySQL mysqld | Database for all scheduler data | `sudo service mysqld start` |
| 3 | crond | Runs scheduled jobs (cron, retry queue) | `sudo service crond start` |
| 4 | Postfix | Email relay through Mailtrap | `sudo service postfix start` |
| 5 | Python 3.6 | WebP generation, sync scripts | Should always exist — investigate if missing |

### Phase 2 — Remote Cron Health (6-11)

Verifies cron jobs ran on schedule by checking `/var/log/cron`.

| # | Check | Expected | On Failure |
|---|-------|----------|------------|
| 6 | scheduler/cron | Every 5 min | Check `crontab -l` for ec2-user; verify crond running |
| 7 | process_retry_queue.py | Every 15 min | Check ec2-user crontab; verify queue/ dir exists |
| 8 | ticketcron | Every 5 min | WARN expected — osTicket DB is broken (known issue) |
| 9 | PDF cleanup | In Julian's crontab | Non-critical; /tmp cleanup |
| 10 | Stale failed queue items | None >7 days | Review `queue/failed/`, fix or purge |
| 11 | Retry queue size | ≤10 items | If >10: local server may be down or unreachable |
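
Checks 10 and 11 can be reproduced by hand with `find`. This is a rough sketch, not the audit script's actual logic: `count_stale` and `count_queued` are hypothetical helpers, and it assumes each queue item is a regular file directly inside the queue directory.

```bash
# Count regular files directly in DIR whose mtime is more than DAYS days old.
count_stale() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -mtime +"$days" | wc -l | tr -d ' '
}

# Count all regular files directly in DIR (subdirectories such as failed/ are ignored).
count_queued() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

On the remote server, `count_stale /var/www/vhosts/aeihawaii.com/httpdocs/photoapi/queue/failed 7` should print `0`, and `count_queued /var/www/vhosts/aeihawaii.com/httpdocs/photoapi/queue` should print 10 or less.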

### Phase 3 — Remote Disk & Storage (12-15)

Disk space and log file sizes. The remote disk is 985GB total.

| # | Check | Threshold | On Failure |
|---|-------|-----------|------------|
| 12 | Root disk free | PASS >50GB; WARN 20-50GB; FAIL <20GB | See `DOCS/REMOTE_SERVER_STORAGE_AUDIT.md` for cleanup targets |
| 13 | Maillog size | <1GB | `sudo truncate -s 0 /var/log/maillog` or investigate mail loop |
| 14 | Uploads size | Informational | ~275GB expected; no action needed |
| 15 | /tmp free | >1GB | Clear stale PDF/tmp files |
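
The root-disk thresholds can be expressed as a small classifier. This sketch assumes the threshold column means PASS above 50GB, WARN above 20GB, and FAIL otherwise; `classify_free_gb` is a hypothetical name, and the real script may treat the boundaries differently.

```bash
# Map free space in whole GB to a status, using the assumed 50/20 GB thresholds.
classify_free_gb() {
  local free_gb="$1"
  if [ "$free_gb" -gt 50 ]; then
    echo "PASS"
  elif [ "$free_gb" -gt 20 ]; then
    echo "WARN"
  else
    echo "FAIL"
  fi
}

# Usage on a live system (GNU df):
#   classify_free_gb "$(df -BG --output=avail / | tail -1 | tr -dc '0-9')"
```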

### Phase 4 — Remote Security (16-19)

Known security exposures identified in the storage audit.

| # | Check | Why | On Failure |
|---|-------|-----|------------|
| 16 | sl.sql removed | 290MB SQL dump, publicly accessible | `rm /var/www/vhosts/aeihawaii.com/httpdocs/photoapi/sl.sql` |
| 17 | mysqldbne/ removed | Old DB admin tool, publicly accessible | `rm -rf /var/www/vhosts/aeihawaii.com/httpdocs/mysqldbne/` |
| 18 | SSL cert >30 days | HTTPS for scheduler + API | Cert is Let's Encrypt via certbot; check auto-renewal |
| 19 | PHP extensions | Required for scheduler operation | Investigate — extensions shouldn't disappear |

### Phase 5 — Remote Photo API (20-25)

HTTP checks against the photo API endpoints. No actual uploads or deletes.

| # | Check | What It Does | On Failure |
|---|-------|-------------|------------|
| 20 | upload.php JSON | POSTs invalid request, expects JSON error | Check Apache error log, PHP syntax |
| 21 | getimagelisting.php | POSTs with job_id=0, expects JSON | Check file exists and PHP is parsing |
| 22 | fetch_image.php | GETs nonexistent file, expects 200/404 | Check file exists |
| 23 | delete.php auth | POSTs bad token, expects rejection | Should never accept bad auth |
| 24 | delete_image.php auth | POSTs bad token to batch delete endpoint | Should never accept bad auth |
| 25 | delete_local_photo.php auth | GETs bad token to local delete endpoint | Should never accept bad auth |
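
Several of these checks reduce to "did the endpoint answer with JSON rather than an HTML error page?". A minimal way to test that by hand is sketched below. `is_json` is a hypothetical helper, and `PHOTO_API_URL` is a placeholder for the API base URL, which this document does not state.

```bash
# Succeeds (exit 0) only if the data on stdin parses as JSON.
# Relies on python3, which the audit already requires.
is_json() {
  python3 -c 'import json, sys; json.load(sys.stdin)' >/dev/null 2>&1
}

# Usage (PHOTO_API_URL is a placeholder, not a value taken from the audit script):
#   curl -s -X POST "$PHOTO_API_URL/upload.php" | is_json && echo PASS || echo FAIL
```

An empty response body also counts as a failure, which matches the case where PHP is not being parsed at all.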

### Phase 6 — Remote Database (26-28)

Direct MySQL queries via SSH.

| # | Check | Why | On Failure |
|---|-------|-----|------------|
| 26 | SELECT 1 | Basic connectivity | Check MySQL running; verify credentials |
| 27 | meter_files | Photo metadata table | Table may be locked or corrupted |
| 28 | jobs | Core scheduling table | Critical — scheduler non-functional without it |

### Phase 7 — Local Services (29-33)

Core processes on the local WSL2 server.

| # | Check | Why | On Failure |
|---|-------|-----|------------|
| 29 | Apache | Serves upload.aeihawaii.com | `sudo service apache2 start` |
| 30 | MariaDB | Local database server | `sudo service mariadb start` |
| 31 | fail2ban | Security — 22 jails | `sudo service fail2ban start` |
| 32 | cron | Local scheduled jobs | `sudo service cron start` |
| 33 | /mnt/dropbox/ | Photo storage mount | Check Windows Dropbox is running; mount may need refresh |

### Phase 8 — Local Firewall & Security (34-38)

Critical for cross-server sync. If the AWS IP is blocked, photo sync breaks silently.

| # | Check | Why | On Failure |
|---|-------|-----|------------|
| 34 | AWS IP in ipset | Required for sync_to_local.py | `sudo ipset add trusted_whitelist 18.225.0.90` |
| 35 | AWS IP not banned | fail2ban can ban the AWS IP | `sudo fail2ban-client set <jail> unbanip 18.225.0.90` |
| 36 | Jail count ≈ 22 | All jails loaded | `sudo fail2ban-client reload` |
| 37 | Local HTTP responds | upload.aeihawaii.com accessible | Check Apache, SSL, DNS |
| 38 | Local SSL >30 days | HTTPS for local server | `sudo certbot renew` |
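
For check 36, the jail count can be read from the `Number of jail:` line that `fail2ban-client status` prints. A small sketch follows; `jail_count` is a hypothetical helper, and how far the count may drift from 22 before the audit warns is not specified here.

```bash
# Read `fail2ban-client status` output on stdin and print the jail count.
jail_count() {
  awk -F: '/Number of jail/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Usage:
#   sudo fail2ban-client status | jail_count
```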

### Phase 9 — Local Cron Health (39-42)

| # | Check | Expected | On Failure |
|---|-------|----------|------------|
| 39 | daily_database_sync | Within 24h | Check map_dropbox cron; verify script runs |
| 40 | process_queue.sh | Running or recent | Check /var/www/html/security/ cron |
| 41 | DB backup <24h | Daily SQL backup | Check backup cron; verify /var/www/SQL_backups/ |
| 42 | Disk free >50GB | Local storage | Clean old backups, logs, temp files |
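
The "within 24h" checks come down to asking whether a directory contains a recently modified file. One way to verify check 41 by hand is sketched below; `has_recent_file` is a hypothetical helper, and the audit script may instead look at specific file names.

```bash
# Succeeds if DIR (searched recursively) holds at least one regular file
# modified within the last HOURS hours.
has_recent_file() {
  local dir="$1" hours="$2"
  [ -n "$(find "$dir" -type f -mmin -"$((hours * 60))" | head -n 1)" ]
}

# Usage:
#   has_recent_file /var/www/SQL_backups 24 && echo PASS || echo FAIL
```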

### Phase 10 — Cross-System Connectivity (43-44)

End-to-end connectivity between servers.

| # | Check | What It Does | On Failure |
|---|-------|-------------|------------|
| 43 | Scheduler login | GET to login.php | Apache/PHP issue on remote |
| 44 | Photo API JSON | POST to upload.php, expect JSON | API endpoint issue |

### Phase 11 — aei-webserv2 Health (45-51)

Checks on aei-webserv2 (192.168.141.219 / upload.aeihawaii.com) — the local server handling photo uploads, `Schedular` DB, and Dropbox mount.

| # | Check | What It Does | On Failure |
|---|-------|-------------|------------|
| 45 | Apache running | `pgrep -c apache2` via SSH | `sudo service apache2 start` on webserv2 |
| 46 | MySQL running | `pgrep -c mysqld` via SSH | `sudo service mariadb start` on webserv2 |
| 47 | uploadlocallat_kuldeep.php | File exists check | Endpoint missing — redeploy |
| 48 | delete_local_photo.php | File exists check | Endpoint missing — redeploy |
| 49 | check_photos.php | File exists check | Endpoint missing — redeploy |
| 50 | local_photos table | `SELECT COUNT(*)` via SSH | Check MySQL; verify `Schedular` DB and `upload_user` access |
| 51 | /mnt/dropbox accessible | `ls /mnt/dropbox/` via SSH | Check Dropbox mount on webserv2 |

---

## Manual Deep-Dive Commands

### Remote Server (via SSH)

```bash
# SSH to remote
ssh -i /root/.ssh/aei_production.pem Julian@18.225.0.90

# Check service status
sudo service httpd status
sudo service mysqld status
sudo service crond status
sudo service postfix status

# Check cron logs (last 20 entries)
sudo tail -20 /var/log/cron

# Check specific cron job
sudo grep 'process_retry_queue' /var/log/cron | tail -5

# Check Apache error log
sudo tail -50 /var/log/httpd/error_log

# Check PHP errors
sudo tail -50 /var/log/httpd/aeihawaii.com-error_log

# Check queue
ls -la /var/www/vhosts/aeihawaii.com/httpdocs/photoapi/queue/
ls -la /var/www/vhosts/aeihawaii.com/httpdocs/photoapi/queue/failed/

# Check disk usage by directory
sudo du -sh /var/www/vhosts/aeihawaii.com/httpdocs/scheduler/uploads/
sudo du -sh /mnt/dropbox/
sudo du -sh /var/log/

# Check SSL cert
echo | openssl s_client -servername aeihawaii.com -connect aeihawaii.com:443 2>/dev/null | openssl x509 -noout -dates

# MySQL check
mysql -u schedular -p'M1gif9!6' mandhdesign_schedular -e "SELECT COUNT(*) FROM meter_files;"
```

### Local Server

```bash
# Service status
sudo service apache2 status
sudo service mariadb status
sudo service fail2ban status

# Firewall checks
sudo ipset list trusted_whitelist | grep -F 18.225.0.90
sudo fail2ban-client status
sudo fail2ban-client status badactor | grep -F 18.225.0.90

# Unban AWS IP (if needed)
sudo fail2ban-client set badactor unbanip 18.225.0.90
sudo fail2ban-client set apache-404 unbanip 18.225.0.90

# Add AWS IP to whitelist (if needed)
sudo ipset add trusted_whitelist 18.225.0.90

# Check backups
ls -lt /var/www/SQL_backups/ | head -5

# Check disk
df -h /
```

### aei-webserv2 (192.168.141.219)

```bash
# SSH to webserv2
ssh -p 55222 aeiuser@192.168.141.219

# Check services
sudo service apache2 status
sudo service mariadb status

# Check upload endpoints
ls -la /var/www/html/upload/uploadlocallat_kuldeep.php
ls -la /var/www/html/upload/delete_local_photo.php
ls -la /var/www/html/upload/check_photos.php

# Check Schedular DB
mysql -u upload_user -p'P@55w02d778899' Schedular -e "SELECT COUNT(*) FROM unified_customers;"
mysql -u upload_user -p'P@55w02d778899' Schedular -e "SELECT COUNT(*) FROM local_photos;"

# Check Dropbox mount
ls /mnt/dropbox/ | head -5
```

---

## Common Failure Scenarios

### 1. fail2ban Bans AWS IP (Photo Sync Breaks)

**Symptoms:** Check 35 FAIL, photos stop syncing to local, queue grows on remote.

**Fix:**
```bash
# Find which jail banned it
for jail in $(sudo fail2ban-client status | grep 'Jail list' | sed 's/.*://;s/ //g' | tr ',' ' '); do
  sudo fail2ban-client status "$jail" 2>/dev/null | grep -qF 18.225.0.90 && echo "BANNED in $jail"
done

# Unban from all jails
sudo fail2ban-client unban 18.225.0.90

# Permanent fix: add the IP to the existing ignoreip setting.
# A bare IP appended on its own line is not a valid "key = value" entry,
# so edit the line to read: ignoreip = <existing entries> 18.225.0.90
sudoedit /etc/fail2ban/jail.d/00-ignoreip.conf
sudo fail2ban-client reload
```

### 2. ipset Whitelist Lost on Reboot

**Symptoms:** Check 34 FAIL. Happens after WSL2/Windows restart.

**Fix:**
```bash
sudo ipset add trusted_whitelist 18.225.0.90
# Verify
sudo ipset test trusted_whitelist 18.225.0.90
```

**Prevention:** The whitelist is persisted in `/var/www/html/security/whitelist.json` and should be restored on boot. If it isn't, check the boot scripts.
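
If the boot-time restore turns out to be unreliable, the manual fix can be made idempotent so it is safe to run from a boot script or cron. This is a sketch only: `ensure_whitelisted` is a hypothetical helper, it must run as root (it calls `ipset` directly rather than through `sudo`), and it does not read `whitelist.json`, whose format is not described here.

```bash
# Add IP to the named ipset only if it is not already a member.
# Prints "present" or "added"; returns non-zero if the add fails.
ensure_whitelisted() {
  local set_name="$1" ip="$2"
  if ipset test "$set_name" "$ip" >/dev/null 2>&1; then
    echo "present"
  else
    ipset add "$set_name" "$ip" && echo "added"
  fi
}

# Usage (as root):
#   ensure_whitelisted trusted_whitelist 18.225.0.90
```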

### 3. Maillog Growing Unbounded

**Symptoms:** Check 13 WARN (>1GB).

**Fix:**
```bash
# On remote: investigate the source first (usually osTicket cron errors);
# truncating first would erase the evidence
sudo grep 'status=bounced\|status=deferred' /var/log/maillog | tail -20
# Then reclaim the space
sudo truncate -s 0 /var/log/maillog
```

### 4. SSL Certificate Expiring

**Symptoms:** Check 18 or 38 WARN (<30 days).

**Fix:**
```bash
# Remote (Plesk manages certs)
# Check Plesk: https://18.225.0.90:8443

# Local
sudo certbot renew --dry-run
sudo certbot renew
```
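
To see how close a certificate is to the 30-day threshold, the `notAfter=` line from `openssl x509 -noout -enddate` can be converted into days remaining. A minimal sketch, assuming GNU `date`; `cert_days_left` is a hypothetical helper, not part of the audit script.

```bash
# Convert a "notAfter=<date>" line into whole days remaining.
# The second argument (current time as epoch seconds) is optional.
cert_days_left() {
  local not_after="${1#notAfter=}"
  local now="${2:-$(date +%s)}"
  local expiry
  expiry=$(date -d "$not_after" +%s) || return 1
  echo $(( (expiry - now) / 86400 ))
}

# Usage:
#   cert_days_left "$(echo | openssl s_client -servername aeihawaii.com \
#     -connect aeihawaii.com:443 2>/dev/null | openssl x509 -noout -enddate)"
```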

### 5. Retry Queue Growing

**Symptoms:** Check 11 WARN (>10 items).

**Root cause:** Usually the local server is unreachable (fail2ban ban, ipset missing, or server down).

**Fix:** Resolve the connectivity issue first (checks 34-35); the queue then drains on the next `process_retry_queue.py` run (every 15 min).

### 6. osTicket Cron Errors

**Symptoms:** Check 8 WARN. This is a **known issue**: the osTicket database user (`support1@localhost`) is denied access.

**Impact:** osTicket email polling is broken. Low priority unless ticket system is needed.

### 7. aei-webserv2 SSH Fails

**Symptoms:** Checks 45-51 all FAIL with "SSH to 192.168.141.219 failed".

**Fix:**
```bash
# Test SSH connectivity
ssh -p 55222 aeiuser@192.168.141.219 "echo OK"

# If host is unreachable, check:
# 1. Is the machine running? (physical or VM)
# 2. Is SSH service running on port 55222?
# 3. Is the local network reachable?
```
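
Question 2 in the checklist above can be answered without logging in by probing the TCP port. The sketch below uses bash's `/dev/tcp` pseudo-device and coreutils `timeout`; `port_open` is a hypothetical helper, and the 3-second timeout is an arbitrary choice.

```bash
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# /dev/tcp is a bash feature, so bash is invoked explicitly.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage:
#   port_open 192.168.141.219 55222 && echo "port reachable" || echo "closed or host down"
```

A reachable port with a failing `ssh` login points at keys or the SSH daemon's configuration rather than the network.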

---

## Scheduling the Audit

### Daily (6 AM, email on failure)

```bash
# Add to root's crontab. Mail is sent only when the audit exits non-zero (any FAIL).
0 6 * * * out=$(/usr/bin/python3 /var/www/html/AEI_REMOTE/audit/system_audit.py --warn-only 2>&1) || echo "$out" | mail -s "AEI Audit $(date +\%F)" admin@aeihawaii.com
```

### Weekly (Sunday, full JSON log)

```bash
0 7 * * 0 /usr/bin/python3 /var/www/html/AEI_REMOTE/audit/system_audit.py --json > /var/log/aei_audit_$(date +\%F).json 2>&1
```

### On-Demand (after maintenance)

```bash
python3 /var/www/html/AEI_REMOTE/audit/system_audit.py
```

---

## Related Files

- **Photo pipeline QA:** `AEI_PHOTO_API_PROJECT/QA/test_upload_pipeline.py` (59 upload-specific tests)
- **Cron inventory:** `audit/CRON_INVENTORY.md` (all cron jobs + 12 anomalies)
- **Credentials audit:** `audit/CREDENTIALS_AUDIT.md` (all credentials + expiry risks)
- **Storage audit:** `AEI_PHOTO_API_PROJECT/DOCS/REMOTE_SERVER_STORAGE_AUDIT.md`
- **Server reference:** `REMOTE_SERVER.md`
