# DPP Permit Sync - Implementation Roadmap

**Approach:** Playwright headless browser automation (the only viable method)
**Run from:** Local WSL2 machine (Python 3.12 + Playwright 1.55 + Chromium already installed)
**Target DB:** Remote production MySQL (18.225.0.90) via direct connection
**Created:** 2026-02-24

## Why Playwright (Not API)

All Salesforce REST/UI APIs return `API_DISABLED_FOR_ORG` for the permits@aeihawaii.com
account (see API_TEST_RESULTS.md). The only way to access permit data is through
rendered portal pages inside a browser session. Playwright provides:

- Headless Chromium -- no GUI needed for cron
- Full JavaScript execution (Salesforce pages require JS)
- Cookie/session management
- DOM querying for data extraction
- Already installed locally (`playwright 1.55`, Chromium confirmed working)

## Architecture

```
LOCAL MACHINE (WSL2)                      REMOTE (18.225.0.90)
┌────────────────────────┐                ┌─────────────────┐
│  dpp_permit_sync.py    │                │  MySQL           │
│  ┌──────────────────┐  │                │  mandhdesign_    │
│  │ Playwright        │──── HTTPS ────▶  │  schedular       │
│  │ (headless Chrome) │  │  (SF portal)  │                  │
│  └──────────────────┘  │                │  ┌─────────────┐ │
│          │              │                │  │ permit_sync  │ │
│          ▼              │                │  │ permit_sync_ │ │
│  Parse rendered pages   │── MySQL ──────▶│  │ log          │ │
│  Extract permit data    │  (direct)      │  │ jobs (update)│ │
│          │              │                │  └─────────────┘ │
│          ▼              │                └─────────────────┘
│  Log + Notify           │
└────────────────────────┘

Cron triggers dpp_permit_sync.py on schedule (see Phase 4)
```

## Phase 1: Database Schema

**Effort:** ~1 hour
**Tables:** Create on remote production DB

Use the schema from `PERMIT_TRACKING_PLAN.md` (5 tables):
- `permit_sync` -- main permit data (status, phase, dates, fees, CO)
- `permit_sync_inspections` -- inspection records (FK to permit_sync)
- `permit_sync_fees` -- fee line items (FK to permit_sync)
- `permit_sync_reviews` -- plan review records (FK to permit_sync)
- `permit_sync_log` -- sync action log (login, fetch, error events)

No schema changes needed -- the existing SQL in PERMIT_TRACKING_PLAN.md is ready to run.

## Phase 2: Core Sync Script

**Location:** `/var/www/html/AEI_REMOTE/scripts/dpp_permit_sync.py`
**Effort:** 2-3 days

### Step 1: Login

```python
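import os
from playwright.sync_api import sync_playwright

# Credentials are read from the environment rather than hardcoded.
# DPP_USERNAME / DPP_PASSWORD are placeholder variable names.
USERNAME = os.environ.get("DPP_USERNAME", "permits@aeihawaii.com")
PASSWORD = os.environ["DPP_PASSWORD"]

# Start headless Chromium; page and context are reused by the later steps
playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=True)
context = browser.new_context()
page = context.new_page()

# Navigate to login page
page.goto("https://honolulu.my.site.com/s/login/")

# Fill credentials
page.fill('input[name="username"]', USERNAME)
page.fill('input[name="password"]', PASSWORD)

# Click login, wait for redirect
page.click('button:has-text("Log in")')
page.wait_for_url("**/s/")  # Home page

# Verify sid cookie exists (fail with a clear error instead of StopIteration)
sid = next((c for c in context.cookies() if c["name"] == "sid"), None)
if sid is None:
    raise RuntimeError("Login failed: no sid cookie was set")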
```

### Step 2: Harvest Permit List from Shared Permits

```python
# Navigate to My Permits → Shared Permits tab
page.goto("https://honolulu.my.site.com/s/my-permits")
page.click('text=Shared Permits')  # Switch to Shared tab

# Paginate and collect all permit links
# Format: /s/permit2/{sf_record_id}/{slug}
# Extract: sf_record_id, permit_number (from slug)
permits = []
while True:
    links = page.query_selector_all('a[href*="/s/permit2/"]')
    for link in links:
        href = link.get_attribute('href')
        # Parse: /s/permit2/a1Bcx000001QYezEAG/bp2026984
        parts = href.split('/')
        sf_id = parts[3]     # a1Bcx000001QYezEAG
        slug = parts[4]      # bp2026984
        permits.append({'sf_id': sf_id, 'slug': slug})
    # Check for next page button, break if none
    ...
```
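
The `href.split('/')` indexing above only works for relative links. A minimal sketch of a more defensive parser follows, assuming the portal may also emit absolute URLs or query strings and that record IDs are standard 15- or 18-character Salesforce IDs. The helper names `parse_permit_href` and `collect_permits` are hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches /s/permit2/{sf_record_id}/{slug}.
# Assumption: record IDs are 15 or 18 alphanumeric characters.
_PERMIT_PATH = re.compile(
    r"^/s/permit2/([A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?)/([^/?#]+)/?$"
)


def parse_permit_href(href):
    """Return {'sf_id', 'slug'} for a permit link, or None if it does not match."""
    path = urlparse(href).path  # drops scheme, host, and query string if present
    match = _PERMIT_PATH.match(path)
    if match is None:
        return None
    return {"sf_id": match.group(1), "slug": match.group(2)}


def collect_permits(hrefs):
    """Parse hrefs, skipping non-permit links and duplicate record IDs."""
    seen = {}
    for href in hrefs:
        parsed = parse_permit_href(href)
        if parsed is not None and parsed["sf_id"] not in seen:
            seen[parsed["sf_id"]] = parsed
    return list(seen.values())
```

De-duplicating on `sf_id` keeps the list stable if the same link appears more than once on a page or across pages.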

### Step 3: Match to AEI Jobs

```python
# Query the AEI jobs table (remote production DB) for active jobs with building permits
# SELECT id, building_permit FROM jobs
#   WHERE building_permit > 0
#   AND building_permit_status NOT IN ('completed', 'revoke')

# Match by stripping SF prefix: BP-2026-984 → 984
# Compare against jobs.building_permit
```
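
The matching rule can be sketched as below. This assumes SF permit numbers always follow the `PREFIX-YEAR-SEQUENCE` form of the `BP-2026-984` example and that `jobs.building_permit` holds the bare sequence number. Both function names and the `permit_number` key are hypothetical.

```python
import re

# Assumed display format: PREFIX-YEAR-SEQUENCE, e.g. "BP-2026-984".
_PERMIT_NUMBER = re.compile(r"^[A-Za-z]+-(\d{4})-(\d+)$")


def sf_permit_to_aei_number(permit_number):
    """Strip the SF prefix and year: 'BP-2026-984' becomes 984. None if unparseable."""
    match = _PERMIT_NUMBER.match(permit_number.strip())
    return int(match.group(2)) if match else None


def match_permits(sf_permits, aei_jobs):
    """Pair AEI jobs with SF permits.

    sf_permits: iterable of dicts with 'sf_id' and 'permit_number'.
    aei_jobs:   iterable of (job_id, building_permit) rows from the jobs query.
    Returns a list of (job_id, sf_id) tuples.
    """
    # Several jobs can share one permit number, so map number -> list of job ids.
    jobs_by_number = {}
    for job_id, building_permit in aei_jobs:
        jobs_by_number.setdefault(int(building_permit), []).append(job_id)

    matches = []
    for permit in sf_permits:
        number = sf_permit_to_aei_number(permit["permit_number"])
        for job_id in jobs_by_number.get(number, []):
            matches.append((job_id, permit["sf_id"]))
    return matches
```

Because the year is discarded, two permits from different years with the same sequence number would collide. Whether that can happen in practice is worth checking before relying on this rule.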

**AEI DB stats (from investigation):**
- 14,440 total jobs with building permit numbers
- 11,513 not completed
- 6,142 unique non-completed permits
- 413 shared permits visible in SF portal

Expected overlap: at most 413 of the 6,142 unique non-completed AEI permits (about 6.7%) can match, because only permits shared with the account are visible in the portal.

### Step 4: Fetch Detail for Each Matched Permit

```python
# Navigate to permit detail page
page.goto(f"https://honolulu.my.site.com/s/permit2/{sf_id}/{slug}")
page.wait_for_load_state("networkidle")

# Extract data from rendered page using DOM queries
# Use page.locator() or page.query_selector() to find:
#   - Phase (Review, Inspection, etc.)
#   - Status (In Progress, Complete, etc.)
#   - Permit type
#   - Issue date, expiration date
#   - Fee totals
#   - Address
#   - CO status

# Rate limit: 2-second delay between page loads (configurable)
```

**Data extraction strategy:** Use `page.locator()` with accessible names (for example
`page.get_by_label()` or `page.get_by_role()`), or call `page.locator("body").aria_snapshot()`
to get the accessibility tree as YAML-like text. The Salesforce portal renders data
in labeled sections that can be parsed from the accessibility tree.
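
The per-permit loop with its 2-second delay could look roughly like the sketch below. `fetch_all` and the injected `fetch_one` callable are hypothetical; in the real script `fetch_one` would wrap the `page.goto()` and extraction calls shown above.

```python
import time


def fetch_all(permits, fetch_one, delay_s=2.0, sleep=time.sleep):
    """Call fetch_one(permit) for each permit, pausing delay_s between calls.

    Returns (results, errors). A failing permit is recorded and skipped so
    that one bad page does not stop the whole run.
    """
    results, errors = [], []
    for index, permit in enumerate(permits):
        if index > 0:
            sleep(delay_s)  # pause between page loads, not before the first
        try:
            results.append(fetch_one(permit))
        except Exception as exc:  # broad on purpose: log and continue
            errors.append((permit, str(exc)))
    return results, errors
```

Passing `sleep` as a parameter lets the loop be exercised in tests without real delays.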

### Step 5: Update Database

```python
# Upsert into permit_sync (keyed on sf_record_id)
# Update jobs table:
#   - building_permit_status when SF status changes
#   - building_permit_completeiondate when issue_date found
# Log all actions to permit_sync_log
```
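
One way to express the upsert is MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`, which requires a unique index on `sf_record_id`. The sketch below only builds the statement and its parameters. `build_upsert` is a hypothetical helper, and any column other than `sf_record_id` is a placeholder for whatever `PERMIT_TRACKING_PLAN.md` defines.

```python
def build_upsert(table, key_column, row):
    """Build a parameterized MySQL upsert for one row.

    table and column names must come from trusted code, never from user
    input, because identifiers cannot be passed as query parameters.
    Returns (sql, params) suitable for cursor.execute(sql, params) with a
    driver that uses %s placeholders.
    """
    columns = list(row)
    if key_column not in columns:
        raise ValueError(f"row is missing key column {key_column!r}")
    update_columns = [c for c in columns if c != key_column]
    if not update_columns:
        raise ValueError("row needs at least one non-key column to update")

    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in update_columns)
    sql = (
        f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
    return sql, tuple(row[c] for c in columns)
```

The `VALUES(col)` form is deprecated in recent MySQL 8.0 releases in favor of row aliases but is still accepted. Which form to use depends on the server version.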

### Step 6: Error Handling

| Scenario | Action |
|----------|--------|
| Session timeout | Detect redirect to login page, re-authenticate |
| Page load failure | Log error, skip permit, continue to next |
| Element not found | Log warning, store partial data |
| DB connection lost | Retry 3x with backoff, then abort |
| Rate limiting | Configurable delay (default 2s between pages) |
| Browser crash | Restart Playwright context, re-login |
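
The "retry 3x with backoff" row could be implemented along these lines. This sketch reads "3x" as three attempts in total and uses exponential backoff starting at one second; both are assumptions, as is the helper name `retry_with_backoff`.

```python
import time


def retry_with_backoff(operation, attempts=3, base_delay_s=1.0,
                       sleep=time.sleep, retry_on=(Exception,)):
    """Run operation(), retrying on failure with exponential backoff.

    Waits base_delay_s, then 2x, 4x, ... between attempts. After the final
    attempt fails, the last exception is re-raised so the caller can abort.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller abort the run
            sleep(base_delay_s * 2 ** (attempt - 1))
```

In the sync script, `retry_on` would be narrowed to the DB driver's connection error class so that programming errors are not retried.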

## Phase 3: Initial Permit ID Mapping

**Effort:** Run once after Phase 2 script is working

The Shared Permits page gives us 413+ SF record IDs paired with permit slugs.
The Phase 2 script harvests this mapping automatically. No manual work needed.

For permits NOT in the Shared Permits list, two options:
1. Ask DPP to share more permits with the permits@ account
2. Use the portal search bar (if available) to look up by permit number

## Phase 4: Cron Setup

**Effort:** ~30 minutes
**Location:** Local WSL2 crontab

```cron
# Business hours sync: every 4 hours Mon-Fri
0 6,10,14,18 * * 1-5  /usr/bin/python3 /var/www/html/AEI_REMOTE/scripts/dpp_permit_sync.py >> /var/log/dpp_permit_sync.log 2>&1

# Full weekly sync (including completed permits): Sunday 2am
0 2 * * 0  /usr/bin/python3 /var/www/html/AEI_REMOTE/scripts/dpp_permit_sync.py --full >> /var/log/dpp_permit_sync.log 2>&1
```

**Flags:**
- (default) -- sync only non-completed permits
- `--full` -- sync all permits including completed ones
- `--dry-run` -- log what would happen without DB writes
- `--permit 189630` -- sync a single permit (for testing)
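
These flags map directly onto `argparse`. The sketch below is one possible definition; the help strings and the choice to parse `--permit` as an integer are assumptions.

```python
import argparse


def build_parser():
    """Command-line interface for dpp_permit_sync.py (flag names from the list above)."""
    parser = argparse.ArgumentParser(
        prog="dpp_permit_sync.py",
        description="Sync DPP permit data from the Salesforce portal into MySQL.",
    )
    parser.add_argument("--full", action="store_true",
                        help="sync all permits, including completed ones")
    parser.add_argument("--dry-run", action="store_true",
                        help="log what would happen without writing to the DB")
    parser.add_argument("--permit", type=int, metavar="NUMBER",
                        help="sync a single permit number (for testing)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With no flags, `args.full` and `args.dry_run` are `False` and `args.permit` is `None`, which corresponds to the default sync of non-completed permits.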

## Phase 5: UI Integration (Future)

After sync is running reliably:

1. **Permit tab enhancement** -- Add "DPP Status" section to existing job permit tab
   showing phase, status, dates, fees, last synced timestamp, and portal link
2. **Dashboard widget** -- Permit status overview table with filters
   (All / In Review / Inspection / Completed / Has Balance)
3. **Email notifications** -- Trigger on phase change, status change,
   new inspection, fee due, approaching expiration
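
Notification triggers such as phase or status changes reduce to comparing the previously stored row with the freshly synced one. A minimal sketch, assuming rows are dicts and that `phase` and `status` are the column names (both hypothetical here):

```python
WATCHED_FIELDS = ("phase", "status")  # hypothetical column names


def detect_changes(previous, current, fields=WATCHED_FIELDS):
    """Return [(field, old_value, new_value)] for watched fields that changed.

    previous is None for a permit seen for the first time; in that case no
    changes are reported, so the initial sync does not send notifications.
    """
    if previous is None:
        return []
    return [
        (field, previous.get(field), current.get(field))
        for field in fields
        if previous.get(field) != current.get(field)
    ]
```

Date-based triggers such as approaching expiration need a separate check against the current date and are not covered by this comparison.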

## Implementation Priority

| Phase | Task | Effort | Depends On |
|-------|------|--------|------------|
| **1** | Create permit_sync tables | 1 hour | Nothing |
| **2** | Build Python sync script | 2-3 days | Phase 1 |
| **3** | Run initial permit mapping | 1 hour | Phase 2 |
| **4** | Set up cron | 30 min | Phase 2 |
| **5a** | Permit tab UI | 1-2 days | Phase 2 |
| **5b** | Dashboard widget | 1-2 days | Phase 2 |
| **5c** | Email notifications | 1 day | Phase 2 |

**MVP (Phases 1-4):** ~3 days to get automated sync running.

## Technical Prerequisites (All Confirmed Available)

| Requirement | Status |
|-------------|--------|
| Python 3.12 | Installed locally |
| Playwright 1.55 | Installed with Chromium |
| MySQL client (local) | Available (`mysql` CLI) |
| Remote DB access | Confirmed (AEI_User@18.225.0.90) |
| Cron (WSL2) | Working |
| permits@aeihawaii.com account | Tested, 413 shared permits |

## Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Salesforce UI changes break DOM selectors | Sync fails | Use accessibility-based selectors (more stable than CSS); alert on parse failures |
| Account locked/password changed | Sync stops | Monitor login failures; store credentials in config file |
| DPP adds CAPTCHA to login | Sync blocked | Unlikely for existing community accounts; would need manual intervention |
| Session rate limiting | Slow sync | 2s delay between pages; spread across 4 daily windows |
| Permits not shared with account | Missing data | Request DPP to share additional permits |
| WSL2 machine offline | Sync pauses | Acceptable -- permit data isn't real-time critical |

## Open Questions (Carried Forward)

1. Can DPP share ALL AEI permits with the permits@ account? (Currently 413 of ~6,142)
2. Does AEI file permits under different applicant names requiring multiple logins?
3. Are electrical permits in the same SF object (MUSW__Permit2__c)?
4. Should we backfill historical data from the old permit_details table?
5. Is there a need for two-way sync (pushing data back to Salesforce)?
