Migrate GCP invoice PDF fetching → NAS (part b of 4)

Goal

Part (b) of the 4-part NAS billing-fetch migration (a=T-033 Cost Table · b=this · c=T-035 Workspace PDF · d=T-036 Workspace CSV). Move GCP invoice PDF fetching off Google Cloud onto the NAS scraper.

Current state (memory project_nas_scraper_ops — verify on NAS)

  • gcp-invoices mode navigates the Cloud Console Invoices page (/billing/0102F7-3FFF33-823945/invoices → tab "Postpay - Google Cloud Services" → preset "All tax and statutory documents" (ALL_STATUTORY_DOCUMENTS) → 8 Tax Invoices, 5410104656…5600028348, "Download selected"). Recon/nav was done.
  • REMAINING: the "Download selected" PDF fetch + parse + POST to a GCP ingest route (was pending; the cost-table-ingest route + invoice-number doc keying now exist).
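
The invoice-number doc keying mentioned above is what makes repeated ingests safe: posting the same invoice twice should not create a second record. A minimal Python sketch of that idea follows. It is not the ingest route's actual code: the class names, field names, the in-memory dict standing in for the real store, and the "created" return value are all hypothetical; only the keying on invoice number comes from the note above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class InvoiceRecord:
    invoice_number: str  # used as the document key
    invoice_date: str
    amount: str


@dataclass
class InvoiceArchive:
    """Hypothetical in-memory stand-in for the real archive store."""

    _docs: Dict[str, InvoiceRecord] = field(default_factory=dict)

    def ingest(self, record: InvoiceRecord) -> str:
        # Keying on the invoice number means a re-run cannot add a duplicate.
        if record.invoice_number in self._docs:
            return "exists"
        self._docs[record.invoice_number] = record
        return "created"

    def __len__(self) -> int:
        return len(self._docs)
```

Calling `ingest` twice with the same record leaves one stored document, so a monthly run that re-downloads earlier invoices should be harmless under this scheme.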

Scope

Verify the NAS gcp-invoices mode; finish PDF download → file to Drive (14b Vendor Invoices) → record in File-Archive so the records page shows the GCP invoice PDF. Decommission any Google Cloud version. Scraper code is Codex's lane: coordinate, and keep changes env-driven where possible.

Log

  • 2026-06-14 created (4-part split, owner).
  • 2026-06-14 STATE (verified on NAS): gcp-invoices mode downloads the invoice-PDF ZIP and parses number/date/amount; the workspace ingest endpoint HANDLES GCP (files the PDF to Drive 14b Vendor Invoices, idempotent on invoiceNumber, type workspace_invoice). The Jun-10 run captured 8 records. BUT today's live RECON=false run FAILED: invoiceHits=0; the select-all preset selector [data-initial-option-value="OPEN_INVOICES_AND_DEBIT_MEMOS"] was not found (4 retries), and the run probed the stale /payment + /transactions URLs instead of the live /invoices list. So the scraper is UI-drift-BROKEN now (same class as the cost-table dropdown). REMAINING: fix the Invoices-page nav/preset (likely re-point GCP_DOCS_URLS to the current /invoices route and update the preset selector), after which it should download and ingest; also widen to ALL historic invoices (~20, vs the 8 recent). The multi-invoice correction docs (5596510366/5573316605) aren't in the standard list either.
  • 2026-06-14 ✅ FIXED: it was ENV, not UI-drift. Root cause: the bare defaults probe the stale /documents, /payment, /transactions URLs with the Workspace preset; the GCP path needs GCP_DOCS_URLS=…/invoices + GCP_PRESET_INITIAL=ALL_STATEMENTS + GCP_PRESET_TARGET=ALL_STATUTORY_DOCUMENTS + GCP_DOWNLOAD=true + RECON=false. With those set: landed on /invoices → select-all fired → aria-checked → "Download selected" → ZIP → parsed 8/8 → INGEST 8/8 HTTP 200 (filed to Drive 14b + File-Archive, idempotent "exists"). Persisted the fix as /volume1/docker/workspace-billing/run-gcp-invoices.sh (mirrors run-gcp-cost-table.sh; no code change). REMAINING: the 8 are the recent TAX invoices (5410104656=202510 … 5600028348=202605; 202510 through 202605 is 8 billing months, consistent with one tax invoice per month). Older months are likely "statements" with no formal tax-invoice PDF (GCP began HK tax invoices ~Oct 2025), so 8 may be the full PDF set; verify by widening the date range to see whether older tax PDFs exist. Then schedule the monthly run + decommission any Google-Cloud copy.
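
Since the failure was a missing env setup and not a code bug, a small preflight check before each run could catch a regression early. The following is a minimal Python sketch, not part of the scraper or of run-gcp-invoices.sh: the function name `check_gcp_invoice_env` is hypothetical, and treating GCP_DOCS_URLS as a comma-separated list is an assumption. The variable names and required values are the ones recorded in the log entry above. The full GCP_DOCS_URLS value is elided in the log, so the sketch only checks that each entry ends in /invoices.

```python
from typing import List, Mapping

# Values the log records as required for the GCP invoices path.
REQUIRED_EXACT = {
    "GCP_PRESET_INITIAL": "ALL_STATEMENTS",
    "GCP_PRESET_TARGET": "ALL_STATUTORY_DOCUMENTS",
    "GCP_DOWNLOAD": "true",
    "RECON": "false",
}


def check_gcp_invoice_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the env matches the log."""
    problems = []
    for name, expected in REQUIRED_EXACT.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual != expected:
            # Exact comparison, so a stray space after "=" is reported too.
            problems.append(f"{name}={actual!r} (expected {expected!r})")

    # Assumption: GCP_DOCS_URLS holds one or more comma-separated URLs.
    raw = env.get("GCP_DOCS_URLS", "")
    urls = [u.strip() for u in raw.split(",") if u.strip()]
    if not urls:
        problems.append("GCP_DOCS_URLS is not set")
    for url in urls:
        if not url.rstrip("/").endswith("/invoices"):
            problems.append(f"GCP_DOCS_URLS entry does not end in /invoices: {url}")
    return problems
```

A wrapper could call `check_gcp_invoice_env(os.environ)` and abort when the returned list is non-empty, which would have flagged the stale /payment and /transactions defaults before the browser was launched.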