
GCP Billing Statements — scrape (NAS) + surface in unified feed for pre-Oct-2025 months

Goal (owner, 2026-06-19)

GCP issued billing statements (not formal HK tax invoices) for billing periods 2024-10 → 2025-09 (13 months). GCP only began issuing HK tax invoices around Oct 2025, which is what T-034 finished wiring up: the NAS scraper now pulls the 8 tax invoices (5410104656 = 2025-10 … 5600028348 = 2026-05) and files them to the Drive folder "14b. Vendor Invoices" as gcp_invoice File-Archive docs. T-044's resolveVendorPdfUrl then joins those docs to cost-table rows on the Records page.

For the 13 historical months before Oct 2025, no tax invoice exists — but GCP Console does expose a downloadable statement PDF for each. Today the NAS scraper doesn't fetch them and the web app has no representation for them, so those 13 rows in the Records → Receipts: Service Inv sub-tab show a disabled PDF button.

Scope

Close the gap end-to-end so every GCP row in the Records feed has a clickable PDF: scrape the statements on the NAS, ingest them as a new File-Archive doc type, fall back to the statement when no tax invoice exists for that month, and keep the existing tax-invoice path untouched.

Coordinating both sides (NAS + web app) — not handed to Codex.

What we know (verified)

  • GCP Console nav (T-034 final note): the NAS scraper lands on /billing/0102F7-3FFF33-823945/invoices → tab "Postpay - Google Cloud Services". Two presets are available on the page:
    • ALL_STATEMENTS (default landing) → all monthly statements (covers every billing period since the account opened, ~21 months as of writing).
    • ALL_STATUTORY_DOCUMENTS ("All tax and statutory documents") → the 8 tax-invoice PDFs (Oct 2025+ only). The current run-gcp-invoices.sh switches from ALL_STATEMENTS to ALL_STATUTORY_DOCUMENTS then downloads — that's why only the 8 land today.
  • Ingest route pages/api/workspace/billing/ingest.ts:
    • Discriminator vendor: 'workspace' | 'gcp', mapped through VENDOR_DOC_TYPE { workspace: 'workspace_invoice', gcp: 'gcp_invoice' }.
    • Idempotent on (subsidiaryId, type, invoiceNumber).
    • Stores the PDF in the Drive folder "14b. Vendor Invoices" and writes a File-Archive doc.
  • Web-app surface lib/accounting/vendorInvoiceFeed.server.ts (Phase 2):
    • buildGcpRows() joins each cost-table row to its gcp_invoice doc by invoice number → exposes pdfUrl; null when no tax invoice exists.
  • lib/accounting/receipts.ts > listGcpInvoices() filters type === 'gcp_invoice' — no statement type today.
  • Doc-type registry lib/accounting/types.ts:2584: DocumentType = 'receipt' | 'invoice_pdf' | 'workspace_invoice' | 'gcp_invoice' | 'vendor_invoice' | 'contract' | 'quote' | 'wopc' | 'other'.

Sub-tasks

T-053a — NAS scraper: second pass for statements

Add run-gcp-statements.sh (mirrors run-gcp-invoices.sh) on the NAS at /volume1/docker/workspace-billing/. Env-only, no scraper code change:

GCP_DOCS_URLS=https://console.cloud.google.com/billing/0102F7-3FFF33-823945/invoices
GCP_PRESET_INITIAL=ALL_STATEMENTS
# do NOT switch to ALL_STATUTORY_DOCUMENTS — stay on statements
GCP_PRESET_TARGET=ALL_STATEMENTS
GCP_DOWNLOAD=true
RECON=false
# NEW — discriminator the scraper POSTs to the ingest route
INGEST_VENDOR=gcp-statement

Schedule: monthly cron, the day after the statement period closes. Run once manually now to backfill the 13 historical months.

Open Qs to resolve before running: does the scraper currently hard-code vendor: 'gcp' in the ingest POST, or is it env-driven? If hard-coded, the ingest route extension below must remain backward-compatible (statements ride on a NEW vendor discriminator; existing GCP invoice ingest is unaffected).

T-053b — Ingest route: new doc type

pages/api/workspace/billing/ingest.ts:

  • Extend the Vendor union to 'workspace' | 'gcp' | 'gcp-statement'.
  • Extend the VENDOR_DOC_TYPE map: 'gcp-statement': 'gcp_statement'.
  • Extend VENDOR_LABEL: 'gcp-statement': 'Google Cloud (statement)'.
  • No other logic change: same idempotency key, same Drive folder (14b. Vendor Invoices), same auth path.
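
A minimal sketch of the extended discriminator, to make the backward-compatibility claim concrete. The shapes of Vendor, VENDOR_DOC_TYPE and VENDOR_LABEL are inferred from the notes above, not copied from ingest.ts. The two existing label strings and the isVendor guard are assumptions for illustration only.

```typescript
// Sketch only: the real ingest.ts may declare these differently.
type Vendor = 'workspace' | 'gcp' | 'gcp-statement';
type VendorDocType = 'workspace_invoice' | 'gcp_invoice' | 'gcp_statement';

// Existing entries are unchanged; the statement entry is purely additive.
const VENDOR_DOC_TYPE: Record<Vendor, VendorDocType> = {
  workspace: 'workspace_invoice',
  gcp: 'gcp_invoice',
  'gcp-statement': 'gcp_statement',
};

const VENDOR_LABEL: Record<Vendor, string> = {
  workspace: 'Google Workspace', // assumed wording of the existing label
  gcp: 'Google Cloud', // assumed wording of the existing label
  'gcp-statement': 'Google Cloud (statement)',
};

// Hypothetical guard for the POST body's vendor field.
function isVendor(value: unknown): value is Vendor {
  return (
    typeof value === 'string' &&
    Object.prototype.hasOwnProperty.call(VENDOR_DOC_TYPE, value)
  );
}
```

Because Record<Vendor, …> is exhaustive, adding 'gcp-statement' to the union without adding both map entries would fail type-checking, which is a cheap guard against a half-applied change.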

lib/accounting/types.ts:

  • Add 'gcp_statement' to DocumentType.
  • Add it to the VENDOR_DOC_TYPES array (line ~2617) if that array gates UI filters.

T-053c — Web app: surface statements as a fallback PDF on GCP rows

lib/accounting/receipts.ts:

  • New listGcpStatements() mirroring listGcpInvoices() but filtering type === 'gcp_statement'. The statement-number convention is TBD pending the scraper output; it is likely a YYYYMM string lifted from the filename, since the statement carries no Google-issued "invoice number".
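
One way the YYYYMM key could be lifted from a filename, as a sketch. The actual statement filename format is not known yet, so the pattern below simply assumes the name contains a YYYYMM, YYYY-MM or YYYY_MM token. statementPeriodFromFilename is a hypothetical helper, not an existing function.

```typescript
// Hypothetical helper: extract a YYYYMM billing period from a statement filename.
// Assumes the filename contains the period as YYYYMM, YYYY-MM or YYYY_MM.
function statementPeriodFromFilename(filename: string): string | null {
  // (?:^|\D) and (?!\d) keep the token from matching inside a longer digit run,
  // e.g. a 10-digit tax-invoice number.
  const match = /(?:^|\D)(20\d{2})[-_]?(0[1-9]|1[0-2])(?!\d)/.exec(filename);
  return match ? `${match[1]}${match[2]}` : null;
}
```

Returning null instead of guessing lets the ingest side reject a file whose period cannot be determined, rather than filing it under the wrong month.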

lib/accounting/vendorInvoiceFeed.server.ts > buildGcpRows():

  • Join cost-table rows to statements by invoice month (YYYYMM) when no tax-invoice match exists for that month.
  • pdfUrl falls back to the statement's /api/accounting/receipts/{id}?redirect=pdf URL.
  • New optional field on VendorInvoiceFeedRow: pdfKind: 'tax_invoice' | 'statement' | null, so the UI can tag the row ("Statement" badge on pre-Oct-2025 months) without changing how pdfUrl is consumed.
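
The fallback order can be sketched as below. This is not the real buildGcpRows(): CostRow, ArchiveDoc and FeedRow are simplified stand-ins for the actual types, and the sketch assumes a statement doc stores its YYYYMM period in the invoiceNumber field, which is still TBD.

```typescript
type PdfKind = 'tax_invoice' | 'statement' | null;

// Simplified stand-ins for the real cost-table row and File-Archive doc shapes.
interface CostRow {
  invoiceMonth: string; // YYYYMM
  invoiceNumber: string | null; // tax-invoice number, if the row has one
}
interface ArchiveDoc {
  id: string;
  invoiceNumber: string; // assumption: statements carry YYYYMM here
}
interface FeedRow extends CostRow {
  pdfUrl: string | null;
  pdfKind: PdfKind;
}

const pdfUrlFor = (doc: ArchiveDoc): string =>
  `/api/accounting/receipts/${doc.id}?redirect=pdf`;

function buildGcpRowsSketch(
  rows: CostRow[],
  invoices: ArchiveDoc[],
  statements: ArchiveDoc[],
): FeedRow[] {
  const invoiceByNumber = new Map<string, ArchiveDoc>(
    invoices.map((d): [string, ArchiveDoc] => [d.invoiceNumber, d]),
  );
  const statementByMonth = new Map<string, ArchiveDoc>(
    statements.map((d): [string, ArchiveDoc] => [d.invoiceNumber, d]),
  );

  return rows.map((row): FeedRow => {
    // 1. A tax invoice always wins, so the existing path is untouched.
    const invoice = row.invoiceNumber
      ? invoiceByNumber.get(row.invoiceNumber)
      : undefined;
    if (invoice) {
      return { ...row, pdfUrl: pdfUrlFor(invoice), pdfKind: 'tax_invoice' };
    }
    // 2. Otherwise fall back to the statement for the same billing month.
    const statement = statementByMonth.get(row.invoiceMonth);
    if (statement) {
      return { ...row, pdfUrl: pdfUrlFor(statement), pdfKind: 'statement' };
    }
    // 3. Nothing on file: the PDF button stays disabled.
    return { ...row, pdfUrl: null, pdfKind: null };
  });
}
```

Checking the tax invoice first means a month that has both documents (any month from Oct 2025 on, once the monthly statement run is live) keeps showing the tax invoice.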

T-053d — UI surface (small)

components/records/ExpenseRecordsTab.tsx (or its VendorInvoicesTab equivalent, depending on which Records UI is live):

  • When a Service Invoice row has pdfKind === 'statement', render a small Statement tag next to the file link so auditors aren't confused about why the PDF doesn't look like a tax invoice.

Out of scope on purpose

  • Workspace statements — Workspace invoices are formal documents from day one; no statement gap there.
  • Auto-classifying statements vs tax invoices on the client. The NAS knows which preset it scraped from, so the client never has to guess — gcp_invoice and gcp_statement are physically distinct doc types.
  • GL behavior. Statements are documentary support, not tax-deductible invoices. No change to GL postings or the cost-table-driven amounts.

Success criteria

  • All 13 historical GCP months (2024-10 → 2025-09) expose a PDF link in the Records → Receipts → Service Inv view.
  • New tax invoices in Oct 2025+ continue to use the tax-invoice doc; no regression on the existing 8 months.
  • Monthly NAS run files BOTH (statement for the just-closed period if no tax invoice yet, plus any newly available tax invoice).
  • File-Archive collection is idempotent across re-runs.
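
The last criterion follows from the existing (subsidiaryId, type, invoiceNumber) key. A sketch of why, using an in-memory Map in place of the real File-Archive collection; idempotencyKey and ingestOnce are hypothetical names, and the subsidiaryId value in the usage below is made up.

```typescript
type ArchiveDocType = 'workspace_invoice' | 'gcp_invoice' | 'gcp_statement';

interface IngestDoc {
  subsidiaryId: string;
  type: ArchiveDocType;
  invoiceNumber: string; // assumption: YYYYMM for statements
}

// JSON-encode the tuple so no separator character can cause key collisions.
function idempotencyKey(doc: IngestDoc): string {
  return JSON.stringify([doc.subsidiaryId, doc.type, doc.invoiceNumber]);
}

// In-memory stand-in for the File-Archive write: a repeat ingest is a no-op.
function ingestOnce(
  store: Map<string, IngestDoc>,
  doc: IngestDoc,
): 'created' | 'skipped' {
  const key = idempotencyKey(doc);
  if (store.has(key)) return 'skipped';
  store.set(key, doc);
  return 'created';
}
```

Since type is part of the key, a gcp_statement and a gcp_invoice for the same period never collide, and re-running the backfill only skips documents that are already filed.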

Relates to

  • T-034 — Migrate GCP invoice PDF fetching → NAS (the original 4-part split; this is functionally a "part b.2" follow-on for statements).
  • T-044 — Records page joins GCP cost-table rows to PDFs (the join helper this task extends with a statement fallback).
  • Phase 2 unified expense-records feed (lib/accounting/vendorInvoiceFeed.server.ts).

Log

  • 2026-06-19 created (owner). Discovered the gap while reviewing the merged Records Service Invoices sub-tab: pre-Oct-2025 GCP rows have no PDF link even though Console exposes a downloadable statement. T-034's closing note flagged the verification step ("widen the date range if older tax PDFs exist") but stopped short of the statement-side scrape. Spec'd end-to-end before starting implementation.