GCP Billing Statements — scrape (NAS) + surface in unified feed for pre-Oct-2025 months
Goal (owner, 2026-06-19)
GCP issued billing statements (not formal HK tax invoices) for billing periods
2024-10 → 2025-09 (13 months). GCP only began issuing HK tax invoices around
Oct 2025, which is what T-034 finished wiring up — the NAS scraper now pulls the
8 tax invoices (5410104656 = 2025-10 … 5600028348 = 2026-05) and files them
to Drive 14b. Vendor Invoices as gcp_invoice File-Archive docs, joined to
cost-table rows in the Records page via T-044's resolveVendorPdfUrl.
For the 13 historical months before Oct 2025, no tax invoice exists — but GCP Console does expose a downloadable statement PDF for each. Today the NAS scraper doesn't fetch them and the web app has no representation for them, so those 13 rows in the Records → Receipts: Service Inv sub-tab show a disabled PDF button.
Scope
Close the gap end-to-end so every GCP row in the Records feed has a clickable PDF: scrape the statements on the NAS, ingest them as a new File-Archive doc type, fall back to the statement when no tax invoice exists for that month, and keep the existing tax-invoice path untouched.
Coordinating both sides (NAS + web app) — not handed to Codex.
What we know (verified)
- GCP Console nav (T-034 final note): the NAS scraper lands on /billing/0102F7-3FFF33-823945/invoices → tab "Postpay - Google Cloud Services". Two presets are available on the page:
  - ALL_STATEMENTS (default landing) → all monthly statements (covers every billing period since the account opened, ~21 months as of writing).
  - ALL_STATUTORY_DOCUMENTS ("All tax and statutory documents") → the 8 tax-invoice PDFs (Oct 2025+ only).
  The current run-gcp-invoices.sh switches from ALL_STATEMENTS to ALL_STATUTORY_DOCUMENTS then downloads, which is why only the 8 land today.
- Ingest route pages/api/workspace/billing/ingest.ts:
  - Discriminator vendor: 'workspace' | 'gcp' mapping to VENDOR_DOC_TYPE { workspace: 'workspace_invoice', gcp: 'gcp_invoice' }.
  - Idempotent on (subsidiaryId, type, invoiceNumber).
  - Stores PDF on Drive 14b. Vendor Invoices and writes a File-Archive doc.
- Web-app surface lib/accounting/vendorInvoiceFeed.server.ts (Phase 2):
  - buildGcpRows() joins each cost-table row to its gcp_invoice doc by invoice number → exposes pdfUrl; null when no tax invoice exists.
  - lib/accounting/receipts.ts > listGcpInvoices() filters type === 'gcp_invoice'; no statement type exists today.
- Doc-type registry lib/accounting/types.ts:2584: DocumentType = 'receipt' | 'invoice_pdf' | 'workspace_invoice' | 'gcp_invoice' | 'vendor_invoice' | 'contract' | 'quote' | 'wopc' | 'other'.
Sub-tasks
T-053a — NAS scraper: second pass for statements
Add run-gcp-statements.sh (mirrors run-gcp-invoices.sh) on the NAS at
/volume1/docker/workspace-billing/. Env-only, no scraper code change:
GCP_DOCS_URLS=https://console.cloud.google.com/billing/0102F7-3FFF33-823945/invoices
GCP_PRESET_INITIAL=ALL_STATEMENTS
# do NOT switch to ALL_STATUTORY_DOCUMENTS — stay on statements
GCP_PRESET_TARGET=ALL_STATEMENTS
GCP_DOWNLOAD=true
RECON=false
# NEW — discriminator the scraper POSTs to the ingest route
INGEST_VENDOR=gcp-statement
Schedule: monthly cron, the day after the statement period closes. Run once manually now to backfill the 13 historical months.
Open question to resolve before running: does the scraper currently hard-code
vendor: 'gcp' in the ingest POST, or is it env-driven? If hard-coded, the
"env-only" claim above does not hold and the scraper needs a small change to
read INGEST_VENDOR. Either way, the ingest route extension below must remain
backward-compatible: statements ride on a NEW vendor discriminator, and the
existing GCP invoice ingest is unaffected.
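If the vendor does turn out to be hard-coded, one way to make it env-driven is sketched below. This is an illustration only: resolveIngestVendor, IngestVendor, and the fallback parameter are hypothetical names, not confirmed parts of the scraper. The fallback stands in for whatever value the scraper sends today, so existing runs that never set INGEST_VENDOR behave as before.

```typescript
// Hypothetical sketch: these names are assumptions, not the scraper's real API.
type IngestVendor = 'workspace' | 'gcp' | 'gcp-statement';

const KNOWN_VENDORS: readonly IngestVendor[] = ['workspace', 'gcp', 'gcp-statement'];

// Reads INGEST_VENDOR from an env-like record. When the variable is unset or
// blank, returns the fallback (the value the scraper uses today).
function resolveIngestVendor(
  env: Record<string, string | undefined>,
  fallback: IngestVendor,
): IngestVendor {
  const raw = (env['INGEST_VENDOR'] ?? '').trim();
  if (raw === '') return fallback;
  const match = KNOWN_VENDORS.find((v) => v === raw);
  if (match === undefined) {
    // Fail loudly so a typo never files statements under the wrong doc type.
    throw new Error(`Unknown INGEST_VENDOR: ${raw}`);
  }
  return match;
}
```

In a Node-based scraper this would be called as resolveIngestVendor(process.env, 'gcp'); run-gcp-invoices.sh then needs no change, and only run-gcp-statements.sh sets INGEST_VENDOR=gcp-statement.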
T-053b — Ingest route: new doc type
pages/api/workspace/billing/ingest.ts:
- Extend Vendor union to 'workspace' | 'gcp' | 'gcp-statement'.
- Extend VENDOR_DOC_TYPE map: 'gcp-statement': 'gcp_statement'.
- Extend VENDOR_LABEL: 'gcp-statement': 'Google Cloud (statement)'.
- No other logic change — same idempotency key, same Drive folder
(14b. Vendor Invoices), same auth path.
lib/accounting/types.ts:
- Add 'gcp_statement' to DocumentType.
- Add to the VENDOR_DOC_TYPES array (line ~2617) if it gates UI filters.
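A minimal sketch of the extended discriminator and maps, under stated assumptions: the 'workspace' and 'gcp' labels are placeholders (only the 'gcp-statement' label appears in this spec), and isVendor and idempotencyKey are hypothetical helpers added to show why the existing idempotency key keeps working.

```typescript
// Sketch only: the real definitions live in pages/api/workspace/billing/ingest.ts.
type Vendor = 'workspace' | 'gcp' | 'gcp-statement';
type VendorDocType = 'workspace_invoice' | 'gcp_invoice' | 'gcp_statement';

const VENDOR_DOC_TYPE: Record<Vendor, VendorDocType> = {
  workspace: 'workspace_invoice',
  gcp: 'gcp_invoice',
  'gcp-statement': 'gcp_statement', // new
};

const VENDOR_LABEL: Record<Vendor, string> = {
  workspace: 'Google Workspace', // assumed existing label
  gcp: 'Google Cloud', // assumed existing label
  'gcp-statement': 'Google Cloud (statement)', // new
};

// Hypothetical guard: validates the discriminator on an incoming POST body.
function isVendor(value: unknown): value is Vendor {
  return (
    typeof value === 'string' &&
    Object.prototype.hasOwnProperty.call(VENDOR_DOC_TYPE, value)
  );
}

// Hypothetical helper: the (subsidiaryId, type, invoiceNumber) key as a string.
function idempotencyKey(
  subsidiaryId: string,
  vendor: Vendor,
  invoiceNumber: string,
): string {
  return [subsidiaryId, VENDOR_DOC_TYPE[vendor], invoiceNumber].join('|');
}
```

Because the doc type is part of the key, a statement and a tax invoice that happened to share a number would still be stored as separate docs.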
T-053c — Web app: surface statements as a fallback PDF on GCP rows
lib/accounting/receipts.ts:
- New listGcpStatements() mirroring listGcpInvoices() but filtering
type === 'gcp_statement'. Statement number convention TBD by the
scraper output — likely a YYYYMM string lifted from the filename
(the statement carries no Google-issued "invoice number").
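Since the statement number convention is still TBD, the following is only a sketch of how a YYYYMM key might be lifted from a filename. The filename shapes it accepts (a 20xx year followed by a two-digit month, optionally separated by "-" or "_") are assumptions, and statementNumberFromFilename is a hypothetical name; the real pattern has to be checked against actual scraper output.

```typescript
// Hypothetical helper: the real filename format is TBD by the scraper output.
function statementNumberFromFilename(filename: string): string | null {
  // Year 20xx, optional "-" or "_", month 01-12, bounded by non-digits (or the
  // string edges) so a longer digit run is never misread as a year and month.
  const match = /(?:^|\D)(20\d{2})[-_]?(0[1-9]|1[0-2])(?:\D|$)/.exec(filename);
  return match ? `${match[1]}${match[2]}` : null;
}
```

Returning null instead of guessing lets the ingest step reject a file whose period cannot be determined, rather than archiving it under the wrong month.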
lib/accounting/vendorInvoiceFeed.server.ts > buildGcpRows():
- Join cost-table rows to statements by invoice month (YYYYMM) when
no tax-invoice match exists for that month.
- pdfUrl falls back to the statement's
/api/accounting/receipts/{id}?redirect=pdf URL.
- New optional field on VendorInvoiceFeedRow:
pdfKind: 'tax_invoice' | 'statement' | null, so the UI can tag the row
("Statement" badge on pre-Oct-2025 months) without changing how pdfUrl is
consumed.
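The fallback join described above could look roughly like this. It is a sketch under stated assumptions: CostRow, ArchiveDoc, FeedRow, and attachGcpPdfs are hypothetical stand-ins for the real types and for the relevant part of buildGcpRows(), and it assumes a statement doc stores its YYYYMM period in the invoiceNumber field.

```typescript
type PdfKind = 'tax_invoice' | 'statement' | null;

// Hypothetical shapes; the real ones live in lib/accounting.
interface CostRow { invoiceMonth: string; invoiceNumber: string | null; }
interface ArchiveDoc { id: string; type: 'gcp_invoice' | 'gcp_statement'; invoiceNumber: string; }
interface FeedRow extends CostRow { pdfUrl: string | null; pdfKind: PdfKind; }

const pdfUrlFor = (doc: ArchiveDoc): string =>
  `/api/accounting/receipts/${doc.id}?redirect=pdf`;

function attachGcpPdfs(
  rows: CostRow[],
  invoices: ArchiveDoc[],
  statements: ArchiveDoc[],
): FeedRow[] {
  const invoiceByNumber = new Map<string, ArchiveDoc>();
  for (const doc of invoices) invoiceByNumber.set(doc.invoiceNumber, doc);

  // Assumption: a statement doc keeps its YYYYMM period in invoiceNumber.
  const statementByMonth = new Map<string, ArchiveDoc>();
  for (const doc of statements) statementByMonth.set(doc.invoiceNumber, doc);

  return rows.map((row): FeedRow => {
    // Existing path first: a tax invoice always wins over a statement.
    const invoice =
      row.invoiceNumber !== null ? invoiceByNumber.get(row.invoiceNumber) : undefined;
    if (invoice) return { ...row, pdfUrl: pdfUrlFor(invoice), pdfKind: 'tax_invoice' };

    // Fallback: match the statement by invoice month.
    const statement = statementByMonth.get(row.invoiceMonth);
    if (statement) return { ...row, pdfUrl: pdfUrlFor(statement), pdfKind: 'statement' };

    return { ...row, pdfUrl: null, pdfKind: null };
  });
}
```

Checking the tax invoice first keeps the Oct 2025+ rows on their current PDF even once statements for those months are also archived.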
T-053d — UI surface (small)
components/records/ExpenseRecordsTab.tsx (or its VendorInvoicesTab
equivalent depending on which Records UI is live):
- When a Service Invoice row has pdfKind === 'statement', render a small
Statement tag next to the file link so auditors aren't confused why the
PDF doesn't look like a tax invoice.
Out of scope on purpose
- Workspace statements — Workspace invoices are formal documents from day one; no statement gap there.
- Auto-classifying statements vs tax invoices on the client. The NAS
knows which preset it scraped from, so the client never has to guess:
gcp_invoice and gcp_statement are physically distinct doc types.
- GL behavior. Statements are documentary support, not tax-deductible invoices. No change to GL postings or the cost-table-driven amounts.
Success criteria
- All 13 historical GCP months (2024-10 → 2025-09) expose a PDF link in the Records → Receipts → Service Inv view.
- New tax invoices in Oct 2025+ continue to use the tax-invoice doc; no regression on the existing 8 months.
- Monthly NAS run files BOTH (statement for the just-closed period if no tax invoice yet, plus any newly available tax invoice).
- File-Archive collection is idempotent across re-runs.
Relates to
- T-034 — Migrate GCP invoice PDF fetching → NAS (the original 4-part split; this is functionally a "part b.2" follow-on for statements).
- T-044 — Records page joins GCP cost-table rows to PDFs (the join helper this task extends with a statement fallback).
- Phase 2 unified expense-records feed (lib/accounting/vendorInvoiceFeed.server.ts).
Log
- 2026-06-19 created (owner). Discovered the gap while reviewing the merged Records Service Invoices sub-tab: pre-Oct-2025 GCP rows have no PDF link even though Console exposes a downloadable statement. T-034's closing note flagged the verification step ("widen the date range if older tax PDFs exist") but stopped short of the statement-side scrape. Spec'd end-to-end before starting implementation.