
Receipt bot stuck at "Downloading photo…" — diagnosis + download hardening + telegram test suite

Symptom (owner, 2026-06-16)

Telegram receipt bot stuck at "📥 Downloading photo…", then eventually "⚠️ Sorry, I couldn't process that receipt after several tries."

Diagnosis — it's the download step, NOT the AI

The status never advanced past "Downloading photo…". The first line of the AI step is await progress('🤖 Reading receipt…') (lib/telegram/receiptProcessor.ts:917), so if processing had started the status would have changed. It didn't → the failure is in the Inngest download-photo step (lib/inngest/receipt-processor.ts), before Gemini/Vertex is touched. (The webhook also pre-flights AI creds, so a creds problem would have said "AI processing is not configured" up front.)

The failure handler firing ("…after several tries") confirms the Inngest function genuinely failed after retries (not a hang / endpoint-signing issue).

Four possible causes inside download-photo, ranked:

1. Photo too large for Inngest: the step returns the image as base64 through Inngest step state, which has a per-step output cap. High-res photos, or images sent as a file/document (uncompressed), blow it. (The file header comment already flagged this risk.)
2. File > 20 MB: Telegram Bot-API getFile refuses → tgGetFile returns null.
3. Download stalled: fetch had no timeout → silently burned the 60 s budget.
4. Telegram CDN HTTP error.
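For cause 1, it may help to see how much base64 inflates the payload. The following is a minimal arithmetic sketch; the function name is hypothetical and the actual Inngest per-step cap is not stated in these notes, so only the encoding overhead is computed.

```typescript
/**
 * Number of characters in the padded base64 encoding of `byteLength` bytes.
 * Base64 emits 4 characters for every 3 input bytes, rounding up.
 * (Hypothetical helper, for illustration only.)
 */
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const threeMiB = 3 * 1024 * 1024;
console.log(base64EncodedLength(threeMiB)); // 4194304, i.e. 4 MiB of step output
```

So a raw image grows by roughly a third before it ever reaches step state, which is why the guard has to sit well below whatever the step-output cap is.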

Shipped (origin/main 3191a865)

  • processReceiptPhotoFailureHandler now appends the real step error ("Reason: …") from the inngest/function.failed event, and logs it — so the cause is visible in Telegram + Vercel logs instead of a generic message.
  • download-photo hardened: 25 s fetch timeout, byte-size logging, and actionable errors for both size limits (Telegram 20 MB; base64/Inngest cap via RECEIPT_MAX_IMAGE_BYTES, default 3 MB).
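The download hardening described above could look roughly like the sketch below. It is not the shipped code: every function, type, and constant name is an assumption, and only the limits (20 MB, 3 MB default, 25 s) come from these notes. The fetch implementation is injected so the size logic can be exercised without a network.

```typescript
// Hypothetical names; only the numeric limits are taken from the notes.
const TELEGRAM_GETFILE_LIMIT_BYTES = 20 * 1024 * 1024; // assumes "20 MB" means MiB
const DEFAULT_MAX_IMAGE_BYTES = 3 * 1024 * 1024; // RECEIPT_MAX_IMAGE_BYTES default
const DOWNLOAD_TIMEOUT_MS = 25_000;

/** Returns an actionable error message, or null when the size is acceptable. */
function checkPhotoSize(
  byteLength: number,
  maxImageBytes: number = DEFAULT_MAX_IMAGE_BYTES
): string | null {
  if (byteLength > TELEGRAM_GETFILE_LIMIT_BYTES) {
    return `Photo is ${byteLength} bytes; Telegram bots cannot download files over 20 MB.`;
  }
  if (byteLength > maxImageBytes) {
    return `Photo is ${byteLength} bytes (limit ${maxImageBytes}). Send it as a compressed photo, not as a file.`;
  }
  return null;
}

// Minimal structural types so a stub can stand in for the global fetch.
type MinimalResponse = {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
};
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<MinimalResponse>;

async function downloadPhoto(
  url: string,
  fetchImpl: FetchLike,
  maxImageBytes: number = DEFAULT_MAX_IMAGE_BYTES
): Promise<ArrayBuffer> {
  const controller = new AbortController();
  // Abort the request if it has not finished within the timeout.
  const timer = setTimeout(() => controller.abort(), DOWNLOAD_TIMEOUT_MS);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`Telegram CDN returned HTTP ${res.status}`);
    const bytes = await res.arrayBuffer();
    console.log(`download-photo: received ${bytes.byteLength} bytes`);
    const problem = checkPhotoSize(bytes.byteLength, maxImageBytes);
    if (problem) throw new Error(problem);
    return bytes;
  } finally {
    clearTimeout(timer);
  }
}
```

In a Node 18+ runtime the global fetch should satisfy FetchLike, so a call would be downloadPhoto(fileUrl, fetch). Keeping checkPhotoSize separate means the two size messages can be unit-tested without mocking the network.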

Also shipped — telegram test suite green (origin/main bd40378a)

9 pre-existing failures (surfaced after the WOPC fix run):

  • driveUploader.test.ts (7): stale expectations. The code has been canonical date-first since #615 (for chronological flat-folder sort); updated the 7 expectations + the stale header comment. No runtime change.
  • receiptProcessor.test.ts (2): real test bugs. (a) The test mocked @google-cloud/vertexai but the processor uses @google/genai → real client JWT-signed the fake key (ERR_OSSL_UNSUPPORTED); re-mocked @google/genai. (b) The regex /service-account key/ didn't match "…service-account credentials…"; fixed + cleared all credential env sources (incl. FIREBASE_ADMIN_*) for env-robustness.

Full suite 317/317 green; tsc + lint clean.

Pending (to close)

  • Confirm WHICH of the 4 causes: read the Inngest dashboard (app aote-pms → process-receipt-photo → failed run → download-photo step), OR re-send the failing receipt now that the real reason surfaces in Telegram.
  • If it's the base64/Inngest size cap (likely): the proper fix is to stop passing image bytes through step state — stash to Drive/GCS/temp and pass a reference instead (bigger refactor; the 3 MB guard is a stopgap with an actionable message).
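The "pass a reference instead" refactor might be shaped like the sketch below. All types and function names are hypothetical, and an in-memory store stands in for Drive/GCS so the example runs without credentials. The point is only that the step output becomes a few dozen bytes regardless of image size.

```typescript
// All names are hypothetical; this is not the project's actual code.
interface PhotoRef {
  key: string; // where the bytes live, e.g. an object path
  byteLength: number; // kept for logging and size checks
}

interface BlobStore {
  put(key: string, bytes: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array>;
}

// Stand-in for Drive/GCS/temp storage.
class InMemoryBlobStore implements BlobStore {
  private readonly blobs = new Map<string, Uint8Array>();

  async put(key: string, bytes: Uint8Array): Promise<void> {
    this.blobs.set(key, bytes);
  }

  async get(key: string): Promise<Uint8Array> {
    const bytes = this.blobs.get(key);
    if (!bytes) throw new Error(`No stored photo for key ${key}`);
    return bytes;
  }
}

// What the download step would return: a small reference, not the image.
async function stashPhoto(
  store: BlobStore,
  fileId: string,
  bytes: Uint8Array
): Promise<PhotoRef> {
  const key = `receipt-tmp/${fileId}`;
  await store.put(key, bytes);
  return { key, byteLength: bytes.byteLength };
}

// What the AI step would do first: resolve the reference back to bytes.
async function loadPhoto(store: BlobStore, ref: PhotoRef): Promise<Uint8Array> {
  return store.get(ref.key);
}
```

A real implementation would also need cleanup of the temporary object once the receipt is confirmed or the run fails, which this sketch leaves out.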

Resolution (2026-06-16, via live Vercel/Inngest logs)

  • "Stuck at Downloading photo…" + "totally dead" were TRANSIENT — 3 back-to-back prod redeploys (WOPC → bot-fix → tests) made the webhook intermittently cold; Telegram backed off delivery, then auto-recovered once deploys settled. Webhook handler was always healthy (401/secret + 405 probes OK); app up; token+secret present. No config change was needed; the bot resumed on its own.
  • Confirmed via logs: a test receipt flowed end-to-end (download → AI → summary → toggle → Confirm). So the download path works; the error-surfacing + 25s timeout + size guard (3191a865) remain as hardening for any future large-image failure.
  • NEW bug found in the logs + FIXED: on receipt Confirm, findDuplicateReceipt (lib/telegram/receiptStore.ts:385) threw FAILED_PRECONDITION — a missing tebs-erl composite index for accounting/receipts/entries on (metadata.parsedReceipt.transactionDate ASC, status ASC). The query catches + returns null, so duplicate detection was SILENTLY DISABLED (non-fatal). Fixed: created the index via gcloud firestore indexes composite create (CREATING → READY) AND committed the def to firestore.indexes.json (origin/main ad42eb43).
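Because the catch-and-return-null pattern hid the missing index, one option is to keep the check non-fatal but classify and log the error loudly. The sketch below assumes hypothetical helper names; the real findDuplicateReceipt is not shown in these notes. It relies on FAILED_PRECONDITION being gRPC status code 9, and assumes the error object exposes it as a numeric or string `code` property, which can vary by SDK.

```typescript
// Hypothetical helpers, for illustration only.
const GRPC_FAILED_PRECONDITION = 9; // gRPC status code for FAILED_PRECONDITION

function isMissingIndexError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  // Assumption: server SDKs expose the numeric gRPC code, web SDKs a string.
  return code === GRPC_FAILED_PRECONDITION || code === "failed-precondition";
}

async function runDuplicateCheck<T>(
  query: () => Promise<T | null>,
  logError: (message: string) => void
): Promise<T | null> {
  try {
    return await query();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    if (isMissingIndexError(err)) {
      logError(`Duplicate detection is DISABLED: Firestore index missing (${detail})`);
    } else {
      logError(`Duplicate check failed: ${detail}`);
    }
    return null; // still non-fatal, but no longer silent
  }
}
```

With something like this in place, a missing index would show up as a distinct error line in the logs on the first Confirm rather than being discovered by accident.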

Log

  • 2026-06-16 created. Diagnosed (download step), shipped error-surfacing + hardening (3191a865) and telegram test-suite green (bd40378a) to main.
  • 2026-06-16 DONE. Tailed prod logs: stuck/dead = transient deploy-churn (self-recovered). Found + fixed the real latent bug — missing tebs-erl dup-detection index (gcloud create + def committed ad42eb43).