Receipt bot stuck at "Downloading photo…" — diagnosis + download hardening + telegram test suite
Symptom (owner, 2026-06-16)
Telegram receipt bot stuck at "📥 Downloading photo…", then eventually "⚠️ Sorry, I couldn't process that receipt after several tries."
Diagnosis: it's the download step, NOT the AI
The status never advanced past "Downloading photo…". The first line of the AI
step is await progress('🤖 Reading receipt…') (lib/telegram/receiptProcessor.ts:917),
so if processing had started the status would have changed. It didn't → the
failure is in the Inngest download-photo step (lib/inngest/receipt-processor.ts),
before Gemini/Vertex is touched. (The webhook also pre-flights AI creds, so a
creds problem would have said "AI processing is not configured" up front.)
The failure handler firing ("…after several tries") confirms the Inngest function genuinely failed after retries (not a hang / endpoint-signing issue).
Four possible causes inside download-photo, ranked:
1. Photo too large for Inngest — the step returns the image as base64 through
Inngest step state, which has a per-step output cap. High-res photos, or images
sent as a file/document (uncompressed), blow it. (The file header comment
already flagged this risk.)
2. File > 20 MB — Telegram Bot-API getFile refuses → tgGetFile returns null.
3. Download stalled — fetch had no timeout → silently burned the 60 s budget.
4. Telegram CDN HTTP error.
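Cause 1 is easiest to see with the arithmetic. Base64 encodes every 3 bytes as 4 characters, so the payload carried through step state is about a third larger than the file itself. A minimal sketch of that calculation (the function name is illustrative and not taken from the codebase):

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes.
// Standard base64 maps each 3-byte group to 4 characters and pads the last group.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const MB = 1024 * 1024;
for (const sizeMb of [1, 3, 10]) {
  const encodedMb = base64Length(sizeMb * MB) / MB;
  console.log(`${sizeMb} MB image -> ~${encodedMb.toFixed(2)} MB of base64`);
}
```

A 3 MB image therefore becomes exactly 4 MB of base64 before any other step metadata is counted.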
Shipped (origin/main 3191a865)
- processReceiptPhotoFailureHandler now appends the real step error ("Reason: …") from the inngest/function.failed event and logs it, so the cause is visible in Telegram and Vercel logs instead of a generic message.
- download-photo hardened: 25 s fetch timeout, byte-size logging, and actionable errors for both size limits (Telegram 20 MB; base64/Inngest cap via RECEIPT_MAX_IMAGE_BYTES, default 3 MB).
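The two shipped changes can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: the helper names (checkImageSize, downloadPhoto, failureMessage), the exact error wording, and the FailedEventData shape are hypothetical. Only the 25 s timeout, the 20 MB and 3 MB limits, and the "Reason: …" suffix come from the note above.

```typescript
const MIB = 1024 * 1024;
const TELEGRAM_MAX_BYTES = 20 * MIB;     // Bot API download limit
const DEFAULT_MAX_IMAGE_BYTES = 3 * MIB; // default for RECEIPT_MAX_IMAGE_BYTES
const FETCH_TIMEOUT_MS = 25_000;

// Returns an actionable error message, or null when the size is acceptable.
function checkImageSize(
  byteLength: number,
  maxImageBytes: number = DEFAULT_MAX_IMAGE_BYTES,
): string | null {
  const sizeMb = (byteLength / MIB).toFixed(1);
  if (byteLength > TELEGRAM_MAX_BYTES) {
    return `File is ${sizeMb} MB; Telegram bots cannot download files over 20 MB.`;
  }
  if (byteLength > maxImageBytes) {
    const limitMb = (maxImageBytes / MIB).toFixed(1);
    return `Image is ${sizeMb} MB, over the ${limitMb} MB processing limit. Try sending it as a compressed photo.`;
  }
  return null;
}

// Download with a hard timeout so a stalled transfer fails fast
// instead of consuming the whole step budget.
async function downloadPhoto(
  url: string,
  maxImageBytes: number = DEFAULT_MAX_IMAGE_BYTES,
): Promise<Uint8Array> {
  const res = await fetch(url, { signal: AbortSignal.timeout(FETCH_TIMEOUT_MS) });
  if (!res.ok) throw new Error(`Telegram CDN returned HTTP ${res.status}`);
  const bytes = new Uint8Array(await res.arrayBuffer());
  console.log(`download-photo: received ${bytes.byteLength} bytes`);
  const problem = checkImageSize(bytes.byteLength, maxImageBytes);
  if (problem) throw new Error(problem);
  return bytes;
}

// Assumed shape of the part of the inngest/function.failed payload used here.
interface FailedEventData {
  error?: { message?: string };
}

// Append the real step error to the generic user-facing message when present.
function failureMessage(data: FailedEventData): string {
  const base = "⚠️ Sorry, I couldn't process that receipt after several tries.";
  const reason = data.error?.message?.trim();
  return reason ? `${base}\nReason: ${reason}` : base;
}
```

In a real flow the 20 MB check would more naturally run before the download, against the file_size reported by getFile. It is folded into one helper here only to keep the sketch short.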
Also shipped: telegram test suite green (origin/main bd40378a)
9 pre-existing failures (surfaced after the WOPC fix run):
- driveUploader.test.ts (7) — stale: code is canonical date-first since #615 (for
chronological flat-folder sort); updated the 7 expectations + the stale header
comment. No runtime change.
- receiptProcessor.test.ts (2) — real test bugs: (a) mocked @google-cloud/vertexai
but the processor uses @google/genai → real client JWT-signed the fake key
(ERR_OSSL_UNSUPPORTED); re-mocked @google/genai. (b) regex /service-account key/
didn't match "…service-account credentials…"; fixed + cleared all credential env
sources (incl. FIREBASE_ADMIN_*) for env-robustness.
Full suite 317/317 green; tsc + lint clean.
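The second receiptProcessor.test.ts bug comes down to a pattern that no longer matched the message, plus credentials leaking in from the environment. A small sketch of both ideas, with assumptions flagged: the sample message text and the clearEnvByPrefix helper are hypothetical, and only the two phrases and the FIREBASE_ADMIN_ prefix come from the note above.

```typescript
// Hypothetical error text containing the phrase the processor actually emits.
const actualMessage = 'Missing service-account credentials for AI processing';

const staleRegex = /service-account key/;         // old expectation
const fixedRegex = /service-account credentials/; // matches the current wording

console.log(staleRegex.test(actualMessage)); // false
console.log(fixedRegex.test(actualMessage)); // true

// Remove every variable whose name starts with one of the given prefixes,
// so a test sees a clean environment no matter where it runs.
function clearEnvByPrefix(
  env: Record<string, string | undefined>,
  prefixes: string[],
): string[] {
  const removed: string[] = [];
  for (const name of Object.keys(env)) {
    if (prefixes.some((prefix) => name.startsWith(prefix))) {
      delete env[name];
      removed.push(name);
    }
  }
  return removed;
}
```

In a test setup hook this could be called as clearEnvByPrefix(process.env, ['FIREBASE_ADMIN_']), extended with whatever other credential variables the processor reads.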
Pending (to close)
- Confirm WHICH of the 4 causes: read the Inngest dashboard (app aote-pms → process-receipt-photo → failed run → download-photo step), OR re-send the failing receipt now that the real reason surfaces in Telegram.
- If it's the base64/Inngest size cap (likely): the proper fix is to stop passing image bytes through step state and instead stash them to Drive/GCS/temp and pass a reference (bigger refactor; the 3 MB guard is a stopgap with an actionable message).
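The pass-a-reference refactor could take roughly the following shape. Everything here is hypothetical: the PhotoRef and BlobStore types, the key layout, and the in-memory store stand in for whichever of Drive/GCS/temp storage is chosen, and a real store would be asynchronous.

```typescript
// Small, JSON-serializable value that travels through step state
// in place of the image bytes.
interface PhotoRef {
  key: string;        // where the bytes live
  byteLength: number; // kept for logging and size checks
  mimeType: string;
}

interface BlobStore {
  put(key: string, bytes: Uint8Array): void;
  get(key: string): Uint8Array | undefined;
}

// In-memory stand-in, used here only to make the sketch runnable.
class InMemoryBlobStore implements BlobStore {
  private readonly blobs = new Map<string, Uint8Array>();
  put(key: string, bytes: Uint8Array): void {
    this.blobs.set(key, bytes);
  }
  get(key: string): Uint8Array | undefined {
    return this.blobs.get(key);
  }
}

// Download step: store the bytes and return only the reference.
function stashPhoto(
  store: BlobStore,
  runId: string,
  bytes: Uint8Array,
  mimeType: string,
): PhotoRef {
  const key = `receipts/tmp/${runId}`;
  store.put(key, bytes);
  return { key, byteLength: bytes.byteLength, mimeType };
}

// AI step: resolve the reference back into bytes.
function loadPhoto(store: BlobStore, ref: PhotoRef): Uint8Array {
  const bytes = store.get(ref.key);
  if (!bytes) throw new Error(`Photo ${ref.key} is missing from the store`);
  return bytes;
}

const store = new InMemoryBlobStore();
const ref = stashPhoto(store, 'run-123', new Uint8Array(5 * 1024 * 1024), 'image/jpeg');
console.log(JSON.stringify(ref).length, loadPhoto(store, ref).byteLength);
```

With this shape the step output stays a few dozen bytes regardless of photo size, so the 3 MB guard would no longer be tied to the step-state cap.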
Resolution (2026-06-16, via live Vercel/Inngest logs)
- "Stuck at Downloading photo…" + "totally dead" were TRANSIENT — 3 back-to-back prod redeploys (WOPC → bot-fix → tests) made the webhook intermittently cold; Telegram backed off delivery, then auto-recovered once deploys settled. Webhook handler was always healthy (401/secret + 405 probes OK); app up; token+secret present. No config change was needed; the bot resumed on its own.
- Confirmed via logs: a test receipt flowed end-to-end (download → AI → summary → toggle → Confirm). So the download path works; the error-surfacing + 25s timeout + size guard (3191a865) remain as hardening for any future large-image failure.
- NEW bug found in the logs + FIXED: on receipt Confirm, findDuplicateReceipt (lib/telegram/receiptStore.ts:385) threw FAILED_PRECONDITION: a missing tebs-erl composite index for accounting/receipts/entries on (metadata.parsedReceipt.transactionDate ASC, status ASC). The query catches + returns null, so duplicate detection was SILENTLY DISABLED (non-fatal). Fixed: created the index via gcloud firestore indexes composite create (CREATING → READY) AND committed the def to firestore.indexes.json (origin/main ad42eb43).
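The silent failure is worth illustrating: when the catch returns null, "no duplicate" and "the query could not run" look identical to the caller. One way to keep them apart, sketched under assumptions (the DuplicateCheck type and both helpers are hypothetical and do not reflect what receiptStore.ts does today):

```typescript
type DuplicateCheck<T> =
  | { status: 'found'; receipt: T }
  | { status: 'none' }
  | { status: 'unavailable'; reason: string };

// FAILED_PRECONDITION is gRPC status code 9. SDKs differ in how they expose it,
// so this check accepts the numeric code, a string code, or the name in the message.
function isFailedPrecondition(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  return (
    code === 9 ||
    code === 'failed-precondition' ||
    (typeof message === 'string' && message.includes('FAILED_PRECONDITION'))
  );
}

// Wrap the lookup so a failed query is reported as 'unavailable'
// and logged, never mistaken for 'none'.
async function checkForDuplicate<T>(
  query: () => Promise<T | null>,
): Promise<DuplicateCheck<T>> {
  try {
    const receipt = await query();
    return receipt === null ? { status: 'none' } : { status: 'found', receipt };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    const hint = isFailedPrecondition(err) ? ' (possibly a missing composite index)' : '';
    console.error(`duplicate check unavailable${hint}: ${reason}`);
    return { status: 'unavailable', reason };
  }
}
```

The caller can still treat 'unavailable' as non-fatal, but the log line makes a missing index visible the first time it happens.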
Log
- 2026-06-16 created. Diagnosed (download step), shipped error-surfacing + hardening (3191a865) and telegram test-suite green (bd40378a) to main.
- 2026-06-16 DONE. Tailed prod logs: stuck/dead = transient deploy-churn (self-recovered). Found + fixed the real latent bug — missing tebs-erl dup-detection index (gcloud create + def committed ad42eb43).