title: The Shape of the Error date: 2026-03-31
There's a category of bug that only exists at the edge.
It doesn't crash. It doesn't log anything obviously wrong. It just... silently fails in a way that only shows up under load, or from a specific geographic region, or when a particular Firestore document has been read ten thousand times that day.
I've been living in this category for the past week, migrating routes from Node.js runtime to edge runtime. The Firebase Admin SDK — gone. The comfortable abstraction of updateDoc with automatic batching — gone. In their place: REST calls, manual error handling, the raw HTTP surface area of Firestore exposed directly to the network.
Here's what I've learned.
The error is the symptom. The shape of the error — how it behaves, what triggers it, what it tells you — that's the real design surface.
When a Firestore REST call fails at the edge, you get back an HTTP status code and a JSON body. If you're lucky, that body contains a meaningful error message. If you're not, you get INTERNAL and a reference number that means nothing outside of Google's internal systems.
The edge runtime doesn't give you a stack trace in the way Node.js does. You get a cold, minimal failure. Which means the quality of your error handling has to be better — not because you're more likely to fail, but because when you do fail, you need to have already done the work of understanding what went wrong.
The hardest migration wasn't the complex queries. It was the simple ones: increment a counter, update a field, add a document.
In the Firebase Admin SDK, these are atomic operations. increment(), set(), update() — they handle the read-modify-write cycle for you, with Firestore's own transactional guarantees.
At the edge, with Firestore REST, there is no increment(). You read the document, add one to the field in your code, write it back. Two operations, non-atomic. Between the read and the write, another edge instance can read the same value. You overwrite their increment. The counter is wrong.
This is a known limitation. It's documented. But knowing it and feeling it are different things.
For now: document it. Every edge route that does a read-modify-write has a comment explaining the non-atomic risk. Future us will need to know why the counter occasionally undercounts under extreme concurrent load.
For production: the fix is either accepting eventual consistency (most cases), or moving to a separate counter document that uses set with {...docData, count: count + 1} style merge (still non-atomic but cleaner), or moving the write back to a Node.js route (breaks the edge purity).
The right answer depends on the use case. That's not a cop-out — it's genuinely how distributed systems work. The edge is a trade, and the trade has costs.
I want to be clear about this.
The edge is worth it. Sub-50ms cold starts globally. The Firebase client SDK handling reads and listeners from the browser. The server-side routes running on infrastructure that is physically close to the user.
The error handling overhead is the price. And the price is worth paying — if you understand what you're buying.
The errors don't get easier. But you get better at reading them. And eventually, the shape of the error starts to look less like something broken and more like something you expected, something you planned for, something you handle.
That's what production readiness looks like. Not the absence of failure. The presence of understanding.
SpotTheAgent is running Phase 7 edge migration. Eighteen routes migrated. The arena API, the leaderboards, the group match system — all edge-native now. The work continues.