Problem
Large JSON responses can look safe in local testing and still fail in production. The confusing case is an endpoint that returns small compressed responses for some requests, then fails with a 502 Bad Gateway or 413 Payload Too Large for slightly larger inputs.
The trap is the boundary between AWS Lambda and API Gateway. Lambda has a response payload limit (6 MB for synchronous invocations). API Gateway can compress responses, but that compression happens after Lambda has already returned the payload. If the raw response is over the Lambda limit, API Gateway never gets a chance to shrink it.
Mechanism
HTTP compression is negotiated with headers. A client sends Accept-Encoding to say what it supports, such as gzip, deflate, or br. The server answers with Content-Encoding when it actually compresses the response.
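To make the negotiation concrete, here is a minimal sketch of how a server might pick a coding from an Accept-Encoding header. The function names and the server preference order are assumptions for illustration, and the q-value handling is simplified compared to a full HTTP implementation.

```python
def parse_accept_encoding(header):
    """Map each coding named in an Accept-Encoding header to its q-value."""
    codings = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        quality = 1.0  # a coding without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        codings[name.strip().lower()] = quality
    return codings


def choose_encoding(header, supported=("br", "gzip", "deflate")):
    """Return the first server-preferred coding the client accepts, else None."""
    accepted = parse_accept_encoding(header)
    for coding in supported:
        # "*" is the wildcard for any coding not listed explicitly.
        if accepted.get(coding, accepted.get("*", 0.0)) > 0:
            return coding
    return None  # respond uncompressed and omit Content-Encoding
```

When `choose_encoding` returns a coding, the response would carry that value in Content-Encoding. When it returns `None`, the body goes out uncompressed.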
Many clients make this invisible. Python requests and Postman both send compression headers by default, then decompress the response for you. That is useful, but it can hide where compression is happening.

    Client: Accept-Encoding: gzip, deflate, br
    App: returns raw JSON unless app-level compression is enabled
    Lambda: enforces raw response payload limit
    API Gateway: compresses only after Lambda returns successfully
    Client: receives compressed bytes and decompresses automatically

This creates a split behavior. If the raw payload is under the Lambda limit, Lambda returns it, API Gateway compresses it, and the client sees a small response. If the raw payload is over the limit, Lambda fails before gateway compression can run.
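The split can be modeled in a few lines. This is a toy simulation, not AWS code: `deliver` is a hypothetical function, the 6 MB default is an assumption to check against current Lambda quotas, and the 502 status stands in for whichever error the client actually sees.

```python
import gzip

# Assumed limit for synchronous Lambda responses; verify against current quotas.
LAMBDA_RESPONSE_LIMIT = 6 * 1024 * 1024


def deliver(raw_body, accept_encoding="gzip", limit=LAMBDA_RESPONSE_LIMIT):
    """Model the path: Lambda's size check runs first, gateway compression second."""
    if len(raw_body) > limit:
        # Lambda rejects the response, so the gateway has nothing to compress.
        return {"status": 502, "transferred": 0}
    if "gzip" in accept_encoding:
        # The gateway compresses a payload that Lambda already returned.
        return {"status": 200, "transferred": len(gzip.compress(raw_body))}
    return {"status": 200, "transferred": len(raw_body)}
```

In this model, a body one byte over the limit fails however well it would have compressed. That is the behavior described above.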
Fix
The durable fix is to avoid returning oversized raw payloads from Lambda. For large result sets, paginate the endpoint or return a cursor so each response stays safely below the raw limit.
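One way to keep each page under a byte budget is to size items as they are added. The sketch below assumes an in-memory list and an integer index as the cursor. `paginate_by_size`, the `next_cursor` field, and the 1,000,000-byte default are illustrative choices, not part of any AWS or Flask API.

```python
import json

# Compact separators make the size estimate match the serialized output.
COMPACT = (",", ":")


def paginate_by_size(items, cursor=0, max_bytes=1_000_000):
    """Return one page whose compact JSON encoding stays within max_bytes.

    `cursor` is the index of the first item to return. `next_cursor` is None
    once the last item has been returned.
    """
    # Envelope size plus slack for the cursor value (an integer or null).
    size = len('{"items":[],"next_cursor":}') + 20
    page = []
    index = cursor
    while index < len(items):
        # +1 accounts for the comma separating items in the array.
        item_size = len(json.dumps(items[index], separators=COMPACT).encode("utf-8")) + 1
        if page and size + item_size > max_bytes:
            break
        # The first item is always included so the cursor makes progress,
        # even if that single item exceeds the budget on its own.
        page.append(items[index])
        size += item_size
        index += 1
    next_cursor = index if index < len(items) else None
    return {"items": page, "next_cursor": next_cursor}
```

A client would call this repeatedly, passing each `next_cursor` back as `cursor` until it receives `None`. A real service would more likely use an opaque cursor backed by a database query than a list index.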
If the response is compressible JSON and still needs to be returned directly, compress earlier in the application before Lambda hands the response back. In a Flask service, that can be done with middleware such as flask-compress.

    from flask import Flask
    from flask_compress import Compress

    app = Flask(__name__)
    Compress(app)  # compress eligible responses inside the app, before Lambda returns them


    @app.get("/api/results")
    def results():
        # build_large_json_response() stands in for the endpoint's real data source.
        return {"items": build_large_json_response()}

Gateway compression can still be useful, but do not rely on it to protect Lambda from raw response size limits. It is too late in the path.
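For a handler that does not go through Flask, the same idea can be sketched by hand with the standard library. This assumes an API Gateway proxy integration, where a binary body is returned base64-encoded with `isBase64Encoded` set. `build_items` is a hypothetical stand-in for the real data source, and a REST API may need binary media types configured before it decodes the body.

```python
import base64
import gzip
import json


def build_items():
    # Hypothetical stand-in for the endpoint's real data source.
    return [{"id": i, "status": "ok"} for i in range(1000)]


def handler(event, context):
    # Header names are case-insensitive, so normalize before looking them up.
    request_headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    raw = json.dumps({"items": build_items()}).encode("utf-8")

    if "gzip" not in request_headers.get("accept-encoding", ""):
        # The client did not ask for gzip: return the raw JSON unchanged.
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": raw.decode("utf-8"),
        }

    compressed = gzip.compress(raw)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json", "Content-Encoding": "gzip"},
        # Binary bodies must be base64-encoded in the proxy response format.
        "body": base64.b64encode(compressed).decode("ascii"),
        "isBase64Encoded": True,
    }
```

Base64 adds roughly a third to the compressed size. The body Lambda returns is therefore larger than the gzip output, though usually still far smaller than the raw JSON.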
What changed in practice
On a representative JSON response around 2.65 MB raw, compression changed both transfer size and latency. The exact numbers will vary by payload shape and network path, but the direction is the useful part.
| Setup | Request | Latency | Transferred |
|---|---|---|---|
| Gateway compression only | No Accept-Encoding | 2.7s | 2.65 MB |
| Gateway compression only | With Accept-Encoding | 2.0s | 330 KB |
| App plus gateway compression | No Accept-Encoding | 2.7s | 2.65 MB |
| App plus gateway compression | With Accept-Encoding | 1.8s | 281 KB |
The important distinction is not just 330 KB versus 281 KB. It is where the compression occurs. Compression inside the application can reduce the payload before it crosses the Lambda boundary. Compression at the gateway cannot rescue a response Lambda already rejected.
Production lesson
When a serverless endpoint returns large JSON, measure both raw bytes and transferred bytes. Browser tools, Postman, and HTTP clients often show the compressed transfer size, which can make a response look smaller than the payload your runtime actually had to return.
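A small helper can report both numbers side by side. This sketch uses only the standard library. `payload_sizes` is a hypothetical name, and gzip at its default level is an assumption that only approximates what a gateway or middleware would produce.

```python
import gzip
import json


def payload_sizes(payload):
    """Return the raw and gzip-compressed sizes of a JSON-serializable payload."""
    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)
    return {
        "raw_bytes": len(raw),          # what the runtime has to return
        "gzip_bytes": len(compressed),  # roughly what crosses the network
        "ratio": len(compressed) / len(raw),
    }
```

Logging `raw_bytes` next to the transfer size that client tools report makes it easier to see how close a response is to the runtime limit.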
A good checklist is simple: know the raw payload size, know where compression is applied, keep Lambda responses below the raw limit, paginate large results, and use app-level compression when returning compressible data directly.