File uploads are the most under-engineered surface in most SaaS apps. The feature ships in a sprint, it works, and then eighteen months later someone uploads a crafted PDF that runs JavaScript in a preview iframe, or a 20GB ZIP bomb that eats the disk, or a JPEG with an attacker's PHP webshell hidden in the EXIF. The fix is never one thing — it's a pipeline: reject early on size and type, sniff the actual bytes, sanitize the content, scan for malware, and store on infrastructure that can't execute whatever slips through. Here's the pipeline we wire up on every client project, plus the failure modes that trip up teams doing most of it right.
Why the Content-Type header is a lie
Every file upload tutorial starts with checking req.file.mimetype. That check is worthless on its own. The Content-Type header is supplied by the client, and clients can send whatever string they want. An attacker uploading shell.php will cheerfully tell your server it is image/jpeg, and multer will cheerfully believe them. Real validation means reading the first few bytes of the file and matching them against the signature of the type you expect.
The Content-Type on an incoming upload is user input. Treat it like any other user input — validate, never trust. Every serious bypass of file-upload filters in the last decade traces back to a server that took the client at its word.
```typescript
// Validator that sniffs magic bytes, caps size, and rejects early
import { fileTypeFromBuffer } from "file-type";

const ALLOWED = new Map<string, { mime: string; maxBytes: number }>([
  ["image/jpeg", { mime: "image/jpeg", maxBytes: 10 * 1024 * 1024 }],
  ["image/png", { mime: "image/png", maxBytes: 10 * 1024 * 1024 }],
  ["image/webp", { mime: "image/webp", maxBytes: 10 * 1024 * 1024 }],
  ["application/pdf", { mime: "application/pdf", maxBytes: 25 * 1024 * 1024 }],
]);

export async function validateUpload(
  head: Buffer,
  totalBytes: number,
  declaredType: string,
) {
  if (totalBytes <= 0) throw new Error("empty file");
  const rule = ALLOWED.get(declaredType);
  if (!rule) throw new Error(`type not allowed: ${declaredType}`);
  if (totalBytes > rule.maxBytes) throw new Error("file too large");
  // file-type reads magic bytes; this is the source of truth, not the header.
  const detected = await fileTypeFromBuffer(head);
  if (!detected || detected.mime !== rule.mime) {
    throw new Error(`content mismatch: claimed ${rule.mime}, got ${detected?.mime ?? "unknown"}`);
  }
  return detected;
}
```

Two details matter here. First, the buffer passed to fileTypeFromBuffer should be the first 4,100 bytes or so — enough for any signature to resolve. Second, the size check runs before the sniff, which means a 20GB upload pretending to be a PNG gets rejected at the proxy before the sniff buffer is even allocated. Most upload DoS vectors die at a correctly configured body size limit.
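Collecting that head buffer from the incoming stream doesn't require buffering the whole file. A minimal sketch, assuming the rest of the stream continues on to storage (readHead is an illustrative helper name, not part of any library):

```typescript
import { Readable } from "node:stream";

// file-type needs at most ~4,100 bytes to resolve any signature it knows.
const SNIFF_BYTES = 4100;

// Collect just the head of an upload stream for magic-byte sniffing.
export async function readHead(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of stream) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    chunks.push(buf);
    total += buf.length;
    if (total >= SNIFF_BYTES) break; // stop early; don't buffer the whole file
  }
  return Buffer.concat(chunks).subarray(0, SNIFF_BYTES);
}
```

The early break is the point: the sniff costs a fixed 4 KB of memory per upload regardless of file size.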
Image sanitization — Sharp, and why re-encoding matters
Even a file that genuinely is a PNG can ship a payload. EXIF metadata can hide scripts and GPS coordinates, color profiles can contain malformed data that crashes downstream parsers, and polyglot files can be valid as two formats simultaneously. The defense is to never store user uploads raw — always re-encode through a library that drops everything you don't need. Sharp is the Node.js default for a reason.
```typescript
import sharp from "sharp";

// Re-encode the image, drop metadata, cap dimensions.
// Sharp drops EXIF/ICC/XMP by default unless withMetadata() is called.
export async function sanitizeImage(input: Buffer): Promise<Buffer> {
  return sharp(input, { failOn: "error", limitInputPixels: 20_000_000 })
    .rotate() // honor EXIF orientation, then drop it
    .resize({ width: 4096, height: 4096, fit: "inside", withoutEnlargement: true })
    .toFormat("webp", { quality: 85 })
    .toBuffer();
}
```

limitInputPixels is the setting most teams miss. Without it, a pixel-bomb PNG — a small file on disk that decodes to a gigapixel image — will happily allocate gigabytes of RAM and take down the worker. Cap it. Also, re-encoding to WebP or a fresh JPEG isn't just about file size; it's a defensive measure. A crafted input that exploits a parser in one format almost never survives a round-trip to a different format.
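The arithmetic behind a pixel bomb is worth making explicit: decoded memory depends on pixel dimensions, not bytes on disk. A sketch (decodedBytes is an illustrative helper, assuming 4 bytes per RGBA pixel):

```typescript
// Decoded in-memory size is width × height × channels,
// regardless of how small the compressed file is on disk.
export function decodedBytes(width: number, height: number, channels = 4): number {
  return width * height * channels;
}

// A 50,000 × 50,000 PNG can be a few hundred KB compressed but needs
// 50,000 * 50,000 * 4 = 10 GB once decoded.
const bomb = decodedBytes(50_000, 50_000);

// The 20,000,000-pixel cap above bounds decode memory at roughly 80 MB.
const capped = decodedBytes(20_000_000, 1);
```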
Virus scanning — ClamAV for everything, VirusTotal for edge cases
For any upload that will be downloaded or shared by other users — attachments, exports, shared assets — a virus scan is table stakes. The question is: which scanner, run where, and with what backpressure when it fails?
| Option | Cost | Latency | Detection rate | Best for |
|---|---|---|---|---|
| ClamAV (self-hosted) | ~$20/mo compute | 200-800ms per file | Moderate, signature-based | Baseline scan on every upload |
| VirusTotal API | $0.10+ per scan | 1-30s (multi-engine) | High, 70+ engines | Flagged files, shared public links |
| AWS GuardDuty S3 Malware | $1-2 per GB scanned | Async via S3 event | Managed, broad coverage | AWS-native shops, compliance |
| Commercial CDR (OPSWAT, Votiro) | $0.01-0.10 per file | 1-5s | High + reconstruction | Regulated industries, Office/PDF heavy |
For most SaaS apps, the right starting point is ClamAV running as a sidecar process behind a small queue. Scans run asynchronously — the user's upload succeeds immediately, the file is quarantined to a private bucket, and only after a clean scan does the file become visible in the product. This sequence is important: synchronous scanning blocks the request thread and turns a flaky ClamAV into a flaky upload button.
Run ClamAV in its own container and talk to it over a socket. The clamd daemon keeps the virus database in memory — scans drop from seconds to under a second. clamscan, which reloads the database per invocation, is fine for cron jobs but unusable for per-upload scanning at any real volume.
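Talking to clamd means speaking its INSTREAM protocol: length-prefixed chunks over the socket, terminated by a zero-length frame. A minimal sketch, assuming clamd listening on TCP 3310 (frame and scanBuffer are illustrative names; large files should be sent in multiple frames, and clamd's StreamMaxLength must be at least your upload cap):

```typescript
import net from "node:net";

// Build one INSTREAM frame: a 4-byte big-endian length prefix, then the chunk.
export function frame(chunk: Buffer): Buffer {
  const len = Buffer.alloc(4);
  len.writeUInt32BE(chunk.length, 0);
  return Buffer.concat([len, chunk]);
}

// Stream a buffer to clamd and resolve true if the verdict is clean.
export function scanBuffer(file: Buffer, host = "127.0.0.1", port = 3310): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const sock = net.createConnection({ host, port });
    const reply: Buffer[] = [];
    sock.on("error", reject);
    sock.on("data", (d) => reply.push(d));
    sock.on("end", () => {
      const verdict = Buffer.concat(reply).toString("utf8");
      // Clean: "stream: OK"; infected: "stream: <signature> FOUND"
      resolve(!verdict.includes("FOUND"));
    });
    sock.write("zINSTREAM\0"); // z-prefix asks clamd for NUL-terminated replies
    sock.write(frame(file));
    sock.write(frame(Buffer.alloc(0))); // zero-length frame ends the stream
  });
}
```

Treat a socket error or timeout as a scan failure, not a clean result — the quarantine pattern below means a file that was never scanned simply never becomes visible.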
Storage — presigned URLs, and why you're probably using them wrong
Uploading through your API server means every byte passes through your infrastructure. For a product with thousands of users uploading photos, that's a large, unnecessary cost — and the bandwidth bill is the least of it. Your API server becomes a bottleneck, your request timeouts start failing on slow clients, and your pods scale on upload traffic that should never have touched them. Presigned URLs fix this: the client uploads directly to S3 (or R2, or GCS), and your server only signs the permission slip.
```typescript
import { randomUUID } from "node:crypto";
import { S3Client } from "@aws-sdk/client-s3";
import { createPresignedPost } from "@aws-sdk/s3-presigned-post";

const s3 = new S3Client({ region: "us-east-1" });

export async function presignUpload(userId: string, contentType: string) {
  const key = `uploads/${userId}/${randomUUID()}`;
  return createPresignedPost(s3, {
    Bucket: "my-app-uploads-quarantine",
    Key: key,
    Conditions: [
      ["content-length-range", 1, 10 * 1024 * 1024], // 1 byte to 10 MB
      ["eq", "$Content-Type", contentType],
      ["starts-with", "$key", `uploads/${userId}/`],
    ],
    Fields: { "Content-Type": contentType },
    Expires: 300, // 5 minutes
  });
}
```

Three things in that presigned post matter more than the rest. Expires should be short — five minutes is plenty for a user upload. The content-length-range condition is what S3 uses to reject oversized files; without it, a 20GB upload to your bucket is your problem to explain to finance. And the starts-with condition on the key path prevents a user from uploading into another user's directory by manipulating the key client-side.
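On the client side, the presigned post becomes a plain multipart form. A sketch (buildForm is an illustrative helper; the field names come straight from the createPresignedPost response, and the file field must be appended last):

```typescript
type PresignedPost = { url: string; fields: Record<string, string> };

// Assemble the multipart form S3 expects: every signed field first,
// then the file itself as the final field.
export function buildForm(post: PresignedPost, file: Blob): FormData {
  const form = new FormData();
  for (const [name, value] of Object.entries(post.fields)) {
    form.append(name, value); // policy, signature, key, Content-Type, ...
  }
  form.append("file", file); // S3 ignores any field that comes after this one
  return form;
}

// Usage (hypothetical endpoint path):
// const post = await fetch("/api/uploads/presign", { method: "POST" }).then((r) => r.json());
// await fetch(post.url, { method: "POST", body: buildForm(post, file) });
```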
Storage option comparison
| Storage | Egress cost | Best-in-class for | Watch out for |
|---|---|---|---|
| AWS S3 | ~$0.09/GB | Ecosystem integration (Lambda, CloudFront, GuardDuty) | Egress charges compound quickly at scale |
| Cloudflare R2 | $0 egress | Public assets, heavy-read workloads | Smaller ecosystem of event integrations |
| Backblaze B2 | ~$0.01/GB (free tier via CDN partners) | Backups, cold storage, budget-sensitive products | Latency higher than hyperscalers |
| Supabase Storage | Included in plan | Quick setup, tight Supabase/Postgres integration | Tied to your Supabase pricing tier |
| GCS | ~$0.12/GB | GCP-native shops, AI/ML pipelines | Signed URL syntax differs from S3 |
R2 has eaten a lot of S3's market share for public-asset workloads specifically because egress is free — if your product is a design tool, a video platform, or anything else where users download more than they upload, the math is hard to argue with. For private-by-default workloads with heavy analytics or Lambda wiring, S3 is still the easier choice. Mix and match: we've shipped products that upload to R2 for CDN-served assets and mirror sensitive files to S3 for GuardDuty scanning.
Access control — the quarantine pattern
The pattern that keeps production clean is a two-bucket setup: an upload bucket the client can write to via presigned URLs, and a serve bucket that files reach only after they've been validated and scanned. An S3 event triggers a Lambda (or a job in your queue) that runs sanitization, runs ClamAV, and on success copies to the serve bucket; on failure, it deletes the file or moves it to a forensic bucket for review. Neither bucket is public; both are served through signed URLs, and your app runs user-scoped authorization checks before minting them.
- Serve files through your app's routing, not directly. Even for public-feeling assets, a thin redirect handler lets you enforce auth, rate-limit hotlinking, and audit downloads.
- Set Content-Disposition: attachment for any file type you don't fully trust the browser to render safely. A PDF rendered inline can run JavaScript; downloaded, it's inert until the user opens it.
- Store uploads on a different domain or subdomain from your main app (uploads.example.com, not example.com/uploads). This scopes cookie-based auth away from the asset origin and defangs most stored-XSS attempts.
Never serve user-uploaded HTML or SVG directly under your main app's domain. SVG is an XML document that can embed JavaScript; HTML is, well, HTML. If you need user avatars that happen to be SVG, re-rasterize them to PNG during sanitization and serve the raster.
Rate limiting and quotas — the boring but essential part
Every file upload endpoint needs three limits layered on top of each other: per-request size (enforced by the proxy and the presigned condition), per-user rate (number of uploads per minute), and per-user quota (total storage consumed). The last one is what saves you when an account is compromised — without a quota, a single stolen credential can fill your bucket with terabytes of whatever the attacker wants to host. With a quota, the attacker hits the ceiling after a few hundred megabytes and triggers the alert.
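The three limits compose into a single gate at presign time. A minimal sketch with illustrative numbers (canUpload and the limit constants are assumptions; real values belong in per-plan configuration):

```typescript
type Usage = { uploadsLastMinute: number; storedBytes: number };

const MAX_UPLOADS_PER_MINUTE = 30;       // per-user rate
const QUOTA_BYTES = 5 * 1024 ** 3;       // per-user storage quota: 5 GB

// Check rate and quota before issuing a presigned URL; the per-request
// size cap is already enforced by content-length-range on the presign.
export function canUpload(
  usage: Usage,
  incomingBytes: number,
): { ok: boolean; reason?: string } {
  if (usage.uploadsLastMinute >= MAX_UPLOADS_PER_MINUTE) {
    return { ok: false, reason: "rate limit" };
  }
  if (usage.storedBytes + incomingBytes > QUOTA_BYTES) {
    return { ok: false, reason: "quota exceeded" };
  }
  return { ok: true };
}
```

A quota rejection for an account that was nowhere near its ceiling yesterday is exactly the signal worth alerting on.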
Key takeaways
- Never trust the Content-Type header. Sniff magic bytes on the server and reject mismatches loudly.
- Always re-encode user-supplied images through Sharp with limitInputPixels set. Raw uploads are an attacker's playground.
- Run ClamAV asynchronously behind a quarantine bucket. Synchronous scanning turns a flaky scanner into a flaky product.
- Use presigned URLs with content-length-range, Content-Type, and key-prefix conditions. The client uploads directly; your server signs a narrowly scoped permission slip.
- Serve from a separate origin, set Content-Disposition: attachment on anything you wouldn't render yourself, and enforce per-user quotas to bound the damage from account compromise.