Serverless virus scanning with AWS Lambda and Node.js
AWS Lambda is a compelling platform for on-demand file scanning: you pay only for the milliseconds the function runs, and it scales automatically. The main challenge is making ClamAV available inside the Lambda execution environment. The cleanest solution is to package the function as a container image that bundles the ClamAV binary and virus database alongside your Node.js code.
Why container images?
Lambda supports two packaging formats: ZIP archives (capped at 250 MB unzipped) and container images (up to 10 GB). ClamAV — together with its virus database — weighs several hundred megabytes, well beyond the ZIP limit, so a container image is the practical choice.
The container image approach also gives you a reproducible build: the same image runs locally with docker run and in Lambda without any environment differences.
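As a sketch of that local workflow (the image name is a placeholder): the official AWS base images bundle the Lambda Runtime Interface Emulator, so you can invoke the handler over HTTP before deploying anything.

```shell
# Build the image and run it locally; the base image's entrypoint starts
# the Runtime Interface Emulator listening on port 8080 inside the container.
docker build -t pompelmi-scanner .
docker run -p 9000:8080 pompelmi-scanner

# In another terminal, invoke the handler with a minimal test event
# ("aGVsbG8=" is "hello" in base64).
curl -s "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body": "aGVsbG8=", "isBase64Encoded": true, "headers": {"x-filename": "hello.txt"}}'
```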
Keep in mind that the Lambda filesystem is read-only except for /tmp (512 MB by default, configurable up to 10 GB). pompelmi writes nothing — it only reads the file you point it at — so the only constraint is the size of the uploaded file you write there.
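If you expect uploads larger than the 512 MB default, the ephemeral storage size can be raised per function. A sketch with a placeholder function name:

```shell
# Raise /tmp to 2 GB for the scanner function (the name is a placeholder).
aws lambda update-function-configuration \
  --function-name pompelmi-scanner \
  --ephemeral-storage '{"Size": 2048}'
```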
Dockerfile
Build on top of the official AWS Lambda Node.js base image, install ClamAV, and run freshclam to bake the virus database into the image:

FROM public.ecr.aws/lambda/nodejs:20

# Install ClamAV
RUN dnf install -y clamav clamav-update && \
    # Download the virus database at build time
    freshclam && \
    # Clean up to reduce layer size
    dnf clean all

COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY index.js ./

CMD ["index.handler"]
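One possible deployment sequence, sketched with the standard AWS CLI commands; the repository name, function name, role ARN, account ID (123456789012), and region are all placeholders you must replace:

```shell
# Create an ECR repository, then build, tag, and push the image.
aws ecr create-repository --repository-name pompelmi-scanner
docker build -t pompelmi-scanner .
docker tag pompelmi-scanner:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/pompelmi-scanner:latest
aws ecr get-login-password | docker login --username AWS \
  --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/pompelmi-scanner:latest

# Create the function from the pushed image. ClamAV scans are CPU- and
# memory-hungry, so a generous timeout and memory size are sensible.
aws lambda create-function \
  --function-name pompelmi-scanner \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/pompelmi-scanner:latest \
  --role arn:aws:iam::123456789012:role/pompelmi-scanner-role \
  --timeout 120 \
  --memory-size 2048
```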
Lambda handler
The handler receives an API Gateway event containing a base64-encoded file body. It writes the decoded bytes to /tmp, scans with pompelmi, and returns the result:
// index.js
const { scan, Verdict } = require('pompelmi');
const { writeFile, unlink } = require('fs/promises');
const { existsSync } = require('fs');
const { join } = require('path');
const { randomBytes } = require('crypto');
exports.handler = async (event) => {
  let tmpPath = null;
  try {
    // API Gateway v2 (HTTP API) passes the body as base64 when binary
    if (!event.body) {
      return response(400, { error: 'No body provided.' });
    }
    const buffer = Buffer.from(event.body, event.isBase64Encoded ? 'base64' : 'utf8');
    const filename = event.headers?.['x-filename'] ?? 'upload.bin';
    // Keep only safe characters so a hostile header can't smuggle in path separators
    const ext = (filename.split('.').pop() ?? 'bin').toLowerCase().replace(/[^a-z0-9]/g, '') || 'bin';
    tmpPath = join('/tmp', randomBytes(16).toString('hex') + '.' + ext);
    await writeFile(tmpPath, buffer);

    const verdict = await scan(tmpPath);
    if (verdict === Verdict.Malicious) {
      return response(400, { error: 'Malware detected. Upload rejected.' });
    }
    if (verdict === Verdict.ScanError) {
      return response(422, { error: 'Scan could not complete. Rejected as a precaution.' });
    }

    // File is clean — you can now upload it to S3
    return response(200, {
      status: 'ok',
      verdict: verdict.description,
    });
  } catch (err) {
    console.error('Scan error:', err);
    return response(500, { error: err.message });
  } finally {
    if (tmpPath && existsSync(tmpPath)) {
      await unlink(tmpPath).catch(() => {});
    }
  }
};
function response(statusCode, body) {
  return {
    statusCode,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  };
}
S3 trigger pattern
A common serverless architecture triggers Lambda when a file lands in a "quarantine" S3 bucket. Lambda downloads the file to /tmp, scans it, then either moves it to a "clean" bucket or deletes it.
// index.js — S3 trigger variant
const { scan, Verdict } = require('pompelmi');
const {
  S3Client,
  GetObjectCommand,
  CopyObjectCommand,
  DeleteObjectCommand,
} = require('@aws-sdk/client-s3');
const { unlink } = require('fs/promises');
const { existsSync, createWriteStream } = require('fs');
const { join } = require('path');
const { pipeline } = require('stream/promises');
const { randomBytes } = require('crypto');

const s3 = new S3Client({});
const CLEAN_BUCKET = process.env.CLEAN_BUCKET;

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 event keys are URL-encoded with "+" for spaces
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    const tmpPath = join('/tmp', randomBytes(16).toString('hex'));
    try {
      // Download from S3 quarantine bucket to /tmp
      const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      await pipeline(Body, createWriteStream(tmpPath));

      const verdict = await scan(tmpPath);
      if (verdict === Verdict.Clean) {
        // Move to clean bucket
        await s3.send(new CopyObjectCommand({
          Bucket: CLEAN_BUCKET,
          Key: key,
          CopySource: `${bucket}/${key}`,
          Tagging: 'scan-status=clean&scanned-by=pompelmi',
          TaggingDirective: 'REPLACE',
        }));
      }

      // Delete from quarantine bucket regardless of verdict
      await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: key }));
      console.log(`Key=${key} Verdict=${verdict.description}`);
    } finally {
      if (existsSync(tmpPath)) await unlink(tmpPath).catch(() => {});
    }
  }
};
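For the S3 variant, the function's execution role needs read and delete permissions on the quarantine bucket plus write and tagging permissions on the clean bucket (CopyObject with a Tagging directive requires s3:PutObjectTagging on the destination). A sketch of such a policy; both bucket names are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::quarantine-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectTagging"],
      "Resource": "arn:aws:s3:::clean-bucket/*"
    }
  ]
}
```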
Cold starts and database freshness
Baking the virus database into the container image keeps cold starts fast, but the signatures only update when you rebuild the image. Consider these strategies:
| Strategy | Cold start | DB freshness |
|---|---|---|
| Database baked into image (this guide) | Fast — no network on startup | As fresh as the last image rebuild |
| Run freshclam on cold start from /tmp | Slow — 1–3 min for DB download | Always current |
| Cache DB in S3, copy to /tmp on cold start | Medium — S3 copy (~10 s) | As fresh as your S3 update job |
For most use cases, rebuilding the container image weekly with a scheduled CI job provides a good balance between cold-start speed and database currency.
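One way to automate that weekly rebuild, sketched as a GitHub Actions workflow (an assumption; any CI with a cron trigger works, and the image name and remaining steps are placeholders):

```yaml
name: rebuild-scanner-image
on:
  schedule:
    - cron: '0 3 * * 1'   # every Monday at 03:00 UTC
jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image (freshclam runs at build time, pulling a current DB)
        run: docker build -t pompelmi-scanner .
      # Remaining steps: authenticate to ECR, push the image,
      # and point the Lambda function at the new image URI.
```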
Next steps

- Testing locally before deploying to Lambda? See Running pompelmi with ClamAV in Docker Compose — the Dockerfile above runs identically with docker run.
- Need a standalone REST API instead of a trigger? Read Building a file scanning REST API with Node.js and pompelmi.
- Scan-then-upload to S3? See Scanning files before uploading to AWS S3.