Serverless virus scanning with AWS Lambda and Node.js
AWS Lambda is a compelling platform for on-demand file scanning: you pay only for the milliseconds the function runs, and it scales automatically. The main challenge is making ClamAV available inside the Lambda execution environment. The cleanest solution is to package the function as a container image that bundles the ClamAV binary and virus database alongside your Node.js code.
Why container images?
Lambda supports two packaging formats: ZIP archives (capped at 250 MB unzipped) and container images (up to 10 GB). ClamAV — together with its virus database — weighs several hundred megabytes, well beyond the ZIP limit, so a container image is the practical choice.
The container image approach also gives you a reproducible build: the same image runs locally with docker run and in Lambda without any environment differences.
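As a sketch of that local workflow (the image name is a placeholder): the official AWS base images bundle the Lambda Runtime Interface Emulator, so you can invoke the handler over HTTP before deploying anything.

```shell
# Build the image and run it locally; the base image's entrypoint starts
# the Runtime Interface Emulator listening on port 8080 inside the container.
docker build -t pompelmi-scanner .
docker run -p 9000:8080 pompelmi-scanner

# In another terminal, invoke the handler with a minimal test event
# ("aGVsbG8=" is "hello" in base64).
curl -s "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body": "aGVsbG8=", "isBase64Encoded": true, "headers": {"x-filename": "hello.txt"}}'
```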
Keep in mind that the Lambda filesystem is read-only except for /tmp (512 MB by default, configurable up to 10 GB). pompelmi writes nothing — it only reads the file you point it at — so the only constraint is the size of the uploaded file you write there.
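If you expect uploads larger than the 512 MB default, the ephemeral storage size can be raised per function. A sketch with a placeholder function name:

```shell
# Raise /tmp to 2 GB for the scanner function (the name is a placeholder).
aws lambda update-function-configuration \
  --function-name pompelmi-scanner \
  --ephemeral-storage '{"Size": 2048}'
```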
Dockerfile
Build on top of the official AWS Lambda Node.js base image, install ClamAV, and run freshclam to bake the virus database into the image:

FROM public.ecr.aws/lambda/nodejs:20

# Install ClamAV
RUN dnf install -y clamav clamav-update && \
    # Download the virus database at build time
    freshclam && \
    # Clean up to reduce layer size
    dnf clean all

COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY index.js ./

CMD ["index.handler"]
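One possible deployment sequence, sketched with the standard AWS CLI commands; the repository name, function name, role ARN, account ID (123456789012), and region are all placeholders you must replace:

```shell
# Create an ECR repository, then build, tag, and push the image.
aws ecr create-repository --repository-name pompelmi-scanner
docker build -t pompelmi-scanner .
docker tag pompelmi-scanner:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/pompelmi-scanner:latest
aws ecr get-login-password | docker login --username AWS \
  --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/pompelmi-scanner:latest

# Create the function from the pushed image. ClamAV scans are CPU- and
# memory-hungry, so a generous timeout and memory size are sensible.
aws lambda create-function \
  --function-name pompelmi-scanner \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/pompelmi-scanner:latest \
  --role arn:aws:iam::123456789012:role/pompelmi-scanner-role \
  --timeout 120 \
  --memory-size 2048
```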
Lambda handler
The handler receives an API Gateway event containing a base64-encoded file body. It writes the decoded bytes to /tmp, scans with pompelmi, and returns the result:
// index.js
const { scan, Verdict } = require('pompelmi');
const { writeFile, unlink } = require('fs/promises');
const { existsSync } = require('fs');
const { join } = require('path');
const { randomBytes } = require('crypto');
exports.handler = async (event) => {
  let tmpPath = null;
  try {
    // API Gateway v2 (HTTP API) passes the body as base64 when binary
    if (!event.body) {
      return response(400, { error: 'No body provided.' });
    }
    const buffer = Buffer.from(event.body, event.isBase64Encoded ? 'base64' : 'utf8');
    const filename = event.headers?.['x-filename'] ?? 'upload.bin';
    // Keep only safe characters so a hostile header can't smuggle in path separators
    const ext = (filename.split('.').pop() ?? 'bin').toLowerCase().replace(/[^a-z0-9]/g, '') || 'bin';
    tmpPath = join('/tmp', randomBytes(16).toString('hex') + '.' + ext);
    await writeFile(tmpPath, buffer);

    const verdict = await scan(tmpPath);
    if (verdict === Verdict.Malicious) {
      return response(400, { error: 'Malware detected. Upload rejected.' });
    }
    if (verdict === Verdict.ScanError) {
      return response(422, { error: 'Scan could not complete. Rejected as a precaution.' });
    }

    // File is clean — you can now upload it to S3
    return response(200, {
      status: 'ok',
      verdict: verdict.description,
    });
  } catch (err) {
    console.error('Scan error:', err);
    return response(500, { error: err.message });
  } finally {
    if (tmpPath && existsSync(tmpPath)) {
      await unlink(tmpPath).catch(() => {});
    }
  }
};
function response(statusCode, body) {
  return {
    statusCode,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  };
}
S3 trigger pattern
A common serverless architecture triggers Lambda when a file lands in a "quarantine" S3 bucket. Lambda downloads the file to /tmp, scans it, then either moves it to a "clean" bucket or deletes it.
// index.js — S3 trigger variant
const { scan, Verdict } = require('pompelmi');
const {
  S3Client,
  GetObjectCommand,
  CopyObjectCommand,
  DeleteObjectCommand,
} = require('@aws-sdk/client-s3');
const { unlink } = require('fs/promises');
const { existsSync, createWriteStream } = require('fs');
const { join } = require('path');
const { pipeline } = require('stream/promises');
const { randomBytes } = require('crypto');

const s3 = new S3Client({});
const CLEAN_BUCKET = process.env.CLEAN_BUCKET;

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 event keys are URL-encoded with "+" for spaces
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    const tmpPath = join('/tmp', randomBytes(16).toString('hex'));
    try {
      // Download from S3 quarantine bucket to /tmp
      const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      await pipeline(Body, createWriteStream(tmpPath));

      const verdict = await scan(tmpPath);
      if (verdict === Verdict.Clean) {
        // Move to clean bucket
        await s3.send(new CopyObjectCommand({
          Bucket: CLEAN_BUCKET,
          Key: key,
          CopySource: `${bucket}/${key}`,
          Tagging: 'scan-status=clean&scanned-by=pompelmi',
          TaggingDirective: 'REPLACE',
        }));
      }

      // Delete from quarantine bucket regardless of verdict
      await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: key }));
      console.log(`Key=${key} Verdict=${verdict.description}`);
    } finally {
      if (existsSync(tmpPath)) await unlink(tmpPath).catch(() => {});
    }
  }
};
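For the S3 variant, the function's execution role needs read and delete permissions on the quarantine bucket plus write and tagging permissions on the clean bucket (CopyObject with a Tagging directive requires s3:PutObjectTagging on the destination). A sketch of such a policy; both bucket names are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::quarantine-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectTagging"],
      "Resource": "arn:aws:s3:::clean-bucket/*"
    }
  ]
}
```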
Cold starts and database freshness
Baking the virus database into the container image keeps cold starts fast, but the signatures only update when you rebuild the image. Consider these strategies:
| Strategy | Cold start | DB freshness |
|---|---|---|
| Database baked into image (this guide) | Fast — no network on startup | As fresh as the last image rebuild |
| Run freshclam on cold start from /tmp | Slow — 1–3 min for DB download | Always current |
| Cache DB in S3, copy to /tmp on cold start | Medium — S3 copy (~10 s) | As fresh as your S3 update job |
For most use cases, rebuilding the container image weekly with a scheduled CI job provides a good balance between cold-start speed and database currency.
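One way to automate that weekly rebuild, sketched as a GitHub Actions workflow (an assumption; any CI with a cron trigger works, and the image name and remaining steps are placeholders):

```yaml
name: rebuild-scanner-image
on:
  schedule:
    - cron: '0 3 * * 1'   # every Monday at 03:00 UTC
jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image (freshclam runs at build time, pulling a current DB)
        run: docker build -t pompelmi-scanner .
      # Remaining steps: authenticate to ECR, push the image,
      # and point the Lambda function at the new image URI.
```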
Next steps

- Testing locally before deploying to Lambda? See Running pompelmi with ClamAV in Docker Compose — the Dockerfile above runs identically with docker run.
- Need a standalone REST API instead of a trigger? Read Building a file scanning REST API with Node.js and pompelmi.
- Scan-then-upload to S3? See Scanning files before uploading to AWS S3.