Scanning files before uploading to AWS S3 in Node.js

Many Node.js applications accept file uploads from users and store them in AWS S3. A common question is: where do you scan for malware — before or after uploading to S3?

The answer is before. Scan on your server while the file is still in a temporary location on disk. Only upload to S3 if the scan returns Verdict.Clean. This guide shows you exactly how to set that up.

If you're new to pompelmi, read Getting started with antivirus scanning in Node.js first, then come back here for the S3-specific pattern.

Why scan locally, not in S3?

You might wonder about scanning files after they land in S3, using a Lambda trigger or a third-party service. That approach has real problems:

  • The file is accessible before it's scanned. If another Lambda, worker, or user downloads the file between upload and scan, they get an unscanned file. You need a quarantine bucket and complex access controls to avoid this window.
  • Malicious files touch S3. Even briefly storing malware in S3 may violate your organization's security policy or compliance requirements (PCI DSS, HIPAA, SOC 2).
  • More moving parts. Lambda triggers, EventBridge rules, IAM roles — this is significant infrastructure to maintain. Scanning on the server before upload is simpler.

Scanning on the server with pompelmi means malware never reaches S3 at all.

Install

npm install pompelmi @aws-sdk/client-s3 multer express

The scan-then-upload flow

  1. Multer receives the multipart upload and writes it to a local temp file.
  2. scan(tmpPath) calls ClamAV on the local file.
  3. If Verdict.Malicious or Verdict.ScanError: delete the temp file, reject the request. S3 is never touched.
  4. If Verdict.Clean: read the temp file and upload it to S3 using the AWS SDK.
  5. Delete the temp file after the S3 upload completes.

Complete Express example

const express = require('express');
const multer  = require('multer');
const { scan, Verdict } = require('pompelmi');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const crypto = require('crypto');
const path   = require('path');
const fs     = require('fs');
const os     = require('os');

const app    = express();
const upload = multer({
  dest:   os.tmpdir(),
  limits: { fileSize: 50 * 1024 * 1024 }   // 50 MB limit
});

const s3 = new S3Client({ region: process.env.AWS_REGION || 'us-east-1' });

const BUCKET = process.env.S3_BUCKET;  // Set via environment variable

app.post('/upload', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file provided.' });
  }

  const tmpPath = req.file.path;
  let tmpDeleted = false;

  try {
    // Step 1 — Scan locally before touching S3
    const verdict = await scan(tmpPath);

    if (verdict === Verdict.Malicious) {
      return res.status(400).json({ error: 'Malware detected. Upload rejected.' });
    }

    if (verdict === Verdict.ScanError) {
      return res.status(422).json({ error: 'Scan incomplete. Upload rejected as a precaution.' });
    }

    // Step 2 — File is clean, upload to S3
    const ext       = path.extname(req.file.originalname).toLowerCase();
    const s3Key     = 'uploads/' + crypto.randomBytes(16).toString('hex') + ext;
    const fileStream = fs.createReadStream(tmpPath);

    await s3.send(new PutObjectCommand({
      Bucket:      BUCKET,
      Key:         s3Key,
      Body:        fileStream,
      ContentType: req.file.mimetype,
      // Tag the object to confirm it has been scanned
      Tagging:     'scan-status=clean&scanned-by=pompelmi'
    }));

    // Step 3 — Clean up temp file
    fs.unlinkSync(tmpPath);
    tmpDeleted = true;

    res.json({
      status: 'ok',
      key:    s3Key,
      url:    `https://${BUCKET}.s3.${process.env.AWS_REGION || 'us-east-1'}.amazonaws.com/${s3Key}`
    });

  } catch (err) {
    console.error('Upload failed:', err);            // Log details server-side
    res.status(500).json({ error: 'Upload failed.' }); // Don't leak internals to the client
  } finally {
    if (!tmpDeleted && fs.existsSync(tmpPath)) {
      fs.unlinkSync(tmpPath);
    }
  }
});

app.listen(3000, () => console.log('Listening on :3000'));

Never hardcode AWS credentials. Use environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or, better, attach an IAM role to your EC2 instance or ECS task so credentials are provided automatically.

S3 object tagging for audit trails

Adding S3 object tags when uploading gives you an audit trail of which files were scanned and by what tool. You can then write S3 bucket policies that deny access to untagged objects — preventing any file that bypassed the scan from being accessed.

// Tagging added to PutObjectCommand
Tagging: 'scan-status=clean&scanned-by=pompelmi&scan-engine=clamav'

An S3 bucket policy to deny access to unscanned files:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-bucket/uploads/*",
      "Condition": {
        "StringNotEquals": {
          "s3:ExistingObjectTag/scan-status": "clean"
        }
      }
    }
  ]
}

S3 tag-based bucket policies provide defence-in-depth: even if a file somehow bypassed scanning and reached S3, the bucket policy would prevent it from being served.

Optional: quarantine bucket for rejected files

In some regulated environments you need to preserve malicious files for forensic analysis rather than deleting them outright. Instead of calling fs.unlinkSync() on rejection, upload the file to a separate, private quarantine bucket.

async function quarantineFile(tmpPath, originalName, verdict) {
  // Sanitize the client-supplied name so it is safe to use in an S3 key
  const safeName = originalName.replace(/[^\w.\-]/g, '_');
  const key = 'quarantine/' + Date.now() + '-' + safeName;

  await s3.send(new PutObjectCommand({
    Bucket:      process.env.QUARANTINE_BUCKET,
    Key:         key,
    Body:        fs.createReadStream(tmpPath),
    // Mark the object as dangerous
    Tagging:     'scan-status=' + verdict.description + '&scanned-by=pompelmi',
    // Block public access at the object level (omit the ACL if the bucket has
    // ACLs disabled; rely on the bucket policy instead)
    ACL:         'private'
  }));

  fs.unlinkSync(tmpPath);   // Remove from local disk after uploading to quarantine
  return key;
}

// In the route handler:
if (verdict === Verdict.Malicious) {
  const qKey = await quarantineFile(tmpPath, req.file.originalname, verdict);
  tmpDeleted = true;
  console.warn('Malicious file quarantined at:', qKey);
  return res.status(400).json({ error: 'Malware detected.' });
}

For containerized deployments, see Running pompelmi with ClamAV in Docker Compose for how to run ClamAV as a sidecar next to your Node.js service.