Scanning Node.js Readable Streams with pompelmi

Most file scanning happens after a file is already on disk. But many Node.js pipelines never write to disk at all: S3 getObject returns a Readable stream, HTTP responses are streams, and custom transform pipelines can pipe data from one service to another without a single writeFileSync.

Scanning these in-flight bytes with pompelmi v1.3 meant one of two workarounds: buffering the whole stream into a Buffer (memory overhead that grows with file size), or writing a temp file yourself and cleaning it up (error-prone boilerplate). pompelmi v1.4.0 ships scanStream(), which takes the Readable directly and makes both workarounds unnecessary.

The solution: scanStream(stream, [options])

scanStream() accepts any Node.js Readable and returns a Promise that resolves to one of the same three verdict Symbols as scan() and scanBuffer(): Verdict.Clean, Verdict.Malicious, or Verdict.ScanError.

const { scanStream, Verdict } = require('pompelmi');

// Useful for S3 getObject, HTTP downloads, or any piped source.
// Assumes `s3` is an AWS SDK v2 client and `Bucket` / `Key` are defined by the caller.
const stream = s3.getObject({ Bucket, Key }).createReadStream();
const result = await scanStream(stream); // run inside an async function or an ES module

if (result === Verdict.Malicious) throw new Error('Malware detected.');
if (result === Verdict.ScanError) console.warn('Scan incomplete.');

One validation error surfaces immediately as a rejected Promise:

  • stream must be a Readable — when the argument is not a Node.js Readable instance

If the stream itself emits an 'error' event during scanning, that error is propagated as-is.
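
The two failure paths described here can be modelled with a small stand-in consumer. This is only a sketch of the described behaviour, not pompelmi's source: drainReadable is a hypothetical helper, and the instanceof check is just one plausible way to implement the validation.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for a stream consumer such as scanStream().
// Resolves with the number of bytes read; rejects on bad input or stream errors.
async function drainReadable(stream) {
  if (!(stream instanceof Readable)) {
    // Inside an async function this surfaces as a rejected Promise, not a sync throw
    throw new TypeError('stream must be a Readable');
  }
  let bytes = 0;
  // for await rethrows whatever the stream emitted via 'error', unchanged
  for await (const chunk of stream) {
    bytes += Buffer.byteLength(chunk);
  }
  return bytes;
}
```

Because both failures arrive as rejections, a single try/catch around the awaited call (or one .catch()) is enough to handle them.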

TCP mode vs local mode

scanStream() behaves differently depending on whether you provide a host or port option.

TCP mode — no disk I/O

When host or port is set, the stream is piped directly to a running clamd daemon using the ClamAV INSTREAM protocol. Each 'data' chunk is sent to clamd prefixed with a 4-byte big-endian length header, and the end of the stream is signalled with a zero-length chunk (four zero bytes). No data is written to disk at any point: the bytes travel from your Readable straight to clamd over TCP.
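
To make the wire format concrete, here is a minimal sketch of INSTREAM framing using only Node's Buffer API. frameChunk and instreamFrames are hypothetical names chosen for illustration, and this is not pompelmi's internal code; it only shows the byte layout that clamd expects.

```javascript
// Prefix one chunk with its length as a 4-byte unsigned big-endian integer.
function frameChunk(chunk) {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(chunk.length, 0);
  return Buffer.concat([header, chunk]);
}

// Yield everything that goes on the wire for one INSTREAM session.
async function* instreamFrames(readable) {
  yield Buffer.from('zINSTREAM\0'); // null-terminated form of the command
  for await (const chunk of readable) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    // Skip empty chunks: a zero length is the end-of-stream marker
    if (buf.length > 0) yield frameChunk(buf);
  }
  yield Buffer.alloc(4); // zero-length chunk terminates the stream
}
```

In a real client, each yielded Buffer would be written to a net.Socket connected to clamd, and the verdict would be read from the reply once the terminator has been sent.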

// clamd sidecar in Docker Compose or Kubernetes
const result = await scanStream(stream, {
  host:    '127.0.0.1',
  port:    3310,
  timeout: 30_000,
});

This is ideal for serverless functions, read-only containers, and pipelines that process S3 objects or HTTP downloads without touching the filesystem.

Local mode — temp file, auto-cleaned

Without host or port, pompelmi pipes the stream to a randomly-named temp file under os.tmpdir(), calls clamscan on it, and deletes the file in a finally block — cleanup happens whether the scan succeeds, returns an error verdict, or throws.
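
The spool-scan-delete pattern described above can be sketched with Node's standard library alone. withTempFile and the scanFile callback are hypothetical names, and the file naming scheme is an assumption; pompelmi's own implementation may differ in the details.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline } = require('stream/promises');

// Spool a Readable to a randomly named temp file, hand the path to `scanFile`
// (hypothetical callback, e.g. one that spawns clamscan), then always delete it.
async function withTempFile(readable, scanFile) {
  const name = `scan-${crypto.randomBytes(16).toString('hex')}.tmp`;
  const tmpPath = path.join(os.tmpdir(), name);
  try {
    // 'wx' fails if the path already exists instead of overwriting it
    await pipeline(readable, fs.createWriteStream(tmpPath, { flags: 'wx', mode: 0o600 }));
    return await scanFile(tmpPath);
  } finally {
    // force: true ignores a missing file, e.g. when the write never started
    await fs.promises.rm(tmpPath, { force: true });
  }
}
```

Putting the delete in finally is what guarantees cleanup on every path: a normal return, a rejected scanFile, or a failed write all pass through it.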

// Local clamscan — temp file created and deleted automatically
const result = await scanStream(stream);

Scanning an S3 object stream

In Node.js, the AWS SDK v3 exposes the Body returned by GetObjectCommand as a Readable. Pass it directly to scanStream(); with a host or port set, there is no buffering and no temp file.

const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const { scanStream, Verdict } = require('pompelmi');

const s3 = new S3Client({ region: 'us-east-1' });

async function scanS3Object(bucket, key) {
  const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));

  const result = await scanStream(Body, {
    host: '127.0.0.1',
    port: 3310,
  });

  if (result === Verdict.Malicious) {
    throw new Error(`Malware detected in s3://${bucket}/${key}`);
  }

  return result; // Verdict.Clean or Verdict.ScanError
}

Scanning an HTTP download stream

The response object that Node's https.get passes to its callback is a Readable (an http.IncomingMessage). Scan it before writing to disk or forwarding downstream.

const https = require('https');
const { scanStream, Verdict } = require('pompelmi');

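function scanHttpUrl(url) {
  return new Promise((resolve, reject) => {
    https.get(url, async (response) => {
      // https.get neither follows redirects nor rejects on 4xx/5xx,
      // so only scan the body of a 200 response
      if (response.statusCode !== 200) {
        response.resume(); // discard the body so the socket is released
        reject(new Error(`Unexpected HTTP status ${response.statusCode}`));
        return;
      }
      try {
        const result = await scanStream(response, {
          host: '127.0.0.1',
          port: 3310,
        });
        resolve(result);
      } catch (err) {
        reject(err);
      }
    }).on('error', reject);
  });
}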

const result = await scanHttpUrl('https://example.com/upload.pdf');
if (result === Verdict.Malicious) throw new Error('Malware detected in download.');

Full error handling

const { scanStream, Verdict } = require('pompelmi');

async function safeScanStream(stream) {
  try {
    const result = await scanStream(stream, {
      host: process.env.CLAMD_HOST || '127.0.0.1', // port always has a default, so give host one too
      port: Number(process.env.CLAMD_PORT) || 3310,
    });

    if (result === Verdict.ScanError) {
      console.warn('Scan incomplete — rejecting as precaution.');
      return null;
    }

    return result; // Verdict.Clean or Verdict.Malicious
  } catch (err) {
    // Not a Readable, stream errored, clamd unreachable, etc.
    console.error('Scan threw:', err.message);
    return null;
  }
}