Recursively Scanning a Directory for Malware with pompelmi

Most antivirus integrations in Node.js are request-scoped: a file arrives, you scan it, you accept or reject it. But several real-world scenarios need something different — you want to scan an entire directory tree, not a single file:

  • Batch processing: a worker queue handles uploaded archives that have already been unpacked to disk. You need to scan every extracted file before moving it to permanent storage.
  • Scheduled background scans: a nightly cron job rescans an /uploads directory to catch threats that were missed because signatures were out of date when the file was first scanned.
  • Quarantine workflows: after scanning, you want to move infected files to a separate directory rather than deleting them, so they can be reviewed later.

pompelmi v1.5.0 ships scanDirectory() to cover all of these patterns with a single call.

The API

scanDirectory(dirPath, [options]) recursively walks dirPath using Node's built-in fs.readdirSync(dirPath, { recursive: true }) (the recursive option requires Node 18.17+ or 20.1+), scans every file concurrently with Promise.all, and resolves to an object with three arrays:

  • clean — absolute paths of files with no threats found
  • malicious — absolute paths of files with a matched signature
  • errors — absolute paths of files that could not be scanned

Per-file scan failures are collected into errors rather than aborting the whole scan. The returned Promise rejects only for top-level validation errors (wrong argument type, directory not found).

scanDirectory(
  dirPath: string,
  options?: { host?: string; port?: number; timeout?: number }
): Promise<{ clean: string[], malicious: string[], errors: string[] }>
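
To make the walk-then-scan flow described above concrete, here is a minimal sketch of the same pattern. It is not pompelmi's source code: scanTree and its scanFile callback are hypothetical names, and scanFile stands in for whatever per-file scanner you use (it resolves to true for a malicious file and may throw on a scan failure).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, NOT pompelmi's implementation. `scanFile` is an
// assumed async callback: resolves true if malicious, false if clean.
async function scanTree(dirPath, scanFile) {
  // Top-level validation: these are the only errors that reject the Promise.
  if (typeof dirPath !== 'string') throw new Error('dirPath must be a string');
  if (!fs.existsSync(dirPath)) throw new Error(`Directory not found: ${dirPath}`);

  // List every entry below dirPath (needs Node 18.17+ / 20.1+),
  // turn it into an absolute path, and keep regular files only.
  const files = fs
    .readdirSync(dirPath, { recursive: true })
    .map((rel) => path.resolve(dirPath, rel))
    .filter((abs) => fs.statSync(abs, { throwIfNoEntry: false })?.isFile());

  const results = { clean: [], malicious: [], errors: [] };

  // Scan all files concurrently. A failure on one file is recorded in
  // `errors` instead of aborting the other scans.
  await Promise.all(files.map(async (file) => {
    try {
      const infected = await scanFile(file);
      (infected ? results.malicious : results.clean).push(file);
    } catch {
      results.errors.push(file);
    }
  }));

  return results;
}
```

Because every file is scanned at once, a very large tree may start many scans in parallel; if that becomes a problem, calling the function once per subdirectory is one way to bound the load.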

Basic usage

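const { scanDirectory } = require('pompelmi');

// CommonJS modules have no top-level await, so run the scan inside an async function.
async function main() {
  const results = await scanDirectory('/uploads');

  console.log('Clean:', results.clean);
  console.log('Malicious:', results.malicious);
  console.log('Errors:', results.errors);
}

main().catch(console.error);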

The same host / port / timeout options accepted by scan() are forwarded to every per-file scan. To use a remote clamd sidecar instead of the local clamscan binary:

// Inside an async function:
const results = await scanDirectory('/uploads', {
  host:    '127.0.0.1',
  port:    3310,
  timeout: 30_000,
});

Quarantine workflow

Instead of immediately deleting malicious files, move them to a quarantine directory so they can be reviewed or submitted to a threat-intelligence feed before final disposal.

const fs   = require('fs');
const path = require('path');
const { scanDirectory } = require('pompelmi');

const UPLOADS_DIR    = '/var/app/uploads';
const QUARANTINE_DIR = '/var/app/quarantine';

fs.mkdirSync(QUARANTINE_DIR, { recursive: true });

async function scanAndQuarantine() {
  const { malicious, errors } = await scanDirectory(UPLOADS_DIR);

  for (const filePath of malicious) {
    const dest = path.join(QUARANTINE_DIR, path.basename(filePath));
    fs.renameSync(filePath, dest);
    console.log(`Quarantined: ${filePath} → ${dest}`);
  }

  if (errors.length > 0) {
    console.warn('Could not scan:', errors);
  }
}

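scanAndQuarantine().catch(console.error);

fs.renameSync only works within the same filesystem. If QUARANTINE_DIR is on a different mount point, use fs.copyFileSync followed by fs.unlinkSync instead. Also note that path.basename() drops the subdirectory, so two infected files with the same name in different folders would overwrite each other in the quarantine directory.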
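
A move helper can handle both the cross-filesystem case and same-name files in one place. The following is a sketch under stated assumptions, not part of pompelmi: moveToQuarantine is a hypothetical name, and the random prefix is just one way to keep destination names unique.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: moves filePath into quarantineDir and returns the
// new location.
function moveToQuarantine(filePath, quarantineDir) {
  fs.mkdirSync(quarantineDir, { recursive: true });

  // Prefix a random token so files that share a basename but come from
  // different subdirectories do not overwrite each other.
  const token = crypto.randomBytes(6).toString('hex');
  const dest = path.join(quarantineDir, `${token}-${path.basename(filePath)}`);

  try {
    fs.renameSync(filePath, dest);
  } catch (err) {
    // EXDEV means source and destination are on different filesystems,
    // where rename cannot work; fall back to copy + delete.
    if (err.code !== 'EXDEV') throw err;
    fs.copyFileSync(filePath, dest);
    fs.unlinkSync(filePath);
  }
  return dest;
}
```

Inside the loop over malicious, calling moveToQuarantine(filePath, QUARANTINE_DIR) would replace the path.join and fs.renameSync lines. If you need to trace a quarantined file back to its origin, log the original path alongside the returned destination.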

Scheduled background scans

Combine scanDirectory() with a scheduler like node-cron to run a nightly rescan of your upload directory. This catches threats that slipped through when signatures were out of date.

const cron = require('node-cron');
const { scanDirectory } = require('pompelmi');

// Run every night at 02:00
cron.schedule('0 2 * * *', async () => {
  console.log('Starting nightly scan…');
  try {
    const { clean, malicious, errors } = await scanDirectory('/var/app/uploads');

    console.log(`Scan complete: ${clean.length} clean, ${malicious.length} malicious, ${errors.length} errors`);

    if (malicious.length > 0) {
      // alert your team, move to quarantine, emit a metric, etc.
      console.error('Malicious files detected:', malicious);
    }
  } catch (err) {
    // e.g. the uploads directory is missing; log it instead of leaving an unhandled rejection
    console.error('Nightly scan failed:', err);
  }
});
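
A scan of a large directory can take longer than expected, and a scheduler will usually start the next run regardless. One possible guard is sketched below; skipIfRunning is a hypothetical wrapper, not a pompelmi or node-cron feature, and it simply drops a run while the previous one is still in progress.

```javascript
// Hypothetical wrapper: returns a function that runs `task` unless an
// earlier call is still in progress. Resolves true if the task ran,
// false if the call was skipped.
function skipIfRunning(task) {
  let running = false;
  return async (...args) => {
    if (running) return false; // previous scan has not finished yet
    running = true;
    try {
      await task(...args);
      return true;
    } finally {
      // Always clear the flag, even if the task rejected.
      running = false;
    }
  };
}
```

With this in place, the scheduled callback could be passed as cron.schedule('0 2 * * *', skipIfRunning(nightlyScan)), where nightlyScan is your own async function wrapping the scanDirectory() call.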

Error handling

scanDirectory() rejects its Promise for two top-level validation errors and collects everything else into errors:

const { scanDirectory } = require('pompelmi');

async function main() {
  // Top-level errors: the returned Promise rejects
  await scanDirectory(42).catch((err) => console.error(err.message));
  // dirPath must be a string
  await scanDirectory('/nonexistent').catch((err) => console.error(err.message));
  // Directory not found: /nonexistent

  // Per-file errors: collected into results.errors, never thrown
  const { errors } = await scanDirectory('/uploads');
  // errors contains paths of files clamscan could not open, encrypted archives, etc.
}

main().catch(console.error);

A non-empty errors array does not mean the scan failed — it means some files produced Verdict.ScanError (ClamAV exit code 2) or threw while being scanned. Treat those files as untrusted and inspect or quarantine them separately.
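
One way to apply that advice is to hold back unscannable files together with confirmed detections. The helper below is a small sketch; triage and its field names are hypothetical and only assume the { clean, malicious, errors } shape returned by scanDirectory().

```javascript
// Hypothetical helper: splits a scanDirectory()-shaped result into files
// that may be released and files that must be held for review.
function triage({ clean, malicious, errors }) {
  return {
    // Only files with a definite clean verdict are released.
    release: clean,
    // Files that could not be scanned are treated like detections.
    hold: [...malicious, ...errors],
    // False when at least one file has no verdict.
    fullyScanned: errors.length === 0,
  };
}
```

A caller could then move everything in hold to quarantine and use fullyScanned to decide whether the run should raise an alert.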