# Background virus scanning with BullMQ and Node.js
Synchronous scanning works well for small files on fast hardware, but for larger files or high upload volumes the scan can add seconds to every HTTP request. Background scanning decouples the upload from the scan:
- The upload endpoint writes the file to disk and returns HTTP 202 immediately.
- A BullMQ job is enqueued with the file's path and upload metadata.
- A worker processes the job: scans with pompelmi, then moves the file to permanent storage or deletes it.
- The client polls a status endpoint — or receives a webhook — to learn the final verdict.
BullMQ is the standard Redis-backed job queue for Node.js. It provides retries, priorities, rate limiting, and a dashboard (Bull Board).
You'll need a running Redis instance; `docker run -d -p 6379:6379 redis:alpine` is the quickest way to get one.
## Install
```bash
npm install pompelmi bullmq ioredis express multer
```
## Upload endpoint
The endpoint accepts the file, writes it to a staging directory (not `/tmp` — it must survive until the worker processes the job), enqueues a scan job, and returns the job ID to the client:
```js
// src/routes/upload.js
const express = require('express');
const multer = require('multer');
const { Queue } = require('bullmq');
const crypto = require('crypto');
const path = require('path');

const router = express.Router();

// Use a persistent staging directory, not /tmp
const STAGING_DIR = process.env.STAGING_DIR || '/var/app/staging';

const upload = multer({
  storage: multer.diskStorage({
    destination: STAGING_DIR,
    filename: (_req, file, cb) => {
      const id = crypto.randomBytes(16).toString('hex');
      const ext = path.extname(file.originalname).toLowerCase();
      cb(null, id + ext);
    },
  }),
  limits: { fileSize: 100 * 1024 * 1024 }, // 100 MB
});

const scanQueue = new Queue('scan', {
  connection: { host: process.env.REDIS_HOST || '127.0.0.1', port: 6379 },
});

router.post('/', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file provided.' });
  }

  const job = await scanQueue.add('scanFile', {
    filePath: req.file.path,
    originalName: req.file.originalname,
    mimeType: req.file.mimetype,
    uploadedAt: new Date().toISOString(),
  });

  // Return 202 Accepted — scan has not happened yet
  return res.status(202).json({
    jobId: job.id,
    statusUrl: `/scan-status/${job.id}`,
  });
});

module.exports = router;
```
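As written, the job is enqueued with BullMQ's defaults, so a failing scan is not retried and finished jobs accumulate in Redis. BullMQ accepts a job-options object as the third argument to `add`; the sketch below shows one reasonable configuration, though the specific numbers are assumptions to tune for your workload:

```javascript
// Job options for scan jobs: retry failed scans with exponential backoff,
// and purge completed jobs so the queue's Redis footprint stays bounded.
const scanJobOpts = {
  attempts: 3,                                   // try a failing scan up to 3 times
  backoff: { type: 'exponential', delay: 5000 }, // ~5s, 10s, 20s between attempts
  removeOnComplete: { age: 24 * 60 * 60 },       // keep results for 24h, then purge
  removeOnFail: false,                           // keep failures around for inspection
};

module.exports = { scanJobOpts };
```

Pass it when enqueuing: `scanQueue.add('scanFile', jobData, scanJobOpts)`.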
## Scan worker
Run the worker as a separate process (or separate container). It dequeues jobs, calls pompelmi, and moves the file to permanent storage or deletes it:
```js
// src/workers/scan.worker.js
const { Worker } = require('bullmq');
const { scan, Verdict } = require('pompelmi');
const fs = require('fs');
const path = require('path');

const CLEAN_DIR = process.env.CLEAN_DIR || '/var/app/uploads';

const worker = new Worker('scan', async (job) => {
  const { filePath, originalName, mimeType } = job.data;

  try {
    const verdict = await scan(filePath);

    if (verdict === Verdict.Malicious) {
      fs.unlinkSync(filePath);
      return { status: 'rejected', reason: 'malware' };
    }

    if (verdict === Verdict.ScanError) {
      fs.unlinkSync(filePath);
      return { status: 'rejected', reason: 'scan_error' };
    }

    // Clean — move to permanent storage
    const dest = path.join(CLEAN_DIR, path.basename(filePath));
    fs.renameSync(filePath, dest);
    return {
      status: 'clean',
      storedPath: dest,
      originalName,
    };
  } catch (err) {
    console.error(`Scan of ${filePath} threw: ${err.message}`);
    // Rethrow so BullMQ marks the job as failed (and retries it, if the job
    // was enqueued with an `attempts` option). Leave the file in staging so
    // a retry can re-scan it; sweep stale staging files separately.
    throw err;
  }
}, {
  connection: { host: process.env.REDIS_HOST || '127.0.0.1', port: 6379 },
  concurrency: 4, // process up to 4 scans in parallel
});

worker.on('completed', (job, result) => {
  console.log(`Job ${job.id} completed: ${result.status}`);
});

worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed: ${err.message}`);
});
```
Tune the `concurrency` setting based on your hardware. ClamAV is CPU-bound — a good starting point is one worker per CPU core. Monitor system load and tune accordingly.
## Status polling endpoint
The client uses the `jobId` returned at upload time to poll for the scan result:
```js
// src/routes/scanStatus.js
const express = require('express');
const { Queue } = require('bullmq');

const router = express.Router();

const scanQueue = new Queue('scan', {
  connection: { host: process.env.REDIS_HOST || '127.0.0.1', port: 6379 },
});

router.get('/:jobId', async (req, res) => {
  const job = await scanQueue.getJob(req.params.jobId);
  if (!job) {
    return res.status(404).json({ error: 'Job not found.' });
  }

  const state = await job.getState(); // 'waiting' | 'active' | 'completed' | 'failed'

  if (state === 'completed') {
    return res.json({ state, result: job.returnvalue });
  }
  if (state === 'failed') {
    return res.json({ state, reason: job.failedReason });
  }
  return res.json({ state });
});

module.exports = router;
```
Client-side polling loop (plain JavaScript):

```js
async function pollScan(jobId, intervalMs = 2000) {
  while (true) {
    const res = await fetch(`/scan-status/${jobId}`);
    const data = await res.json();
    if (data.state === 'completed') return data.result;
    if (data.state === 'failed') throw new Error(data.reason);
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
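The fixed two-second interval above also never gives up: if a job is stuck, the client polls forever. A variant with capped exponential backoff and an attempt limit addresses both; `pollDelay` and the limits below are illustrative choices, not anything prescribed by BullMQ:

```javascript
// Capped exponential backoff for the polling interval: 2s, 4s, 8s, … up to capMs.
function pollDelay(attempt, baseMs = 2000, capMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Like pollScan, but backs off between requests and fails after maxAttempts.
async function pollScanWithBackoff(jobId, maxAttempts = 20) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`/scan-status/${jobId}`);
    const data = await res.json();
    if (data.state === 'completed') return data.result;
    if (data.state === 'failed') throw new Error(data.reason);
    await new Promise((r) => setTimeout(r, pollDelay(attempt)));
  }
  throw new Error('Timed out waiting for scan result');
}
```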
## Webhook notification variant
Instead of (or in addition to) polling, emit a webhook when the job completes. Listen to the worker's `completed` event:
```js
const axios = require('axios');

worker.on('completed', async (job, result) => {
  const webhookUrl = job.data.webhookUrl;
  if (!webhookUrl) return;

  await axios.post(webhookUrl, {
    jobId: job.id,
    result,
  }).catch((err) => {
    console.error('Webhook delivery failed:', err.message);
  });
});
```
Pass `webhookUrl` as a field in the upload request body, and include it when enqueuing the job. This is especially useful for mobile apps or third-party integrations that cannot maintain a polling loop.
## Next steps
- After scanning, upload to object storage? See Scanning files before uploading to AWS S3 — adapt the "move to permanent storage" step to call S3 instead.
- Running the worker in Kubernetes? See Setting up pompelmi with ClamAV on Kubernetes for the TCP-mode setup — use `scan(path, { host: 'clamav', port: 3310 })` in the worker.
- Building a standalone scan API? Read Building a file scanning REST API with Node.js and pompelmi.