Scanning Excel and CSV files for malicious macros

Spreadsheet files are a common malware delivery vector. If your application accepts uploads of .xls, .xlsx, .xlsm, .xlsb, or .csv files from untrusted users, you face several distinct threats:

  • VBA macros. Legacy .xls files and macro-enabled .xlsm / .xlsb files can contain Visual Basic for Applications code that executes on the victim's machine when the file is opened. Many ransomware and trojan campaigns are delivered via malicious Excel macros.
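  • DDE (Dynamic Data Exchange) and formula injection in CSV. A CSV cell that starts with =, +, -, or @ can be interpreted as a formula by spreadsheet applications. A cell like =cmd|' /c calc'!A0 can launch a system command through DDE once the user accepts the application's security prompts, and a cell like =HYPERLINK("http://evil.example/payload","Click here") presents a clickable link to an attacker-controlled URL.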
  • Embedded objects and OLE streams. Office documents can embed other objects — executables, scripts, or other Office files — inside OLE compound document streams.

What ClamAV detects

ClamAV has extensive coverage of Office malware. It scans OLE2 compound documents (the binary .xls format) and OOXML archives (the ZIP-based .xlsx / .xlsm format), looking for:

  • Known VBA macro malware signatures (thousands of entries)
  • Heuristic patterns in macro code (obfuscated Shell, WScript, PowerShell calls)
  • Malicious embedded objects and shellcode
  • Suspicious URL patterns in embedded hyperlinks

When ClamAV finds a macro threat, the scanner exits with code 1, which pompelmi maps to Verdict.Malicious. An unknown or novel macro payload may still come back as Verdict.Clean: ClamAV is primarily signature-based and cannot detect every obfuscated macro.
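
The exit-code mapping described above can be sketched as follows. This is an illustration only, not pompelmi's actual implementation: `SketchVerdict` and `verdictFromExitCode` are hypothetical names, and the string labels stand in for whatever values pompelmi's own Verdict export uses. The exit codes themselves (0, 1, 2) are the ones documented for clamscan.

```javascript
// Hypothetical verdict labels for illustration; pompelmi exports its own Verdict values.
const SketchVerdict = Object.freeze({
  Clean: 'clean',
  Malicious: 'malicious',
  ScanError: 'scan-error',
});

// clamscan exit codes: 0 = no virus found, 1 = virus found, 2 = an error occurred.
function verdictFromExitCode(code) {
  if (code === 0) return SketchVerdict.Clean;
  if (code === 1) return SketchVerdict.Malicious;
  // 2 and anything unexpected: treat as a scan error so the caller can fail closed
  return SketchVerdict.ScanError;
}
```

Treating every unexpected code as a scan error keeps the mapping conservative: a file is only reported clean when the scanner explicitly said so.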

Use pompelmi as one layer in a broader defence, not as the sole control. See the defence-in-depth section below for complementary measures.

Detecting macro-enabled formats by magic bytes

The file extension alone cannot be trusted — users can rename files. Check the actual format using magic bytes before deciding whether to accept the file:

Format | Extension | Magic bytes (hex) | Notes
Binary Excel (OLE2) | .xls | D0 CF 11 E0 | Can contain VBA macros
OpenXML (ZIP-based) | .xlsx, .xlsm, .xlsb | 50 4B 03 04 | .xlsm / .xlsb can contain macros
CSV (plain text) | .csv | None; check content type and extension | Subject to DDE injection

const fs = require('fs');

// Read the first `length` bytes of the file
function readMagic(filePath, length = 4) {
  const fd  = fs.openSync(filePath, 'r');
  const buf = Buffer.alloc(length);
  try {
    fs.readSync(fd, buf, 0, length, 0);
  } finally {
    fs.closeSync(fd); // always release the descriptor, even if the read fails
  }
  return buf;
}

function isMacroEnabled(filePath, originalName) {
  const magic = readMagic(filePath);
  const ext   = originalName.split('.').pop()?.toLowerCase();

  // OLE2 compound file (.xls binary format): always supports macros
  if (magic[0] === 0xD0 && magic[1] === 0xCF) return true;

  // ZIP-based OOXML: only macro-enabled variants are dangerous.
  // Caveat: ext comes from the client-supplied name, so a .xlsm renamed
  // to .xlsx is not caught by this check alone.
  const isZip = magic[0] === 0x50 && magic[1] === 0x4B;
  return isZip && (ext === 'xlsm' || ext === 'xlsb');
}
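
Because the ZIP branch above relies on the client-supplied extension, a macro-enabled workbook renamed to .xlsx would pass it. One way to narrow that gap, sketched here as a heuristic rather than a complete check, is to look for the conventional macro part name inside the archive. ZIP stores entry names uncompressed, so a plain byte search is enough. The function name `zipMentionsVbaProject` is hypothetical, and the sketch assumes the macro part keeps its usual name, vbaProject.bin; a crafted file may store it under another name, so a thorough check would parse the archive and its content types.

```javascript
const fs = require('fs');

// Heuristic: report ZIP files whose entry names mention the conventional
// VBA part. Entry names are stored uncompressed in ZIP headers, so a
// byte search over the file finds them without unpacking anything.
function zipMentionsVbaProject(filePath) {
  const data  = fs.readFileSync(filePath);
  const isZip = data.length >= 4 && data[0] === 0x50 && data[1] === 0x4B;
  return isZip && data.includes('vbaProject.bin');
}
```

This reads the whole file into memory, which is acceptable under an upload size limit but worth replacing with a streaming search for large files.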

Sanitising DDE injection in CSV files

CSV files are plain text, and an injected formula is ordinary cell content rather than a malware signature, so a ClamAV scan will not flag it. Sanitise CSV content server-side before storing or processing it.

The standard mitigation is to prefix any cell value that starts with =, +, -, or @ with a single quote or tab, preventing spreadsheet applications from interpreting it as a formula:

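const fs = require('fs');

// One CSV field (quoted or bare) followed by its delimiter.
// Matching whole fields keeps commas and newlines inside quotes intact.
const CSV_FIELD = /("(?:[^"]|"")*"|[^,\r\n]*)(,|\r?\n|$)/g;

// Sanitise a CSV file in-place: prefix formula triggers with a tab
function sanitiseCsvDde(filePath) {
  const content   = fs.readFileSync(filePath, 'utf8');
  const sanitised = content.replace(CSV_FIELD, (match, field, delim) => {
    const quoted = field.length > 1 && field.startsWith('"') && field.endsWith('"');
    // Unwrap quoted fields and undo "" escaping to inspect the real value
    const value = quoted ? field.slice(1, -1).replace(/""/g, '"') : field;
    if (!/^[=+\-@]/.test(value.trimStart())) return match;
    // Prefix with tab to neutralise formula interpretation, then re-quote
    return `"\t${value.replace(/"/g, '""')}"${delim}`;
  });

  fs.writeFileSync(filePath, sanitised, 'utf8');
}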

Note: DDE sanitisation changes the file content. Apply it only if you intend to store the sanitised version. If you need to preserve the original (e.g. for re-download), sanitise a copy and serve the sanitised version to downstream consumers.
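
Sanitising a copy while keeping the original could look like the sketch below. The helper name `sanitiseCopy` and the `.sanitised` suffix are assumptions made for illustration; `sanitiseInPlace` is any function that rewrites a file at a given path, such as the sanitiseCsvDde function above.

```javascript
const fs = require('fs');

// Hypothetical helper: leave the uploaded file untouched and run the
// in-place sanitiser on a copy. Returns the path of the sanitised copy.
function sanitiseCopy(originalPath, sanitiseInPlace) {
  const copyPath = `${originalPath}.sanitised`;
  fs.copyFileSync(originalPath, copyPath);
  sanitiseInPlace(copyPath);
  return copyPath;
}
```

A call such as `sanitiseCopy(tmpPath, sanitiseCsvDde)` would then give downstream consumers the returned path while the original stays available for audit or re-download.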

Complete upload handler

const express = require('express');
const multer  = require('multer');
const { scan, Verdict } = require('pompelmi');
const fs = require('fs');
const os = require('os');

const app    = express();
const upload = multer({
  dest:   os.tmpdir(),
  limits: { fileSize: 20 * 1024 * 1024 }, // 20 MB
});

const ALLOWED_EXTENSIONS = new Set(['xlsx', 'csv']);

function readMagic(filePath) {
  const fd  = fs.openSync(filePath, 'r');
  const buf = Buffer.alloc(4);
  fs.readSync(fd, buf, 0, 4, 0);
  fs.closeSync(fd);
  return buf;
}

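function isMacroEnabled(filePath, originalName) {
  const magic = readMagic(filePath);
  const ext   = originalName.split('.').pop()?.toLowerCase();
  if (magic[0] === 0xD0 && magic[1] === 0xCF) return true; // OLE2 (.xls)
  const isZip = magic[0] === 0x50 && magic[1] === 0x4B;
  if (!isZip) return false;
  if (ext === 'xlsm' || ext === 'xlsb') return true;
  // Heuristic for renamed files: ZIP entry names are stored uncompressed,
  // so a macro-enabled workbook saved as .xlsx usually still exposes the
  // conventional vbaProject.bin part name.
  return fs.readFileSync(filePath).includes('vbaProject.bin');
}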

app.post('/upload/spreadsheet', upload.single('file'), async (req, res) => {
  if (!req.file) return res.status(400).json({ error: 'No file provided.' });

  const tmpPath = req.file.path;

  try {
    const ext = req.file.originalname.split('.').pop()?.toLowerCase();

    // Step 1 — extension allowlist
    if (!ALLOWED_EXTENSIONS.has(ext)) {
      return res.status(400).json({ error: `File type .${ext} is not accepted.` });
    }

    // Step 2 — reject macro-enabled formats
    if (isMacroEnabled(tmpPath, req.file.originalname)) {
      return res.status(400).json({
        error: 'Macro-enabled spreadsheets (.xls, .xlsm, .xlsb) are not accepted. Please save as .xlsx.',
      });
    }

    // Step 3 — ClamAV scan
    const verdict = await scan(tmpPath);

    if (verdict === Verdict.Malicious) {
      return res.status(400).json({ error: 'Malware detected. Upload rejected.' });
    }
    if (verdict === Verdict.ScanError) {
      return res.status(422).json({ error: 'Scan incomplete. Upload rejected.' });
    }

    // Step 4 — DDE sanitisation for CSV files
    if (ext === 'csv') {
      sanitiseCsvDde(tmpPath);
    }

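    // Step 5: move to permanent storage (omitted here). Copy the file
    // elsewhere at this point, because the finally block deletes tmpPath.
    return res.json({ status: 'ok', file: req.file.originalname });

  } catch (err) {
    // If scan() or file I/O throws, fail closed instead of leaving the request hanging
    return res.status(500).json({ error: 'Upload could not be processed.' });
  } finally {
    if (fs.existsSync(tmpPath)) fs.unlinkSync(tmpPath);
  }
});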

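function sanitiseCsvDde(filePath) {
  // One CSV field (quoted or bare) followed by its delimiter
  const fieldPattern = /("(?:[^"]|"")*"|[^,\r\n]*)(,|\r?\n|$)/g;
  const content   = fs.readFileSync(filePath, 'utf8');
  const sanitised = content.replace(fieldPattern, (match, raw, delim) => {
    const quoted = raw.length > 1 && raw.startsWith('"') && raw.endsWith('"');
    const value  = quoted ? raw.slice(1, -1).replace(/""/g, '"') : raw;
    if (!/^[=+\-@]/.test(value.trimStart())) return match;
    // Tab prefix neutralises the formula; re-quote and escape embedded quotes
    return `"\t${value.replace(/"/g, '""')}"${delim}`;
  });
  fs.writeFileSync(filePath, sanitised, 'utf8');
}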
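
The handler leaves the permanent-storage step open. A minimal sketch of that step, assuming local disk storage, might look like this. `storeUpload` and its `storageDir` parameter are hypothetical, and the sketch copies rather than moves so that the handler's cleanup of the temporary file keeps working unchanged.

```javascript
const crypto = require('crypto');
const fs     = require('fs');
const path   = require('path');

// Hypothetical helper: copy a scanned upload into `storageDir` under a
// server-generated name, so the client-supplied filename never reaches disk.
// `ext` should be a value that already passed the extension allowlist.
function storeUpload(tmpPath, storageDir, ext) {
  fs.mkdirSync(storageDir, { recursive: true });
  const storedPath = path.join(storageDir, `${crypto.randomUUID()}.${ext}`);
  fs.copyFileSync(tmpPath, storedPath);
  return storedPath;
}
```

Generating the stored name on the server avoids path traversal and name collisions; the original filename can be kept in a database column if it needs to be shown to users later.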

Defence in depth

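Layer | What it stops
File size limit (Multer) | Oversized uploads, resource exhaustion
Extension allowlist | Unexpected file types
Magic byte check + macro format rejection | Misnamed VBA-capable files, legacy binary formats
pompelmi + ClamAV scan | Known macro malware, embedded executables, trojans
CSV DDE sanitisation | Formula injection in plain-text CSV files
Content-Security-Policy, Content-Disposition: attachment, and X-Content-Type-Options: nosniff on download responses | Inline rendering and content sniffing when files are served back (these headers do not stop a spreadsheet application from evaluating formulas; that is the job of the sanitisation layer)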
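
For the download layer, the response headers could be assembled as in the sketch below. `downloadHeaders` is a hypothetical helper, and the exact policy values are assumptions to adapt. The headers force the browser to download the file rather than render or sniff it; they have no effect on what a spreadsheet application does once the user opens the file.

```javascript
// Hypothetical helper: headers for serving a stored upload back to users.
function downloadHeaders(filename) {
  // Keep only conservative characters so the name cannot break the header
  const safeName = filename.replace(/[^A-Za-z0-9._-]/g, '_');
  return {
    'Content-Type': 'application/octet-stream',
    'Content-Disposition': `attachment; filename="${safeName}"`,
    'X-Content-Type-Options': 'nosniff',
    'Content-Security-Policy': "default-src 'none'; sandbox",
  };
}
```

In an Express route the returned object can be passed to res.set() before the file is sent.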

Next steps