Scanning files before uploading to Azure Blob Storage

Azure Blob Storage is Microsoft's object storage service and the standard choice for file storage on Azure. This guide shows how to integrate pompelmi into a Node.js upload endpoint that stores files in Azure Blob Storage — scanning each file locally before it is uploaded, so malware never reaches the cloud.

New to pompelmi? Read Getting started with antivirus scanning in Node.js first, then return here for the Azure-specific pattern.

Install

npm install pompelmi @azure/storage-blob multer express

Authentication

The @azure/storage-blob SDK supports several authentication strategies. The two most common for server-side Node.js are:

  • Connection string — simplest for local development. Copy the connection string from the Azure portal under Storage account → Access keys and set it as AZURE_STORAGE_CONNECTION_STRING.
  • Managed Identity — recommended for production. No secrets to rotate. See the Managed Identity section below.

For local development with a connection string:

const { BlobServiceClient } = require('@azure/storage-blob');

// Connection string (dev / simple deployments)
const blobServiceClient = BlobServiceClient.fromConnectionString(
  process.env.AZURE_STORAGE_CONNECTION_STRING
);

Complete Express example

const express = require('express');
const multer  = require('multer');
const { scan, Verdict } = require('pompelmi');
const { BlobServiceClient } = require('@azure/storage-blob');
const crypto = require('crypto');
const path   = require('path');
const fs     = require('fs');
const os     = require('os');

const app    = express();
const upload = multer({
  dest:   os.tmpdir(),
  limits: { fileSize: 50 * 1024 * 1024 },  // 50 MB
});

const blobServiceClient = BlobServiceClient.fromConnectionString(
  process.env.AZURE_STORAGE_CONNECTION_STRING
);
const containerClient = blobServiceClient.getContainerClient(
  process.env.AZURE_CONTAINER_NAME
);

app.post('/upload', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file provided.' });
  }

  const tmpPath  = req.file.path;
  let tmpDeleted = false;

  try {
    // Step 1 — scan locally before touching Azure
    const verdict = await scan(tmpPath);

    if (verdict === Verdict.Malicious) {
      return res.status(400).json({ error: 'Malware detected. Upload rejected.' });
    }
    if (verdict === Verdict.ScanError) {
      return res.status(422).json({ error: 'Scan incomplete. Upload rejected.' });
    }

    // Step 2 — file is clean, upload to Azure Blob Storage
    const ext       = path.extname(req.file.originalname).toLowerCase();
    const blobName  = 'uploads/' + crypto.randomBytes(16).toString('hex') + ext;
    const blockBlob = containerClient.getBlockBlobClient(blobName);

    await blockBlob.uploadFile(tmpPath, {
      blobHTTPHeaders: { blobContentType: req.file.mimetype },
      metadata: {
        scanStatus:  'clean',
        scannedBy:   'pompelmi',
        scanEngine:  'clamav',
        scannedAt:   new Date().toISOString(),
        originalName: encodeURIComponent(req.file.originalname),
      },
    });

    // Step 3 — clean up
    fs.unlinkSync(tmpPath);
    tmpDeleted = true;

    return res.json({
      status: 'ok',
      blobName,
      url: blockBlob.url,
    });

  } catch (err) {
    // Log the full error server-side; don't leak internals to the client
    console.error('Upload failed:', err);
    return res.status(500).json({ error: 'Upload failed.' });

  } finally {
    if (!tmpDeleted && fs.existsSync(tmpPath)) {
      fs.unlinkSync(tmpPath);
    }
  }
});

app.listen(3000, () => console.log('Listening on :3000'));

Azure Blob Storage metadata keys must be valid C# identifiers — they cannot contain hyphens. Use camelCase keys (scanStatus, not scan-status).
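To fail fast on a bad key, you could validate metadata before uploading. A minimal sketch — the helper names are our own, not part of @azure/storage-blob, and the regex is a practical ASCII approximation of the C# identifier rule:

```javascript
// Azure blob metadata keys must be valid C# identifiers:
// letters, digits, and underscores, not starting with a digit.
// (ASCII approximation; helpers are ours, not part of the SDK.)
function isValidMetadataKey(key) {
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(key);
}

function assertValidMetadata(metadata) {
  for (const key of Object.keys(metadata)) {
    if (!isValidMetadataKey(key)) {
      throw new Error(`Invalid metadata key: ${key}`);
    }
  }
  return metadata;
}

console.log(isValidMetadataKey('scanStatus'));  // true
console.log(isValidMetadataKey('scan-status')); // false
```

Calling assertValidMetadata around the metadata object passed to uploadFile turns a confusing service-side 400 into an immediate, descriptive error.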

Blob metadata and tags

Azure offers two ways to attach key-value data to a blob:

  • Metadata — attached at upload time or via setMetadata(). Returned in the HTTP response headers. Keys are case-insensitive and must be valid identifiers.
  • Blob index tags — queryable across a container or the entire storage account. Useful for finding all blobs with scanStatus = 'clean'.

// Set blob index tags for queryable audit trail
await blockBlob.setTags({
  scanStatus: 'clean',
  scannedBy:  'pompelmi',
});
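Tag keys and values have their own character restrictions (letters, digits, spaces, and + - . / : = _; keys 1–128 characters, values up to 256). A small guard, assuming our own helper names — these are not SDK functions:

```javascript
// Blob index tag keys (1–128 chars) and values (0–256 chars) may only
// contain letters, digits, spaces, and the characters + - . / : = _
// (helper names are ours, not part of the Azure SDK).
const TAG_PATTERN = /^[A-Za-z0-9 +\-./:=_]*$/;

function isValidTagKey(key) {
  return key.length >= 1 && key.length <= 128 && TAG_PATTERN.test(key);
}

function isValidTagValue(value) {
  return value.length <= 256 && TAG_PATTERN.test(value);
}

console.log(isValidTagKey('scanStatus'));         // true
console.log(isValidTagValue('2024-06-01T00:00')); // true
console.log(isValidTagKey('scan#status'));        // false
```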

Query blobs by tag using the Blob Service's tag filtering:

// Find all blobs marked clean across the storage account
const tagFilter = `"scanStatus" = 'clean'`;
for await (const blob of blobServiceClient.findBlobsByTags(tagFilter)) {
  console.log(blob.name, blob.tags);
}

Using Managed Identity in production

Avoid storing connection strings in environment variables for production. Use Managed Identity with DefaultAzureCredential:

npm install @azure/identity

const { BlobServiceClient } = require('@azure/storage-blob');
const { DefaultAzureCredential } = require('@azure/identity');

const blobServiceClient = new BlobServiceClient(
  `https://${process.env.AZURE_STORAGE_ACCOUNT_NAME}.blob.core.windows.net`,
  new DefaultAzureCredential()
);

Assign the Storage Blob Data Contributor role to your app's managed identity in the Azure portal under the storage account's Access control (IAM) blade. No credentials are stored anywhere.

Next steps