Scanning files before uploading to Google Cloud Storage

Google Cloud Storage (GCS) is the object storage service for Google Cloud Platform. As with AWS S3, the recommended approach to virus scanning is to scan the file locally while it is still on your server's disk, and to call the GCS upload API only if the scan returns Verdict.Clean. This prevents malware from ever reaching GCS.

New to pompelmi? Read Getting started with antivirus scanning in Node.js first, then return here for the GCS-specific pattern.

Install

npm install pompelmi @google-cloud/storage multer express

Authentication

The @google-cloud/storage SDK authenticates using Application Default Credentials (ADC). In practice this means:

  • Local development: run gcloud auth application-default login and credentials are picked up automatically.
  • GCE / Cloud Run / GKE: attach a service account to the instance or workload. The SDK finds credentials automatically — no key file needed.
  • Outside GCP: set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of a service account JSON key file.

The service account needs the Storage Object Creator role (roles/storage.objectCreator) on your bucket.

Complete Express example

const express = require('express');
const multer  = require('multer');
const { scan, Verdict } = require('pompelmi');
const { Storage } = require('@google-cloud/storage');
const crypto = require('crypto');
const path   = require('path');
const fs     = require('fs');
const os     = require('os');

const app    = express();
const upload = multer({
  dest:   os.tmpdir(),
  limits: { fileSize: 50 * 1024 * 1024 },  // 50 MB
});

// Uses Application Default Credentials automatically
const storage = new Storage();
const bucket  = storage.bucket(process.env.GCS_BUCKET_NAME);

app.post('/upload', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file provided.' });
  }

  const tmpPath   = req.file.path;
  let tmpDeleted  = false;

  try {
    // Step 1 — scan locally before touching GCS
    const verdict = await scan(tmpPath);

    if (verdict === Verdict.Malicious) {
      return res.status(400).json({ error: 'Malware detected. Upload rejected.' });
    }
    if (verdict !== Verdict.Clean) {
      // Fail closed: covers Verdict.ScanError and any other non-clean verdict
      return res.status(422).json({ error: 'Scan did not return a clean verdict. Upload rejected.' });
    }

    // Step 2 — file is clean, upload to GCS
    const ext        = path.extname(req.file.originalname).toLowerCase();
    const gcsKey     = 'uploads/' + crypto.randomBytes(16).toString('hex') + ext;
    const gcsFile    = bucket.file(gcsKey);

    await gcsFile.save(fs.readFileSync(tmpPath), {
      contentType: req.file.mimetype,
      metadata: {
        // Custom metadata visible in GCS console and API responses
        'scan-status': 'clean',
        'scanned-by':  'pompelmi',
        'scan-engine': 'clamav',
        'scanned-at':  new Date().toISOString(),
      },
    });

    // Step 3 — clean up temp file
    fs.unlinkSync(tmpPath);
    tmpDeleted = true;

    return res.json({
      status: 'ok',
      key:    gcsKey,
      // NB: this public-style URL only resolves if the bucket allows public
      // reads; for a private bucket, return a signed URL instead (see below)
      url:    `https://storage.googleapis.com/${process.env.GCS_BUCKET_NAME}/${gcsKey}`,
    });

  } catch (err) {
    return res.status(500).json({ error: err.message });

  } finally {
    if (!tmpDeleted && fs.existsSync(tmpPath)) {
      fs.unlinkSync(tmpPath);
    }
  }
});

app.listen(3000, () => console.log('Listening on :3000'));

For large files, prefer the streaming upload API to avoid loading the entire file into memory. Replace gcsFile.save(buffer, ...) with a read stream from fs.createReadStream(tmpPath) piped into gcsFile.createWriteStream({ ... }) — ideally via stream.pipeline, which propagates errors from either side, unlike a bare .pipe() call.

Reading scan metadata

The custom metadata you attach during upload is retrievable on every subsequent getMetadata() call:

const [metadata] = await bucket.file(gcsKey).getMetadata();
console.log(metadata.metadata['scan-status']);  // 'clean'
console.log(metadata.metadata['scanned-at']);   // ISO timestamp

This provides a lightweight audit trail without a separate database record.

IAM-based defence in depth

For an extra layer of defence, restrict who can upload to your bucket by granting only your application's service account the Storage Object Creator role, while your CDN or download service uses a separate identity with only Storage Object Viewer.

You can also use a Uniform bucket-level access policy to enforce that all objects are private by default, and only expose them via signed URLs that your application generates after verifying scan status.
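That verify-then-sign step can be sketched as a small helper. getMetadata() and getSignedUrl() are standard methods on a @google-cloud/storage File; the helper name signedUrlIfClean and the 15-minute TTL are illustrative choices:

```javascript
const FIFTEEN_MINUTES = 15 * 60 * 1000;

// Hand out a V4 signed read URL only when the object's custom metadata
// (attached at upload time, see above) records a clean scan verdict.
async function signedUrlIfClean(file, ttlMs = FIFTEEN_MINUTES) {
  const [meta] = await file.getMetadata();
  const custom = meta.metadata || {};
  if (custom['scan-status'] !== 'clean') {
    throw new Error('No clean scan verdict recorded; refusing to sign.');
  }
  const [url] = await file.getSignedUrl({
    version: 'v4',
    action:  'read',
    expires: Date.now() + ttlMs,
  });
  return url;
}
```

Usage: const url = await signedUrlIfClean(bucket.file(gcsKey)); — the identity generating the URL needs permission to sign (for example, the iam.serviceAccounts.signBlob permission when running without a key file).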

# Grant only the app's service account write access
gcloud storage buckets add-iam-policy-binding gs://your-bucket \
  --member="serviceAccount:upload-api@your-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"

Next steps