Article November 15, 2024

Secure File Upload Architecture for Healthcare, Finance, and Enterprise Applications

Design patterns for upload security in regulated environments: quarantine flows, audit trails, role-based policies, and how Pompelmi fits privacy-sensitive data processing.

architecture enterprise healthcare finance security privacy

Secure File Upload Architecture for Healthcare, Finance, and Enterprise Applications

Applications in healthcare, finance, and regulated enterprise contexts have stricter requirements than typical consumer apps. They handle PII, health records, financial documents, and legal materials — data where a security incident has regulatory, legal, and reputational consequences far beyond “user experience degraded.”

This post covers design patterns for upload security in these environments. We’ll focus on architecture, not regulatory interpretation (consult qualified counsel for compliance guidance). Pompelmi’s in-process, zero-egress scanning model fits naturally into these patterns.

Core Principles for Regulated Upload Pipelines

No Data Leaves Without Authorization

Every file upload that passes through a third-party service (cloud AV API, CDN without proper DPA, object storage in a non-approved region) represents a potential data handling event that may require consent, contracting, or notification obligations.

Pattern: Scan in-process. Use storage services with appropriate data processing agreements. Do not send files to external AV APIs without evaluating the data governance implications.

Minimum Necessary Processing

Process files only to the extent required for the stated purpose. Scanning for malware is required. Passing file content to analytics pipelines, logging systems, or training datasets is not.

Pattern: onScanEvent callbacks should log metadata (filename, size, verdict, matched rules) — never raw file content.

Audit Everything

Regulated environments require audit trails: who uploaded what, when, what the verdict was, and what action was taken.

Pattern: Every scan event becomes a structured, tamper-evident log entry.

Fail Closed

When the scanner encounters an error, the default behavior should be to block, not to pass. A broken scanner is not a reason to skip security checks.

Pattern: failClosed: true in all Pompelmi adapter options.

Reference Architecture

┌───────────────────────────────────────────────────────┐
│                      Client Layer                     │
│   Browser / Mobile / API Client                       │
└──────────────────────┬────────────────────────────────┘
                       │ multipart/form-data
┌──────────────────────▼────────────────────────────────┐
│                 Upload Gateway Service                │
│  - TLS termination                                    │
│  - Authentication / authorization checks             │
│  - File size limit enforcement                        │
│  - Rate limiting per user/IP                         │
└──────────────────────┬────────────────────────────────┘
                       │ Buffer in memory (no disk)
┌──────────────────────▼────────────────────────────────┐
│              In-Process Pompelmi Scanner              │
│  - Extension allowlist                                │
│  - Size check                                         │
│  - ZIP bomb detection (createZipBombGuard)           │
│  - Content heuristics (CommonHeuristicsScanner)       │
│  - Optional YARA rules                                │
│  - Emits structured audit events via onScanEvent      │
└──────────────────────┬────────────────────────────────┘
                       │ verdict: clean / suspicious / malicious
              ┌────────┴──────────┐
        'clean'                 'suspicious' or 'malicious'
              │                         │
┌─────────────▼──────────┐  ┌──────────▼──────────────┐
│  Staging Storage       │  │  Quarantine Storage      │
│  (temp, short TTL)     │  │  (isolated, audited)     │
└─────────────┬──────────┘  └──────────────────────────┘
              │
     [Async deep scan, optional]
     ClamAV / YARA / Manual review
              │
    ┌─────────▼──────────┐
    │  Permanent Storage │
    │  (authorized path) │
    └────────────────────┘

Implementation: Multi-Stage Upload Pipeline

Stage 1: Synchronous In-Process Scan (Express example)

import express from 'express';
import multer from 'multer';
import { createUploadGuard } from '@pompelmi/express-middleware';
import { composeScanners, CommonHeuristicsScanner, createZipBombGuard } from 'pompelmi';

const auditLogger = buildAuditLogger(); // your structured logger

const scanner = composeScanners(
  [
    ['zipGuard', createZipBombGuard({
      maxEntries: 500,
      maxTotalUncompressedBytes: 100 * 1024 * 1024,
      maxCompressionRatio: 50,
    })],
    ['heuristics', CommonHeuristicsScanner],
  ],
  { parallel: false, stopOn: 'malicious', timeoutMsPerScanner: 5000, tagSourceName: true }
);

const guard = createUploadGuard({
  includeExtensions: ['pdf', 'jpg', 'jpeg', 'png', 'docx', 'xlsx'],
  maxFileSizeBytes: 25 * 1024 * 1024,
  stopOn: 'suspicious',
  failClosed: true,
  scanner,
  onScanEvent: (ev: unknown) => {
    const event = ev as Record<string, unknown>;

    // Log metadata only — never file content
    auditLogger.info({
      source: 'pompelmi',
      type: event.type,
      filename: event.filename ? hashFilename(event.filename as string) : undefined, // optionally pseudonymize
      verdict: event.verdict,
      matches: event.matches,
      durationMs: event.ms,
      timestamp: new Date().toISOString(),
    });
  },
});

const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 25 * 1024 * 1024 },
});

app.post('/api/upload',
  authenticate,   // verify user identity first
  authorize,      // check upload permission
  upload.single('file'),
  guard,
  handleCleanUpload,
);

Stage 2: Quarantine for Suspicious Files

When stopOn: 'suspicious' would block too many benign files (e.g., during rollout), use a quarantine workflow instead:

// Instead of blocking suspicious files immediately, quarantine them
const guard = createUploadGuard({
  stopOn: 'malicious', // only hard-block confirmed malicious
  failClosed: true,
  scanner,
  onScanEvent: (ev: unknown) => {
    const event = ev as Record<string, unknown>;
    if (event.type === 'end' && event.verdict === 'suspicious') {
      quarantineQueue.enqueue({
        filename: event.filename,
        userId: req.user.id,
        uploadId: req.uploadId,
        reason: event.matches,
        timestamp: new Date().toISOString(),
      });
    }
  },
});

// Quarantined files go to an isolated bucket
// A separate review process (automated or manual) promotes or rejects them
async function handleUpload(req: AuthenticatedRequest, res: Response) {
  const { verdict } = (req as any).pompelmi;

  if (verdict === 'suspicious') {
    // File is quarantined, user gets a "pending review" response
    await quarantineStorage.moveToQuarantine(req.file, req.user.id);
    return res.status(202).json({
      status: 'pending_review',
      message: 'Your file is being reviewed. You will be notified when processing is complete.',
    });
  }

  // Clean: move to permanent storage
  await permanentStorage.store(req.file, req.user.id);
  res.json({ ok: true, fileId: generateFileId() });
}

Stage 3: Async Deep Scan (Optional)

For files that pass heuristics but warrant deeper analysis, enqueue an async job:

// After Stage 1 passes and file is in staging storage
await jobQueue.push({
  type: 'deep_scan',
  fileId,
  storageKey: stagingKey,
  userId: req.user.id,
  uploadedAt: new Date().toISOString(),
});

// Worker (runs on your infrastructure, not cloud API)
async function deepScanWorker(job: DeepScanJob) {
  const fileBytes = await stagingStorage.read(job.storageKey);

  // Run ClamAV if available
  if (clamAvScanner) {
    const clamResult = await clamAvScanner.scan(fileBytes);
    if (clamResult.verdict !== 'clean') {
      await quarantine(job, clamResult);
      return;
    }
  }

  // Run YARA with organization-specific rules
  if (yaraScanner) {
    const yaraResult = await yaraScanner.scan(fileBytes);
    if (yaraResult.matches.length > 0) {
      await quarantine(job, yaraResult);
      return;
    }
  }

  // Promote to permanent storage
  await permanentStorage.move(job.storageKey);
  await notifyUser(job.userId, 'file_ready', job.fileId);
}

Role-Based Upload Policies

Different users may have different upload permissions. Wire policies before the scanner:

function getGuardForRole(role: 'patient' | 'provider' | 'admin') {
  const base = {
    maxFileSizeBytes: role === 'admin' ? 100 * 1024 * 1024 : 10 * 1024 * 1024,
    stopOn: 'suspicious' as const,
    failClosed: true,
    scanner,
  };

  switch (role) {
    case 'patient':
      return createUploadGuard({
        ...base,
        includeExtensions: ['pdf', 'jpg', 'jpeg', 'png'], // limited set
      });

    case 'provider':
      return createUploadGuard({
        ...base,
        includeExtensions: ['pdf', 'jpg', 'jpeg', 'png', 'docx', 'xlsx', 'csv'],
      });

    case 'admin':
      return createUploadGuard({
        ...base,
        includeExtensions: ['pdf', 'jpg', 'png', 'docx', 'xlsx', 'csv', 'zip'],
      });
  }
}

Audit Log Requirements

A compliant audit log entry should include:

interface UploadAuditEntry {
  // Who
  userId: string;         // user or service account identifier
  sessionId?: string;     // session for correlation

  // What
  filename: string;       // hashed or pseudonymized in high-privacy contexts
  fileSize: number;       // bytes
  mimeType: string;       // declared MIME

  // When
  timestamp: string;      // ISO 8601

  // Result
  verdict: 'clean' | 'suspicious' | 'malicious';
  rules: string[];        // which rules fired
  action: 'accepted' | 'quarantined' | 'rejected';

  // Context
  ip?: string;            // hashed or omitted depending on jurisdiction
  userAgent?: string;
}

Important: Audit log entries should not contain file content. If your logging pipeline is compromised, you do not want uploaded files accessible via logs.

Summary

Regulated environments demand that upload security is auditable, predictable, and private. Pompelmi’s in-process scanning, structured onScanEvent callbacks, and failClosed semantics align with these requirements. The architecture pattern — synchronous heuristic scan → clean files to permanent storage, suspicious files to quarantine, rejected files blocked — provides a defensible, auditable pipeline.

Important note: This post describes technical architecture patterns, not legal or regulatory advice. Consult qualified privacy counsel and compliance professionals for HIPAA, GDPR, GLBA, and similar regulatory requirements specific to your jurisdiction and use case.

Resources:

Secure File Upload Architecture for Healthcare, Finance, and Enterprise Applications

Secure File Upload Architecture for Healthcare, Finance, and Enterprise Applications

Core Principles for Regulated Upload Pipelines

No Data Leaves Without Authorization

Minimum Necessary Processing

Audit Everything

Fail Closed

Reference Architecture

Implementation: Multi-Stage Upload Pipeline

Stage 1: Synchronous In-Process Scan (Express example)

Stage 2: Quarantine for Suspicious Files

Stage 3: Async Deep Scan (Optional)

Role-Based Upload Policies

Audit Log Requirements

Summary

Related articles

Fastify Upload Security: Scan Files Before They Reach Storage

Multer File Upload Security Checklist for Node.js

Secure File Uploads in NestJS with Application-Layer Scanning