Courier MFT

File Monitor System

Watch directories and trigger jobs automatically when files arrive.

The File Monitor system watches local and remote directories for file activity and triggers job executions in response. Monitors are first-class entities, independent of jobs, with their own configuration, lifecycle, and audit history.

9.1 Monitor Entity

A File Monitor is persisted in the database with the following configuration:

  • Name & description: Human-readable identification
  • Watch target: Local directory path or remote connection reference (SFTP/FTP) + remote path
  • Trigger events: One or more of FileCreated, FileModified, FileExists
  • File patterns: One or more glob patterns for filename filtering (e.g., *.pgp, invoice_*.csv)
  • Polling interval: For remote monitors and as local fallback (default: 60 seconds)
  • Stability window: Time to wait for file size to stabilize before considering a file "ready" (default: 5 seconds local, 1 poll interval remote)
  • Bound jobs/chains: One or more Job or Chain references to trigger on detection
  • State: Active, Degraded, Paused, Disabled, or Error (see Section 9.8)
  • Max consecutive failures: Threshold before transitioning to Error state (default: 5)
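
As a rough illustration of the fields listed above, the entity could be modeled as follows. This is a sketch in Python for readability only (Courier itself is .NET); the class name, field names, and the effective_stability_window_s helper are hypothetical and not taken from the Courier schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MonitorState(Enum):
    ACTIVE = "Active"
    DEGRADED = "Degraded"   # poll-only mode, described in Section 9.8
    PAUSED = "Paused"
    DISABLED = "Disabled"
    ERROR = "Error"


@dataclass
class MonitorConfig:
    # Field names are illustrative, not the real column names.
    name: str
    watch_path: str
    trigger_events: List[str]               # FileCreated / FileModified / FileExists
    description: str = ""
    connection_id: Optional[str] = None     # set only for remote (SFTP/FTP) monitors
    file_patterns: List[str] = field(default_factory=list)
    polling_interval_s: int = 60            # documented default
    stability_window_s: Optional[float] = None
    bound_job_ids: List[str] = field(default_factory=list)
    state: MonitorState = MonitorState.ACTIVE
    max_consecutive_failures: int = 5       # documented default

    @property
    def is_remote(self) -> bool:
        return self.connection_id is not None

    def effective_stability_window_s(self) -> float:
        """Apply the documented defaults: 5 s local, one poll interval remote."""
        if self.stability_window_s is not None:
            return self.stability_window_s
        return float(self.polling_interval_s) if self.is_remote else 5.0
```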

9.2 Local Monitoring — FileSystemWatcher + Polling Fallback

For local and mounted directories, Courier uses .NET's FileSystemWatcher for near-instant event detection, combined with a periodic full-directory poll as the authoritative source of truth. The core invariant is: the watcher is an optimization for latency; the poller is the guarantee of correctness. If the watcher drops events, the poller catches them. If the watcher fails entirely, the poller continues alone.

Design principle: No file event is ever delivered by the watcher alone. Every detected file must survive reconciliation against the poller's full-directory scan to be considered actionable. The watcher's role is to reduce detection latency between poll intervals, not to serve as the primary detection mechanism.

Known FileSystemWatcher problems and mitigations:

  • Buffer overflow / dropped events: Microsoft documents that the internal buffer can overflow under high file volume. The individual events are lost; the application is told only that an overflow occurred, through the Error event, not which files were affected. Mitigation: set InternalBufferSize to 64 KB (the maximum, up from the 8 KB default) and handle the Error event to detect overflows. Even at 64 KB, a sustained burst of roughly 4,000 events between drain cycles will overflow the buffer, and fewer when filenames are long, because each buffered event also stores its filename. The periodic poller detects any files missed by the watcher within one polling interval.
  • Duplicate events: A single file write can fire multiple Changed events. Mitigation: debounce events per file using a short window (500ms). Only the last event in the debounce window triggers processing.
  • Premature events: Events fire as soon as a file is created or first written to, not when the writer has finished. Mitigation: file readiness detection (Section 9.4).
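
The per-file debounce described above can be sketched as follows. This is a minimal Python illustration, not Courier's .NET implementation; the Debouncer class and its method names are hypothetical, and timestamps are passed in explicitly so the logic does not depend on a real clock.

```python
from typing import Dict, List

DEBOUNCE_WINDOW_S = 0.5  # the 500 ms window from the text


class Debouncer:
    """Collapses bursts of events for the same path into one notification."""

    def __init__(self, window_s: float = DEBOUNCE_WINDOW_S) -> None:
        self.window_s = window_s
        self._last_seen: Dict[str, float] = {}

    def record(self, path: str, now: float) -> None:
        # A newer event for the same path restarts that path's window.
        self._last_seen[path] = now

    def drain(self, now: float) -> List[str]:
        # Return paths whose window elapsed with no further events.
        ready = [p for p, t in self._last_seen.items() if now - t >= self.window_s]
        for p in ready:
            del self._last_seen[p]
        return sorted(ready)
```

A caller would invoke record from the watcher's event handlers and drain on a short timer, handing each returned path to readiness detection.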

Poller as authoritative source:

On each poll cycle, the poller performs a full Directory.EnumerateFiles() scan and compares the result against monitor_directory_state. Any file present in the directory but not yet processed (or modified since last processing) is treated as a new event, regardless of whether the watcher saw it. This means:

  • If the watcher fires and the poller confirms the file, processing proceeds (typical fast path)
  • If the watcher misses an event (buffer overflow, race condition, OS bug), the poller catches it within one poll interval
  • If the watcher fires but the file disappears before the poller confirms it, the event is discarded (file was transient)
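
The three outcomes above can be expressed as a set comparison. The sketch below is in Python for illustration and assumes a simplified in-memory view of monitor_directory_state; the reconcile function, its parameter names, and the (size, mtime) tuple shape are hypothetical.

```python
from typing import Dict, Set, Tuple

Stat = Tuple[int, float]  # (size in bytes, last-modified time); illustrative shape


def reconcile(
    scan: Dict[str, Stat],        # result of the full-directory scan
    processed: Dict[str, Stat],   # what was recorded when each file was last processed
    watcher_hints: Set[str],      # paths the watcher reported since the last poll
) -> Tuple[Set[str], Set[str], Set[str]]:
    # Anything present but unprocessed, or changed since processing, is actionable.
    actionable = {path for path, stat in scan.items() if processed.get(path) != stat}
    confirmed = actionable & watcher_hints    # fast path: watcher fired, poll agrees
    missed = actionable - watcher_hints       # watcher dropped the event
    transient = watcher_hints - set(scan)     # file vanished before confirmation
    return confirmed, missed, transient
```

Both confirmed and missed files proceed to processing; transient hints are discarded.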

Watcher health monitoring:

Each local monitor tracks watcher health metrics, exposed via the Monitor API and stored in the monitor's state:

| Metric | Description | Source |
|---|---|---|
| watcher_state | Active, Degraded, Disabled, Failed | Runtime state |
| last_overflow_at | Timestamp of last Error event from FileSystemWatcher | Error event handler |
| overflow_count_24h | Number of buffer overflows in the last 24 hours | Rolling counter |
| last_poll_at | Timestamp of last completed full-directory poll | Poll completion |
| last_poll_duration_ms | Duration of the last full-directory scan | Timer |
| last_poll_file_count | Number of files found in last poll | Poll result |
| watcher_events_since_last_poll | Events received between polls (for overflow detection) | Counter reset per poll |

Automatic watcher disable:

If the watcher becomes a liability (constant overflows, excessive event volume), it is automatically disabled and the monitor falls back to polling-only:

  • Overflow threshold: If overflow_count_24h exceeds monitor.watcher_overflow_threshold (default: 10), the watcher is disabled for that monitor and the state transitions to Degraded. The poller continues at normal interval. An audit event WatcherAutoDisabled is emitted with the overflow count and monitor ID.
  • File count threshold: If last_poll_file_count exceeds monitor.watcher_max_file_count (default: 10,000), the watcher is disabled preemptively. Directories with very high file counts generate more filesystem events than the watcher can reliably buffer. The monitor operates in poll-only mode.
  • Manual re-enable: Admins can re-enable the watcher via POST /api/v1/monitors/{id}/reset-watcher after addressing the root cause (e.g., reducing file count, increasing poll frequency). The monitor transitions back to Active.
  • Poll interval adjustment: When the watcher is disabled (state: Degraded), the poll interval is automatically halved (e.g., 5 min → 2.5 min) to compensate for the loss of real-time detection. The original interval is restored when the watcher is re-enabled.
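
The two thresholds and the interval adjustment reduce to a few comparisons. The following Python sketch is illustrative only; WatcherHealth and both function names are hypothetical, and "exceeds" is read as strictly greater than.

```python
from dataclasses import dataclass


@dataclass
class WatcherHealth:
    overflow_count_24h: int
    last_poll_file_count: int


def should_disable_watcher(
    health: WatcherHealth,
    overflow_threshold: int = 10,    # monitor.watcher_overflow_threshold default
    max_file_count: int = 10_000,    # monitor.watcher_max_file_count default
) -> bool:
    # Either condition alone is enough to fall back to poll-only mode.
    return (
        health.overflow_count_24h > overflow_threshold
        or health.last_poll_file_count > max_file_count
    )


def effective_poll_interval_s(configured_s: float, watcher_enabled: bool) -> float:
    # Halve the interval while the watcher is disabled; restore it afterwards.
    return configured_s if watcher_enabled else configured_s / 2
```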

The watcher and polling fallback run as a combined system. If the FileSystemWatcher encounters an error (e.g., watched directory becomes unavailable), the system falls back to polling-only until the watcher can be re-established.

9.3 Remote Monitoring — Scheduled Polling

Remote SFTP and FTP directories are monitored via scheduled polling to respect connection limits and avoid holding persistent sessions:

  1. On each poll interval, the monitor opens a short-lived connection using the configured connection reference
  2. Lists the directory contents (filename, size, last modified)
  3. Compares against the last known state stored in the monitor_directory_state table
  4. Identifies new, modified, or existing files based on the configured trigger events
  5. Disconnects immediately after listing

Connection management: Remote polls use the same Connection & Protocol Layer (Section 6) for SFTP/FTP access. Connections are opened and closed per poll — no persistent sessions. The polling interval should be set conservatively (minimum 30 seconds, recommended 60+ seconds) to avoid hitting server-side rate limits or max-session restrictions.

State tracking: After each poll, the full directory listing (filenames, sizes, timestamps) is persisted so the next poll can compute a diff. This state is stored in monitor_directory_state keyed by monitor ID.
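
A minimal sketch of the diff step, assuming each persisted listing is a mapping from filename to (size, last modified). It is written in Python for illustration; Entry and diff_listing are hypothetical names, not part of Courier.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    size: int
    last_modified: str  # ISO-8601 timestamp as returned by the listing


def diff_listing(
    previous: Dict[str, Entry], current: Dict[str, Entry]
) -> Dict[str, List[str]]:
    """Classify every file in the current listing against the stored snapshot."""
    result: Dict[str, List[str]] = {"new": [], "modified": [], "unchanged": [], "removed": []}
    for name, entry in sorted(current.items()):
        if name not in previous:
            result["new"].append(name)
        elif previous[name] != entry:
            result["modified"].append(name)
        else:
            result["unchanged"].append(name)
    result["removed"] = sorted(set(previous) - set(current))
    return result
```

After the diff, the current listing replaces the stored snapshot so that the next poll compares against it.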

9.4 File Readiness Detection

To prevent triggering on partially-written files, the monitor applies a stability check before considering a file "ready":

Local files:

  1. When a file event is detected (or found during poll), record the file size
  2. Wait for the configured stability window (default: 5 seconds)
  3. Check the file size again
  4. If the size has not changed, the file is considered ready and the trigger fires
  5. If the size changed, reset the stability window and check again
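
The local stability loop can be sketched as below, in Python for illustration. The size lookup and the sleep function are injectable so the logic can be exercised without real files. The max_checks cap is an assumption added here to keep the loop bounded; the specification does not define one.

```python
import os
import time
from typing import Callable


def wait_until_stable(
    path: str,
    window_s: float = 5.0,                              # documented local default
    get_size: Callable[[str], int] = os.path.getsize,
    sleep: Callable[[float], None] = time.sleep,
    max_checks: int = 120,                              # assumption, not in the spec
) -> int:
    """Block until the file size is unchanged across one stability window."""
    last = get_size(path)
    for _ in range(max_checks):
        sleep(window_s)
        current = get_size(path)
        if current == last:
            return current      # size held steady: file is ready
        last = current          # still growing: restart the window
    raise TimeoutError(f"{path} did not stabilize after {max_checks} checks")
```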

Remote files:

  1. On poll N, a new file is detected with size S
  2. On poll N+1 (one interval later), the file is checked again
  3. If the size matches and last-modified timestamp is unchanged, the file is considered ready
  4. If either changed, it remains in a "pending readiness" state until the next poll confirms stability

This approach handles the common case of large files being uploaded to a watched directory over SFTP. The stability window is configurable per monitor for environments with very slow uploads.
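
For remote files, readiness is decided across polls rather than within one. A minimal Python sketch follows, assuming the caller passes only candidate files (new or modified, not yet triggered); update_readiness and the (size, last modified) tuple shape are hypothetical.

```python
from typing import Dict, Set, Tuple

Stat = Tuple[int, str]  # (size, last-modified timestamp)


def update_readiness(
    pending: Dict[str, Stat],     # candidates recorded on the previous poll
    candidates: Dict[str, Stat],  # candidates observed on this poll
) -> Tuple[Set[str], Dict[str, Stat]]:
    """Return (files now ready, files still pending for the next poll)."""
    ready: Set[str] = set()
    still_pending: Dict[str, Stat] = {}
    for name, stat in candidates.items():
        if pending.get(name) == stat:
            ready.add(name)             # size and timestamp unchanged since last poll
        else:
            still_pending[name] = stat  # first sighting, or still changing
    return ready, still_pending
```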

9.5 File Pattern Filtering

Each monitor can define one or more glob patterns that filenames must match to trigger an event. Patterns are evaluated against the filename only (not the full path).

  • Patterns use standard glob syntax: * matches any sequence of characters (including none), ? matches exactly one character
  • Multiple patterns are OR'd — a file matching any pattern triggers the event
  • An empty pattern list means all files match
  • Patterns are case-insensitive on Windows, case-sensitive on Linux (matching OS filesystem behavior)

Examples: *.pgp, invoice_*.csv, PARTNER_??_*.dat
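
The matching rules can be illustrated with Python's standard fnmatch module. This is a sketch rather than Courier's matcher: the matches function is hypothetical, case sensitivity is passed in explicitly instead of being detected from the OS, and fnmatch additionally accepts [seq] character classes, which the rules above do not mention.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def matches(filename: str, patterns: Iterable[str], case_sensitive: bool) -> bool:
    """True if the filename matches any pattern; an empty list matches everything."""
    pattern_list = list(patterns)
    if not pattern_list:
        return True
    if not case_sensitive:
        # Normalize both sides to emulate a case-insensitive filesystem.
        filename = filename.lower()
        pattern_list = [p.lower() for p in pattern_list]
    return any(fnmatchcase(filename, p) for p in pattern_list)
```

For example, matches("REPORT.PGP", ["*.pgp"], case_sensitive=False) is true, while the same call with case_sensitive=True is false.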

9.6 Monitor → Job Binding & Context Injection

When a monitor triggers, it creates a new execution of each bound Job or Chain and injects the file information into the JobContext:

{
    "monitor.id": "a1b2c3d4-...",
    "monitor.name": "Partner Invoice Watch",
    "monitor.trigger_event": "FileCreated",
    "monitor.triggered_files": [
        {
            "path": "/data/incoming/invoice_2026-02-20.pgp",
            "size": 1048576,
            "last_modified": "2026-02-20T03:15:22Z"
        }
    ],
    "monitor.triggered_at": "2026-02-20T03:15:30Z"
}

Job steps can reference these context values in their configuration. For example, a pgp.decrypt step could reference monitor.triggered_files[0].path as its input file. This binding is configured in the job step definition using the same JobContext reference syntax described in Section 5.6.

If multiple files match the pattern in a single detection cycle, the behavior depends on monitor configuration:

  • Batch mode (default): All matched files are injected into a single job execution as a list. Steps that support multi-file input process them all.
  • Individual mode: A separate job execution is created for each matched file. Useful when each file needs independent processing and audit tracking.
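
The difference between the two modes is only in how matched files are grouped into executions. A minimal Python sketch, using the context keys from the JSON example above; build_execution_contexts and its parameters are hypothetical names.

```python
from typing import Any, Dict, List


def build_execution_contexts(
    monitor_id: str,
    monitor_name: str,
    trigger_event: str,
    files: List[Dict[str, Any]],
    triggered_at: str,
    batch: bool = True,   # batch mode is the documented default
) -> List[Dict[str, Any]]:
    """Return one context per job execution that should be created."""
    if not files:
        return []
    # Batch: one group holding every file. Individual: one group per file.
    groups = [files] if batch else [[f] for f in files]
    return [
        {
            "monitor.id": monitor_id,
            "monitor.name": monitor_name,
            "monitor.trigger_event": trigger_event,
            "monitor.triggered_files": group,
            "monitor.triggered_at": triggered_at,
        }
        for group in groups
    ]
```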

9.7 Deduplication

The monitor maintains a monitor_file_log table that tracks which files have already triggered events to prevent duplicate job executions:

| Column | Type | Description |
|---|---|---|
| monitor_id | UUID | FK to the monitor |
| file_path | TEXT | Full path of the detected file |
| file_size | BIGINT | Size in bytes at time of trigger |
| file_hash | TEXT | Optional SHA-256 hash for content-based dedup |
| last_modified | TIMESTAMP | File's last modified timestamp |
| triggered_at | TIMESTAMP | When the trigger fired |
| execution_id | UUID | FK to the job execution that was created |

On each detection cycle, the monitor checks this log before triggering:

  • FileCreated: Triggers only if the file path has never been seen, or if the file was previously processed and has since been deleted and recreated (detected by absence in a previous poll followed by presence)
  • FileModified: Triggers if the file's size or last-modified timestamp differs from the last recorded values
  • FileExists: Triggers on every detection cycle where the file is present (no dedup — useful for presence-check workflows)

The file log is pruned periodically to remove entries older than a configurable retention period (default: 30 days).
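
The per-event rules above can be summarized as a single decision function. This Python sketch is illustrative; should_trigger and its parameters are hypothetical. One behavior is an assumption not stated in the rules: a file with no log entry does not fire FileModified, on the reasoning that a first sighting is a creation rather than a modification.

```python
from typing import Optional, Tuple

Stat = Tuple[int, str]  # (size, last-modified timestamp)


def should_trigger(
    event: str,
    current: Stat,
    last_logged: Optional[Stat],        # most recent monitor_file_log entry, if any
    absent_since_logged: bool = False,  # file was missing in a poll after being logged
) -> bool:
    if event == "FileExists":
        return True  # no dedup: fires on every cycle where the file is present
    if event == "FileCreated":
        return last_logged is None or absent_since_logged
    if event == "FileModified":
        # Assumption: a never-seen file is not treated as "modified".
        return last_logged is not None and last_logged != current
    raise ValueError(f"unknown trigger event: {event}")
```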

9.8 Monitor State Machine

    ┌────────┐
    │ Active │◄──── (activate / resume)
    └──┬──┬──┘
       │  │
       │  └───────────┐ (N consecutive failures)
       │              │
       │ (pause)  ┌───▼───┐
       │          │ Error │── (acknowledge & resume) ──► Active
       │          └───────┘
  ┌────▼────┐
  │ Paused  │
  └────┬────┘
       │ (disable)
  ┌────▼─────┐
  │ Disabled │
  └──────────┘

State definitions:

  • Active: Monitor is running. Watching for events (local) or polling (remote).
  • Degraded: Monitor is running in poll-only mode. FileSystemWatcher has been auto-disabled due to excessive overflows or high file count (Section 9.2). Functionally equivalent to Active but with higher detection latency (poll interval halved to compensate).
  • Paused: Monitor is temporarily stopped. Configuration is retained. Can be resumed without re-creation.
  • Disabled: Monitor is permanently stopped. Must be explicitly re-activated. Used for monitors that are no longer needed but whose configuration and history should be preserved.
  • Error: Monitor encountered repeated failures (e.g., watched directory deleted, SFTP server unreachable for N consecutive polls). Requires manual acknowledgment to resume. An error event is emitted for V2 alerting.
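
The transitions named in this section and in Section 9.2 can be captured in a lookup table. The sketch below is in Python for illustration; the action names are hypothetical labels for the arrows in the diagram, and only transitions the document states are included, so anything else is rejected.

```python
from typing import Dict, Tuple

# (current state, action) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Active", "pause"): "Paused",
    ("Active", "failure_threshold_reached"): "Error",
    ("Active", "watcher_auto_disabled"): "Degraded",   # Section 9.2
    ("Degraded", "reset_watcher"): "Active",           # Section 9.2
    ("Paused", "resume"): "Active",
    ("Paused", "disable"): "Disabled",
    ("Disabled", "activate"): "Active",
    ("Error", "acknowledge_and_resume"): "Active",
}


def transition(state: str, action: str) -> str:
    """Return the next state, or raise if the move is not a documented one."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"illegal transition: {action!r} from {state!r}") from None
```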

9.9 Error Handling & Resilience

Local monitor errors:

  • Watched directory deleted or unmounted: FileSystemWatcher raises an Error event. The monitor logs the error, disables the watcher (state: Failed), and operates in poll-only mode. If the directory remains unavailable for N consecutive polls, the monitor transitions to Error.
  • Buffer overflow: FileSystemWatcher raises an Error event with InternalBufferOverflowException. The overflow is counted toward overflow_count_24h. If the threshold is exceeded, the watcher auto-disables (Section 9.2). The next poll cycle catches any missed files.
  • Permission denied: Logged and counted as a failure toward the consecutive failure threshold.

Remote monitor errors:

  • Connection refused / timeout: The poll is logged as a failure. The next poll interval retries with a fresh connection.
  • Authentication failure: Treated as a critical error — single occurrence transitions the monitor to Error since retrying won't help without credential changes.
  • Directory not found: Logged as a failure and counted toward the threshold.

All errors are recorded in the monitor's audit log with timestamps, error details, and the current consecutive failure count.
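
The failure accounting can be sketched as a small counter. This Python illustration makes two assumptions not spelled out above: a successful poll resets the count (implied by "consecutive"), and the transition happens when the count reaches the threshold. FailureTracker and the failure kind strings are hypothetical names.

```python
class FailureTracker:
    """Tracks consecutive failures and decides when to move to Error."""

    def __init__(self, max_consecutive_failures: int = 5) -> None:
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record_success(self) -> None:
        # Assumption: any successful poll breaks the failure streak.
        self.consecutive_failures = 0

    def record_failure(self, kind: str) -> bool:
        """Return True when the monitor should transition to Error."""
        self.consecutive_failures += 1
        if kind == "authentication_failure":
            return True  # critical: retrying cannot succeed without new credentials
        return self.consecutive_failures >= self.max_consecutive_failures
```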