Tracking Spatial Data Freshness SLAs

A spatial dataset can be ingested flawlessly and still be wrong because it is late. A flood-extent layer that lands an hour after the operations team has dispatched crews, a basemap tile refreshed a day past its acquisition window, a sensor feed whose newest partition predates the contractual update cadence: none of these throws an exception, yet each violates a freshness Service Level Agreement (SLA) that someone downstream depends on. Tracking spatial data freshness SLAs means turning “how current is this layer, right now, at this point in the pipeline” into a deterministic, alertable measurement rather than a guess. This guide is written for the data engineers, GIS platform administrators, SREs, and compliance teams who carry that risk. It sits under the broader Spatial Data Freshness & Quality Metrics program as the first quality gate, the one from which every other check inherits its timing context.

Architecture

Tracking spatial data freshness SLAs requires a decoupled observability plane that operates parallel to, rather than embedded within, your core ETL/ELT pipelines. The ingestion layer must emit structured telemetry at three deterministic boundaries: source extraction, spatial transformation, and catalog publication. At each boundary, a lightweight metadata sidecar intercepts payloads to capture partition timestamps, spatial bounding boxes, row counts, and coordinate reference system (CRS) identifiers before the dataset moves on to the next stage. These signals are exported as OpenTelemetry metrics and routed to a dedicated time-series observability store, while the geospatial assets themselves flow directly into object storage or spatial databases (e.g., PostGIS, GeoParquet on S3). Aligning the sidecar’s tags with the Geospatial Metric Taxonomy for ETL keeps a freshness signal emitted from an Airflow operator and one emitted from a PostGIS trigger inside the same attribute namespace.

This sidecar pattern keeps freshness tracking off the critical path of resource-intensive operations like raster tiling, vector topology validation, or spatial indexing, so it adds little or no compute contention to them. By isolating the observability plane, GIS platform administrators can scale spatial workloads horizontally, while SREs maintain strict SLA visibility without querying production tables. The three emission boundaries map directly onto the trust transitions described in Defining Spatial Data Trust Boundaries (source, staging, and curated), so that a freshness number is always attributable to a specific guarantee level rather than to “the pipeline” as a whole. A centralized freshness registry aggregates temporal signals across heterogeneous sources (satellite feeds, IoT sensors, municipal shapefiles), enabling compliance teams to audit update cadences and SLA adherence without direct database access.
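
As a rough illustration of what the sidecar might capture at each boundary, the sketch below models one telemetry record. The names `Boundary`, `TRUST_LEVEL`, and `FreshnessSignal` are hypothetical, and the one-to-one mapping from boundary to trust level is an assumption based on the ordering described above, not a fixed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Boundary(str, Enum):
    """The three emission boundaries named in the architecture."""
    SOURCE_EXTRACTION = "source_extraction"
    SPATIAL_TRANSFORMATION = "spatial_transformation"
    CATALOG_PUBLICATION = "catalog_publication"


# Assumed one-to-one mapping of boundaries onto trust levels.
TRUST_LEVEL = {
    Boundary.SOURCE_EXTRACTION: "source",
    Boundary.SPATIAL_TRANSFORMATION: "staging",
    Boundary.CATALOG_PUBLICATION: "curated",
}


@dataclass(frozen=True)
class FreshnessSignal:
    boundary: Boundary
    partition_ts: datetime                    # must be timezone-aware
    bbox: tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)
    row_count: int
    srid: int

    def __post_init__(self) -> None:
        # A naive timestamp would be silently read as local time later on.
        if self.partition_ts.tzinfo is None:
            raise ValueError("partition_ts must be timezone-aware")

    def to_record(self) -> dict:
        """Flatten the signal into a payload the observability store can ingest."""
        return {
            "boundary": self.boundary.value,
            "trust_level": TRUST_LEVEL[self.boundary],
            "partition_ts": self.partition_ts.astimezone(timezone.utc).isoformat(),
            "bbox": list(self.bbox),
            "row_count": self.row_count,
            "srid": self.srid,
        }
```

Tagging every record with its boundary and trust level is what lets a later query attribute a lag value to a specific guarantee level instead of to the pipeline as a whole.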

Freshness is the earliest gate in the quality chain, so its output feeds every later stage. A partition that clears the freshness SLA still has to pass Coordinate Reference System Validation and Geometry Validity & Topology Checks before publication — and when those later checks stall, their latency must be attributed back to the freshness budget rather than silently inflating it. When a source feed misses its window entirely, the gate degrades through the tiers defined in Fallback Chains for Spatial API Failures instead of publishing a stale layer as if it were current.

Metric Specification

Geospatial freshness cannot be accurately represented by a single last_updated timestamp. The metric layer must compute a composite freshness score that synthesizes ingestion latency, expected update windows, and spatial-temporal alignment. The core signal is spatial_freshness_lag_seconds, the delta between the source event time and the catalog availability time. The two timestamps are kept strictly separate, because a tile ingested in seconds is still stale if its acquisition time predates the SLA window. The remaining signals contextualize that lag so a single number never hides a window breach, a partition gap, or a CRS-validation stall.

Metric	Instrument	Unit	Key dimensions
`spatial_freshness_lag_seconds`	gauge	seconds	`jurisdiction`, `source_type`, `srid`
`update_window_compliance_ratio`	gauge	ratio	`jurisdiction`, `sla_tier`, `cadence`
`temporal_partition_drift`	gauge	seconds	`jurisdiction`, `partition_key`
`crs_validation_latency`	histogram	seconds	`srid`, `source_type`
`row_count_delta`	gauge	features	`jurisdiction`, `partition_key`

These combine into a composite freshness health score spatial_freshness.health_score, bounded to [0, 1] where 1.0 means every partition is inside its SLA window with no drift:

$$
F_{\text{score}} = w_l\left(1 - \min\!\left(1, \frac{L}{T_{\text{sla}}}\right)\right) \;+\; w_c\, C_{\text{win}} \;+\; w_d\left(1 - \min\!\left(1, \frac{|D_{\text{part}}|}{T_{\text{sla}}}\right)\right)
$$

where $L$ is spatial_freshness_lag_seconds, $T_{\text{sla}}$ is the contractual SLA window for that dataset, $C_{\text{win}}$ is update_window_compliance_ratio, $D_{\text{part}}$ is temporal_partition_drift, and the weights $w_l, w_c, w_d$ sum to 1 (defaults 0.5 / 0.3 / 0.2, weighting raw lag most heavily). For time-series GIS workloads, temporal baseline alignment is critical: the system normalizes all metrics to a common resolution (typically 1-minute or 5-minute intervals) and attaches spatial extent tags, which is what lets the Temporal Baseline Alignment for Time-Series GIS layer compare cadence across heterogeneous feeds. This same extent tagging supports Spatial Coverage & Extent Monitoring by enabling region-aware alerting, and the row_count_delta signal is reconciled against Automated Row Count & Attribute Sync so that a silently truncated partition can never register as fresh-and-compliant.
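
The composite score can be checked by hand with a few lines of Python. This is a minimal sketch of the formula as written, using the default weights; the function name `freshness_health_score` is hypothetical, and clamping negative lag to zero (for event times slightly ahead of the catalog clock) is an added assumption that the formula itself does not state.

```python
def freshness_health_score(
    lag_s: float,
    sla_window_s: float,
    compliance_ratio: float,
    partition_drift_s: float,
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Composite freshness score in [0, 1]; 1.0 means fully inside the SLA window."""
    if sla_window_s <= 0:
        raise ValueError("sla_window_s must be positive")
    if not 0.0 <= compliance_ratio <= 1.0:
        raise ValueError("compliance_ratio must lie in [0, 1]")
    w_l, w_c, w_d = weights
    if abs(w_l + w_c + w_d - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    # Assumption: negative lag (clock skew) is treated as zero lag.
    lag_term = 1.0 - min(1.0, max(0.0, lag_s) / sla_window_s)
    drift_term = 1.0 - min(1.0, abs(partition_drift_s) / sla_window_s)
    return w_l * lag_term + w_c * compliance_ratio + w_d * drift_term


# A feed with a 1-hour SLA that is 30 minutes behind, 98% window-compliant,
# and carrying 6 minutes of partition drift.
score = freshness_health_score(1800, 3600, 0.98, 360)
```

For that example the three terms are 0.5 × 0.5, 0.3 × 0.98, and 0.2 × 0.9, giving a score of about 0.724. Because each term is capped, a feed that is ten SLA windows late scores no worse on lag than one that is exactly one window late, which is why the raw lag gauge still needs its own alert.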

Pipeline Integration & Configuration

Deploying the freshness tracking layer requires explicit configuration across the telemetry collector, metric storage, and pipeline agents. The configurations below are starting points that integrate with existing spatial workflows; adjust endpoints, metric names, and label sets to your environment.

OpenTelemetry Collector Configuration

Route pipeline sidecar telemetry to a centralized backend while filtering out high-cardinality payload hashes (the filter processor ships with the contrib collector build):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000
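  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          # Allow-list every metric the alerts and diagnostics in this guide rely on
          - spatial_freshness_lag_seconds
          - update_window_compliance_ratio
          - temporal_partition_drift
          - crs_validation_latency
          - row_count_delta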

exporters:
  prometheusremotewrite:
    endpoint: "https://prometheus.internal/api/v1/write"
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [filter, batch]  # drop unwanted metrics before batching
      exporters: [prometheusremotewrite]

Prometheus Recording Rules

Pre-aggregate freshness metrics so dashboards and alerts evaluate cheaply against a recorded series rather than the raw stream:

groups:
  - name: spatial_freshness_sla
    rules:
      - record: job:spatial_freshness_lag_seconds:avg_5m
        expr: avg_over_time(spatial_freshness_lag_seconds[5m])

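      # update_window_compliance_ratio is a gauge, so average it over the window;
      # increase() is only meaningful for counters.
      - record: job:update_window_compliance_ratio:current
        expr: |
          avg by (jurisdiction, sla_tier) (
            avg_over_time(update_window_compliance_ratio[1h])
          )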

Pipeline Metadata Extraction (Python)

Inject freshness telemetry directly into transformation scripts using pyproj and gdal, separating source event time from catalog time so lag is never conflated with ingestion duration:

import os
from datetime import datetime, timezone

from opentelemetry import metrics
from osgeo import gdal
from pyproj import CRS

gdal.UseExceptions()

meter = metrics.get_meter("spatial_freshness_tracker")
# Lag is a point-in-time value, so record it with a gauge; a counter would
# accumulate successive lags into a meaningless running total.
# Synchronous gauges require a recent opentelemetry-api release.
freshness_lag = meter.create_gauge("spatial_freshness_lag_seconds", unit="s")


def emit_freshness_telemetry(source_path: str, expected_crs: str) -> None:
    dataset = gdal.Open(source_path)
    if dataset is None:
        raise RuntimeError(f"Failed to open {source_path}")

    # Source EVENT time (acquisition), not ingestion time: the SLA is measured against this
    acq_time_str = dataset.GetMetadataItem("ACQUISITION_TIME")
    if acq_time_str is None:
        raise ValueError(f"ACQUISITION_TIME metadata missing in {source_path}")
    source_ts = datetime.fromisoformat(acq_time_str)
    if source_ts.tzinfo is None:
        # Assumption: naive timestamps are UTC. An explicit offset is honored as-is
        # instead of being overwritten, which would skew the lag by that offset.
        source_ts = source_ts.replace(tzinfo=timezone.utc)
    catalog_ts = datetime.now(timezone.utc)
    lag = (catalog_ts - source_ts).total_seconds()

    # Validate CRS alignment so a projection stall is attributed, not absorbed into lag
    wkt = dataset.GetProjection()
    epsg = CRS.from_wkt(wkt).to_epsg() if wkt else None
    expected_epsg = CRS.from_string(expected_crs).to_epsg()
    # A CRS with no EPSG code never counts as a match, even if both sides lack one
    crs_match = epsg is not None and epsg == expected_epsg

    freshness_lag.set(lag, {
        "jurisdiction": os.environ.get("JURISDICTION", "global"),
        "srid": str(epsg) if epsg is not None else "unknown",  # same key as the metric table
        "source_type": "raster",
        "crs_valid": str(crs_match),
    })
    dataset = None  # release the GDAL handle

When topology validation bottlenecks occur, they directly impact freshness metrics. Implementing automated checks as documented in How to automate geometry validity checks in GDAL ensures that validation latency is explicitly measured rather than absorbed into generic pipeline lag, keeping the freshness budget honest.

Threshold Design & Alerting Logic

Thresholds are tiered so that ordinary cadence jitter never pages a human while a genuine SLA breach halts propagation immediately. Spatial freshness is non-linear: a satellite feed that is two hours late during a scheduled overpass gap is benign, whereas the same lag on a real-time vehicle-tracking layer is an outage. The dynamic baseline tier therefore compares each feed against its own rolling cadence rather than a global constant.

Severity	`spatial_freshness_lag_seconds` (vs SLA window `T`)	`update_window_compliance_ratio`	Action
WARNING	`> 1.5 × T`	`0.95 – 0.99`	Route to data-engineering queue, annotate dashboard
CRITICAL	`> 2.0 × T`	`< 0.95`	Halt downstream materialization, page SRE/GIS Ops
DYNAMIC_BASELINE	`> baseline + 2σ` (30-day EMA)	adapts per feed	Re-evaluate against seasonal cadence before paging
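
The two static tiers in the table can be sketched as a pure function, which is handy for unit-testing threshold changes before they reach an alert rule. The names `Severity` and `classify_freshness` are hypothetical. The table does not say whether the lag and compliance columns combine with AND or OR; this sketch assumes either condition alone is enough to trigger a tier, and treats a compliance ratio of exactly 0.99 as WARNING.

```python
from enum import Enum


class Severity(str, Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify_freshness(lag_s: float, sla_window_s: float, compliance_ratio: float) -> Severity:
    """Map a lag and compliance reading onto the static WARNING/CRITICAL tiers."""
    if sla_window_s <= 0:
        raise ValueError("sla_window_s must be positive")

    # CRITICAL: lag beyond 2.0 x T, or compliance below 0.95 (assumed OR).
    if lag_s > 2.0 * sla_window_s or compliance_ratio < 0.95:
        return Severity.CRITICAL
    # WARNING: lag beyond 1.5 x T, or compliance in the 0.95 to 0.99 band.
    if lag_s > 1.5 * sla_window_s or compliance_ratio <= 0.99:
        return Severity.WARNING
    return Severity.OK
```

With a 1-hour window, a lag of exactly 7200 seconds classifies as WARNING rather than CRITICAL, because the CRITICAL comparison is strict. The DYNAMIC_BASELINE tier is deliberately left out here since it depends on per-feed history rather than a fixed multiple of `T`.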

These translate into PromQL alerts evaluated against the recorded series, with labels carrying jurisdiction so the page names the affected region:

groups:
  - name: spatial_freshness.alerts
    rules:
      - alert: SpatialFreshnessSLABreach
        # 7200 s = 2 × T for a feed with a 1-hour SLA window (the CRITICAL tier)
        expr: job:spatial_freshness_lag_seconds:avg_5m > 7200
        for: 10m
        labels: { severity: critical, team: sre-gis }
        annotations:
          summary: "Freshness SLA breached for {{ $labels.job }} in {{ $labels.jurisdiction }}"
          description: "Lag exceeds 2 hours. Verify upstream extraction and CRS validation steps."

      - alert: SpatialFreshnessDynamicBaseline
        # seasonal feeds: lag must hold within 2σ of the feed's 30d baseline
        # (avg_over_time is an unweighted rolling mean, standing in for the EMA)
        expr: |
          spatial_freshness_lag_seconds
            > (avg_over_time(spatial_freshness_lag_seconds[30d])
               + 2 * stddev_over_time(spatial_freshness_lag_seconds[30d]))
        for: 15m
        labels: { severity: warning, team: sre-gis }
        annotations:
          summary: "Freshness lag above dynamic baseline for {{ $labels.jurisdiction }}"
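
The threshold table describes the dynamic baseline as a 30-day EMA, while `avg_over_time` and `stddev_over_time` in the rule above weight every sample in the window equally. Where an exponentially weighted baseline is wanted, one option is to maintain it outside Prometheus. The sketch below uses the standard incremental update for an exponentially weighted mean and variance; `EmaBaseline` and `span_samples` are hypothetical names, and α = 2 / (span + 1) is one common convention, assumed here rather than taken from this guide.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class EmaBaseline:
    alpha: float              # smoothing factor in (0, 1]
    mean: float = 0.0
    var: float = 0.0
    initialized: bool = False

    @classmethod
    def from_span(cls, span_samples: int) -> "EmaBaseline":
        # e.g. 30 days of 5-minute samples -> span_samples = 8640
        return cls(alpha=2.0 / (span_samples + 1))

    def is_anomalous(self, lag_s: float, n_sigma: float = 2.0) -> bool:
        """True when lag exceeds baseline + n_sigma * sigma. Call before update()."""
        if not self.initialized:
            return False
        return lag_s > self.mean + n_sigma * sqrt(self.var)

    def update(self, lag_s: float) -> None:
        """Fold one lag sample into the exponentially weighted mean and variance."""
        if not self.initialized:
            self.mean, self.initialized = lag_s, True
            return
        diff = lag_s - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)
```

Checking `is_anomalous` before calling `update` keeps a spike from raising the very baseline it is being compared against.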

For advanced alert routing and metric retention strategies, consult the official Prometheus recording rules documentation and the OpenTelemetry Collector configuration guide.

Failure Modes & Edge Cases

CRS validation latency masquerading as freshness lag. spatial_freshness_lag_seconds exceeds 2× SLA for a specific jurisdiction, but the source actually arrived on time. The cause is a slow or failing projection check upstream — a missing or wrong SRID that stalls the transformation stage. Diagnose by querying crs_validation_latency alongside the lag series; if validation time accounts for the excess, the fix belongs in Coordinate Reference System Validation, not in the extraction schedule.
Silent data loss masked as compliance. update_window_compliance_ratio holds at 1.0 while downstream analytics report missing features. A zero-lag ingestion of an empty or truncated partition registers as perfectly fresh. Cross-reference row_count_delta: a current timestamp with a sharply negative row delta is the signature. Enforce a minimum-row guard so an on-time empty partition is treated as a breach, not a success.
Dynamic baseline drift on seasonal feeds. Cloud-cover delays, satellite pass schedules, or maintenance windows trigger false CRITICAL alerts during expected latency. Switch the rule to the 30-day EMA baseline (baseline + 2σ) and exclude documented maintenance windows with absent() and on() join logic so a known overpass gap never pages.
Event-time vs ingestion-time conflation. When the sidecar reads now() instead of the dataset’s acquisition metadata, lag collapses to near-zero and every stale layer looks fresh. Verify the extractor parses ACQUISITION_TIME (or the equivalent NetCDF/GeoTIFF header) and falls back to a deterministic parser rather than wall-clock time when the field is absent.
Aggregation hiding per-partition staleness. A jurisdiction-wide average lag stays green while one administrative boundary is hours behind, because the fresh majority dilutes the stale minority. Dimension the lag gauge by partition_key and alert on the worst partition, not the mean, so localized staleness surfaces.
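
Two of the failure modes above, an on-time but truncated partition and a healthy average hiding one stale partition, can be guarded against in a few lines. The sketch below is illustrative only: `evaluate_partitions`, the input dictionaries, and the `min_row_fraction` default of 0.5 are hypothetical choices, not part of any library or of the metric specification.

```python
def evaluate_partitions(
    lag_by_partition: dict[str, float],
    rows_by_partition: dict[str, int],
    avg_rows_by_partition: dict[str, float],
    sla_window_s: float,
    min_row_fraction: float = 0.5,  # hypothetical guard: half the trailing average
) -> dict[str, list[str]]:
    """Return {partition_key: [reasons]} for every partition that should breach."""
    breaches: dict[str, list[str]] = {}
    for key, lag_s in lag_by_partition.items():
        reasons = []
        # Judge each partition on its own lag, never on the jurisdiction mean.
        if lag_s > sla_window_s:
            reasons.append("stale")
        # An on-time partition with too few rows is a breach, not a success.
        expected_rows = avg_rows_by_partition.get(key, 0.0)
        if rows_by_partition.get(key, 0) < min_row_fraction * expected_rows:
            reasons.append("truncated")
        if reasons:
            breaches[key] = reasons
    return breaches


lags = {"north": 600.0, "south": 600.0, "east": 14400.0}
rows = {"north": 1000, "south": 0, "east": 980}
avg_rows = {"north": 1000.0, "south": 950.0, "east": 1000.0}

mean_lag = sum(lags.values()) / len(lags)  # 5200 s, inside a 7200 s window
breaches = evaluate_partitions(lags, rows, avg_rows, sla_window_s=7200.0)
```

In this example the mean lag stays inside the window, yet `east` is flagged as stale and `south` as truncated, which is the behavior a mean-based alert would miss.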

Troubleshooting Checklist

Confirm event time first. Verify the sidecar is reading source acquisition time, not ingestion time — a wall-clock fallback makes every layer look fresh and invalidates the whole SLA.
Split lag from validation latency. Query crs_validation_latency and temporal_partition_drift against the same window before touching the extraction schedule; a projection stall diagnosed as a source delay wastes the remediation cycle.
Cross-check cardinality. Compare row_count_delta against the partition’s 7-day average so an on-time empty partition is caught rather than passed as compliant.
Evaluate per partition, not per feed. Re-run the lag query dimensioned by partition_key to expose localized staleness hidden under a healthy jurisdiction average.
Distinguish seasonal latency from a breach. For satellite or IoT feeds, switch to the dynamic EMA baseline and confirm the lag truly exceeds baseline + 2σ before paging.
Document and exclude maintenance windows. Record expected latency windows in the freshness registry and gate the alert with absent()/on() joins so planned gaps do not generate noise that erodes trust in the page.
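
The first checklist item can be enforced in code by making the event-time parser refuse to guess. The sketch below is one way to do that under stated assumptions: `parse_event_time`, `freshness_lag_seconds`, and `MissingEventTimeError` are hypothetical names, the input is assumed to be an ISO 8601 string, and naive timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Optional


class MissingEventTimeError(ValueError):
    """Raised instead of silently falling back to wall-clock time."""


def parse_event_time(raw: Optional[str]) -> datetime:
    """Parse a source acquisition timestamp into an aware UTC datetime."""
    if raw is None or not raw.strip():
        raise MissingEventTimeError("source event time is absent; refusing to use now()")
    text = raw.strip()
    # Older Python versions reject a trailing 'Z', so normalize it to an offset.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    ts = datetime.fromisoformat(text)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return ts.astimezone(timezone.utc)


def freshness_lag_seconds(raw_event_time: Optional[str], catalog_ts: datetime) -> float:
    """Lag between source event time and catalog availability (catalog_ts must be aware)."""
    return (catalog_ts - parse_event_time(raw_event_time)).total_seconds()
```

Raising on a missing field turns the "every layer looks fresh" failure into a visible extraction error, which is easier to triage than a lag series that is quietly near zero.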

By isolating the observability plane from spatial compute and separating event time from ingestion time at every boundary, freshness becomes an auditable, region-aware signal — data engineers, GIS administrators, SREs, and compliance teams share a single view of which layers are inside their SLA window and which have quietly fallen behind.

Spatial Data Freshness & Quality Metrics — the parent guide this freshness gate reports into.
Temporal Baseline Alignment for Time-Series GIS — normalizes heterogeneous feeds onto the common cadence freshness scoring depends on.
Automated Row Count & Attribute Sync — the cardinality reconciliation that stops a truncated partition from registering as fresh.
Coordinate Reference System Validation — the datum gate whose validation latency must be attributed out of the freshness budget.
How to automate geometry validity checks in GDAL — the deep dive on measuring validation latency rather than absorbing it into lag.