Spatial Data Freshness & Quality Metrics

Geospatial pipelines fail in a distinctive way: they degrade silently. Unlike transactional systems, which reject constraint violations immediately, spatial workflows routinely absorb corrupted geometries, mismatched coordinate reference systems, and delayed ingestion windows without raising a single infrastructure alarm. These defects propagate downstream and surface as broken spatial joins, misaligned field operations, drifted basemaps, and regulatory non-compliance long after the root cause has scrolled out of the logs. This guide is written for the data engineers, GIS platform administrators, SREs, and compliance teams who own that risk. It defines the deterministic metrics, instrumentation patterns, alerting thresholds, and failure-mode runbooks needed to make freshness and quality observable across an enterprise spatial stack. It builds directly on the instrumentation foundation laid out in Geospatial Observability Architecture & Fundamentals, which covers trust boundaries and collector topology at the platform level.

The scope here is the measurement plane: how to quantify how current and how correct a spatial dataset is at every stage from source extraction to the curated analytical layer. Each topic area introduced below has its own in-depth guide, and the first mention of each links straight to it.

Core Concepts & Trust Boundaries

Spatial observability rests on a small set of orthogonal quality dimensions. Treating them as independent, individually instrumented signals — rather than collapsing them into a single “data looks fine” health check — is what separates a pipeline that fails loudly from one that fails silently.

Freshness is not a simple now() - max(timestamp) calculation. It is a composite of source event cadence, partition availability, downstream dependency windows, and catalog publication latency. A satellite tile can be ingested within seconds yet still be stale if its acquisition time predates the SLA window. Measuring it correctly — separating event time from ingestion time and weighting by business criticality — is the subject of Tracking Spatial Data Freshness SLAs.

Geometry validity is the structural correctness of each feature: no self-intersections, correct ring orientation, no duplicate or degenerate nodes, no empty geometries masquerading as valid. Invalid primitives silently corrupt spatial indexes and cause queries to return wrong results rather than errors. The validate-repair-quarantine pattern that enforces this at ingest is detailed in Geometry Validity & Topology Checks.
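As a minimal sketch of the structural checks named above, the following pure-Python function inspects one polygon ring for closure, degenerate size, consecutive duplicate nodes, and orientation. The name `ring_problems` and the counter-clockwise exterior convention are assumptions made for illustration. The sketch does not detect self-intersections, which is why a real gate would normally rely on `ST_IsValid` or an equivalent library check.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(ring: List[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )


def ring_problems(ring: List[Point], expect_ccw: bool = True) -> List[str]:
    """Return the structural defects found in one ring (empty list means none found)."""
    if len(ring) < 4:
        # A closed ring needs at least three distinct points plus the repeated start.
        return ["too_few_points"]
    problems = []
    if ring[0] != ring[-1]:
        problems.append("not_closed")
    if any(a == b for a, b in zip(ring, ring[1:])):
        problems.append("duplicate_node")
    area = signed_area(ring)
    if area == 0:
        problems.append("zero_area")
    elif (area > 0) != expect_ccw:
        problems.append("wrong_orientation")
    return problems
```

Returning a list of named defects, rather than a boolean, lets each defect map onto the `error_type` dimension of a counter such as `gis.spatial.geometry_invalid_total`.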

Coordinate reference system (CRS) consistency governs whether two layers actually align in space. A dataset stored in EPSG:4326 but joined against an EPSG:3857 layer produces offset joins, nonsensical distances, and broken buffers — with no exception thrown. Enforcing SRID contracts at the schema and transformation boundary is covered in Coordinate Reference System Validation.
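To make the scale of such a mismatch concrete, the sketch below projects a longitude/latitude pair to spherical Web Mercator (the projection behind EPSG:3857) and applies a coarse range heuristic for spotting coordinates that cannot be degrees. The function names are hypothetical, and the heuristic is an assumption for illustration only: it complements declared SRID metadata and does not replace it.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by spherical Web Mercator


def to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Forward spherical Mercator projection: degrees in, meters out."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def plausibly_epsg4326(x: float, y: float) -> bool:
    """Coarse heuristic: values outside the degree ranges cannot be lon/lat."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


# The same location is (13.405, 52.52) in degrees but millions of meters
# in Web Mercator, so joining the two layers unprojected compares unlike units.
lon, lat = 13.405, 52.52
x_m, y_m = to_web_mercator(lon, lat)
```

The heuristic has a blind spot: Web Mercator points within a couple of hundred meters of the origin also fall inside the degree ranges, so it can only flag obvious violations.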

Spatial coverage and extent describe completeness in space: missing tiles in a raster mosaic, shrunken bounding boxes in a vector feed, or index fragmentation that degrades queries from logarithmic to linear time. Tracking minimum-bounding-rectangle drift and tile-completeness ratios is the focus of Spatial Coverage & Extent Monitoring.
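The two coverage signals can be sketched in a few lines. The tile tuple layout, the function names, and the definition of drift as the largest displacement of any bounding-box edge are assumptions made for illustration; the drift is expressed in the units of the layer's CRS, so it is only in meters for a projected CRS.

```python
from typing import Iterable, Tuple

Tile = Tuple[int, int, int]                # (zoom, x, y), hypothetical layout
BBox = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def coverage_ratio(present: Iterable[Tile], expected: Iterable[Tile]) -> float:
    """Tiles present divided by tiles expected; unexpected extra tiles are ignored."""
    expected_set = set(expected)
    if not expected_set:
        raise ValueError("expected tile set must not be empty")
    return len(expected_set & set(present)) / len(expected_set)


def extent_drift(current: BBox, baseline: BBox) -> float:
    """Largest absolute displacement of any MBR edge, in the CRS's units."""
    return max(abs(c - b) for c, b in zip(current, baseline))
```

Raising on an empty expected set, instead of returning 1.0, avoids reporting full coverage when the tile manifest itself failed to load.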

Attribute and row-count integrity captures the non-geometric payload: silent column drops, null propagation, mismatched primary keys, and schema drift introduced by third-party feeds. Hash-based reconciliation across raw, staging, and curated layers is described in Automated Row Count & Attribute Sync.
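A minimal sketch of hash-based reconciliation between two layers follows, assuming rows are available as dictionaries keyed by column name. The function names, the separator byte, and the result keys are hypothetical; the technique is simply a per-row digest over a fixed column list, compared by primary key.

```python
import hashlib
from typing import Dict, Iterable, List, Mapping, Set


def row_digest(row: Mapping[str, object], columns: Iterable[str]) -> str:
    """Stable digest over the chosen columns; a missing column hashes as None."""
    payload = "\x1f".join(f"{col}={row.get(col)!r}" for col in sorted(columns))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reconcile(
    source_rows: List[Mapping[str, object]],
    target_rows: List[Mapping[str, object]],
    key: str,
    columns: List[str],
) -> Dict[str, Set[object]]:
    """Compare two layers by primary key and per-row digest."""
    src = {r[key]: row_digest(r, columns) for r in source_rows}
    tgt = {r[key]: row_digest(r, columns) for r in target_rows}
    return {
        "missing_in_target": set(src) - set(tgt),
        "unexpected_in_target": set(tgt) - set(src),
        "attribute_drift": {k for k in src.keys() & tgt.keys() if src[k] != tgt[k]},
    }
```

Because a dropped column hashes as `None`, a silent column drop shows up as attribute drift on every affected row even when the row counts of the two layers match.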

Temporal baseline alignment matters whenever the data is a time series — land-cover change, urban expansion, environmental monitoring. Inconsistent timestamp granularity or gaps in temporal resolution bias change-detection algorithms. Normalizing heterogeneous feeds onto a common cadence is the topic of Temporal Baseline Alignment for Time-Series GIS.
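Gap detection against an expected cadence can be sketched as below. The function name and the 50% tolerance default are assumptions for illustration; the idea is only to flag consecutive observations that are further apart than the cadence allows.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def find_gaps(
    timestamps: Iterable[datetime],
    expected_cadence: timedelta,
    tolerance: float = 0.5,
) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) pairs whose spacing exceeds cadence * (1 + tolerance)."""
    ordered = sorted(timestamps)
    limit = expected_cadence * (1 + tolerance)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Daily feed with two missing days between 3 and 6 January.
observations = [datetime(2024, 1, d) for d in (1, 2, 3, 6, 7)]
gaps = find_gaps(observations, timedelta(days=1))
```

The seconds between each returned pair are the values that would feed a histogram such as `gis.spatial.temporal_gap_seconds`.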

These six dimensions meet at trust boundaries — the explicit points where a dataset transitions from one level of guarantee to another (source → staging → curated). Each boundary is where measurement is cheapest and most meaningful. The canonical attribute namespace those boundaries emit is defined by the Geospatial Metric Taxonomy for ETL, and the rejection logic at the earliest boundary follows the rules in Defining Spatial Data Trust Boundaries.

A freshness gate at the first boundary is typically a rolling SQL query over partitioned spatial tables, applying tiered thresholds by business criticality:

-- Rolling freshness with tiered SLA status, per dataset (PostgreSQL syntax)
WITH ingestion_metrics AS (
  SELECT
    dataset_id,
    MAX(ingestion_timestamp) AS latest_ingest,
    MAX(event_timestamp)     AS latest_event,
    -- Count recent partitions with FILTER instead of a WHERE clause, so a
    -- dataset with nothing ingested in the last 7 days still produces a row
    -- (active_partitions = 0) and the NO_DATA branch below is reachable.
    COUNT(DISTINCT partition_key) FILTER (
      WHERE ingestion_timestamp >= CURRENT_DATE - INTERVAL '7 days'
    ) AS active_partitions
  FROM spatial_catalog.raw_partitions
  GROUP BY dataset_id
),
lagged AS (
  -- Compute the event-time lag once so the CASE tiers cannot drift apart
  SELECT
    dataset_id,
    active_partitions,
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - latest_event)) / 3600.0 AS lag_hours
  FROM ingestion_metrics
)
SELECT
  dataset_id,
  lag_hours,
  CASE
    WHEN lag_hours > 24 THEN 'CRITICAL'
    WHEN lag_hours > 12 THEN 'WARNING'
    ELSE 'HEALTHY'
  END AS sla_status,
  CASE WHEN active_partitions = 0 THEN 'NO_DATA' ELSE 'ACTIVE' END AS partition_health
FROM lagged;
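The query above fixes the tiers at 12 and 24 hours for every dataset. One way to vary them by business criticality is a small lookup applied to the query result, sketched here in Python. The tier names and hour values are hypothetical and would come from each dataset's SLA contract in practice.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical tiers: (warning_hours, critical_hours) per business criticality.
SLA_TIERS = {
    "regulatory": (2.0, 6.0),
    "operational": (12.0, 24.0),
    "exploratory": (48.0, 96.0),
}


def sla_status(
    latest_event: datetime,
    criticality: str,
    now: Optional[datetime] = None,
) -> str:
    """Classify event-time lag against the thresholds of the dataset's tier."""
    now = now or datetime.now(timezone.utc)
    lag_hours = (now - latest_event).total_seconds() / 3600.0
    warning, critical = SLA_TIERS[criticality]
    if lag_hours > critical:
        return "CRITICAL"
    if lag_hours > warning:
        return "WARNING"
    return "HEALTHY"
```

With these example values, the same 13-hour lag is `CRITICAL` for a regulatory layer, `WARNING` for an operational one, and `HEALTHY` for an exploratory feed.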

Metric Taxonomy

Observable spatial pipelines emit a small, stable vocabulary of metrics under the gis.* and spatial_freshness_* namespaces. Naming them consistently is what lets a single Grafana panel or alert rule generalize across satellite, IoT, and cadastral sources. The table below is the canonical reference for this measurement domain; each entry maps to one of the six quality dimensions above.

Metric	Type	Unit	Key dimensions	What it captures
`spatial_freshness_lag_seconds`	gauge	seconds	`dataset_id`, `srid`, `jurisdiction`	Source event time → catalog availability
`spatial_freshness_score`	gauge	ratio 0–1	`dataset_id`	Weighted composite of lag vs SLA window
`gis.etl.partitions_active`	gauge	count	`dataset_id`	Partitions visible in the current window
`gis.spatial.geometry_invalid_total`	counter	features	`layer`, `error_type`	`ST_IsValid` / topology failures at ingest
`gis.spatial.crs_mismatch_total`	counter	features	`expected_srid`, `observed_srid`	SRID contract violations
`gis.spatial.coverage_ratio`	gauge	ratio 0–1	`dataset_id`	Tiles present ÷ tiles expected
`gis.spatial.extent_drift_meters`	gauge	meters	`dataset_id`	MBR shrink/grow vs baseline
`gis.etl.attribute_drift_total`	counter	rows	`table`, `column`	Hash mismatches between layers
`gis.spatial.temporal_gap_seconds`	histogram	seconds	`region_id`	Inter-observation gap distribution

Counters answer “how many failures since reset”, gauges answer “what is the current level”, and histograms preserve the distribution needed for p95/p99 thresholds where spatial workloads are non-linear (one pathological high-vertex polygon can dominate a mean).

The composite freshness score collapses per-source lag against per-source SLA windows, weighted by business criticality, into a single 0–1 health figure. Let $\ell_i$ be the observed lag for source $i$, $\tau_i$ its contractual SLA window, and $w_i$ its criticality weight:

$$S_{\text{fresh}} = 1 - \min\!\left(1,\ \frac{\sum_{i=1}^{n} w_i \,\ell_i}{\sum_{i=1}^{n} w_i \,\tau_i}\right)$$

A score of 1.0 means zero weighted lag across all sources; 0.0 means the weighted lag has reached or exceeded the aggregate SLA budget. Because the score is a weighted aggregate, one late source can be masked by several fresh ones, so per-dataset lag alerts should run alongside it. Alert thresholds for non-stationary feeds (seasonal sensors, tidal imagery) should not be static; derive them from a rolling $p$-quantile $Q_p$ of the trailing 30 days of lag, scaled by a tolerance factor $\gamma$:

$$\theta_{p} = Q_p\big(\ell_{\,t-30\text{d}\,\ldots\,t}\big)\,(1 + \gamma)$$

This DYNAMIC_BASELINE threshold adapts to legitimate seasonal variation while still catching genuine regressions, and it composes cleanly with the static WARNING/CRITICAL tiers defined per dataset.
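Both formulas are short enough to check by hand. The sketch below implements them directly; the function names, the nearest-rank quantile method, and the default values for `p` and `gamma` are assumptions chosen for illustration, not values prescribed by the text.

```python
import math
from typing import Sequence, Tuple


def freshness_score(sources: Sequence[Tuple[float, float, float]]) -> float:
    """sources holds (lag_seconds, sla_window_seconds, weight) per source."""
    weighted_lag = sum(w * lag for lag, _, w in sources)
    weighted_budget = sum(w * sla for _, sla, w in sources)
    if weighted_budget <= 0:
        raise ValueError("weighted SLA budget must be positive")
    return 1.0 - min(1.0, weighted_lag / weighted_budget)


def dynamic_threshold(recent_lags: Sequence[float], p: float = 0.95, gamma: float = 0.2) -> float:
    """Nearest-rank p-quantile of recent lag, widened by the tolerance factor."""
    ordered = sorted(recent_lags)
    if not ordered:
        raise ValueError("need at least one lag observation")
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1] * (1.0 + gamma)


# Worked example: a critical hourly feed 30 minutes behind (weight 3) and a
# daily feed 2 hours behind (weight 1).
score = freshness_score([(1800, 3600, 3), (7200, 86400, 1)])
```

In the worked example the weighted lag is 12,600 s against a weighted budget of 97,200 s, giving a score of about 0.87. The hourly feed has used half of its own window, yet the aggregate stays high because the daily feed contributes a large budget, which illustrates why the score is read together with per-dataset lag.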

Instrumentation Patterns

Freshness and quality signals should be emitted as first-class telemetry, not scraped from application logs after the fact. The reference deployment runs an OpenTelemetry Collector (contrib build) that attaches spatial dimensions to every datapoint, drops low-value scratch metrics with a filter processor, and tail-samples traces so that topology-error and CRS-drift spans are always retained while routine work is sampled down. The end-to-end wiring of collectors into GIS pipelines is covered in OpenTelemetry Integration for GIS Pipelines; the configuration below is the freshness-and-quality slice of that pipeline.

# otel-collector-contrib: spatial freshness + quality telemetry
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Drop scratch-tier coverage metrics before they reach long-term storage
  filter/spatial:
    error_mode: ignore
    metrics:
      metric:
        - 'name == "gis.spatial.coverage_ratio" and resource.attributes["dataset.tier"] == "scratch"'
  # Promote resource-level spatial context onto every datapoint
  transform/spatial_tags:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["srid"], resource.attributes["gis.srid"])
          - set(attributes["jurisdiction"], resource.attributes["gis.jurisdiction"])
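  # Tail-sample: never drop topology errors or CRS drift; baseline-sample the rest.
  # string_attribute matches string values only, so producers must emit these
  # flags as the strings "false" / "true" rather than as booleans.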
  tail_sampling/spatial:
    decision_wait: 10s
    policies:
      - name: keep-topology-errors
        type: string_attribute
        string_attribute: { key: gis.geometry.valid, values: ["false"] }
      - name: keep-crs-drift
        type: string_attribute
        string_attribute: { key: gis.crs.drift, values: ["true"] }
      - name: baseline-sample
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }

exporters:
  prometheusremotewrite:
    endpoint: https://prom.internal/api/v1/write
  otlp/traces:
    endpoint: tempo.internal:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [filter/spatial, transform/spatial_tags]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [tail_sampling/spatial]
      exporters: [otlp/traces]

On the producer side, transformation jobs record metrics through the OpenTelemetry SDK at each trust boundary. The snippet below registers the freshness gauge and attaches the spatial dimensions that the collector expects:

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

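reader = PeriodicExportingMetricReader(
    # insecure=True matches the plaintext OTLP gRPC receiver in the collector
    # config above; drop it when the receiver is configured with TLS.
    OTLPMetricExporter(endpoint="otel-collector:4317", insecure=True)
)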
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("gis.freshness")

freshness_lag = meter.create_gauge(
    "spatial_freshness_lag_seconds",
    description="Source event time to catalog availability",
    unit="s",
)

def record_freshness(dataset_id: str, srid: int, jurisdiction: str, lag_seconds: float):
    # Dimensions must match the collector's transform/spatial_tags keys
    freshness_lag.set(
        lag_seconds,
        {"dataset_id": dataset_id, "srid": str(srid), "jurisdiction": jurisdiction},
    )

CRS validation is the one check worth running before metrics are even recorded, because a projection mismatch poisons every spatial measurement taken after it. A pre-ingest gate using geopandas rejects inputs that carry no CRS metadata and reprojects inputs whose declared CRS differs from the contract, so that lag numbers are never computed against a misaligned layer:

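import geopandas as gpd

EXPECTED_SRID = 32633  # WGS 84 / UTM zone 33N

def validate_crs(gdf: gpd.GeoDataFrame, expected_srid: int = EXPECTED_SRID) -> gpd.GeoDataFrame:
    """Return gdf in the contract CRS, or raise if that cannot be established."""
    # No CRS metadata: coordinates cannot be interpreted, so reject outright.
    if gdf.crs is None:
        raise ValueError("Input GeoDataFrame lacks CRS metadata. Rejecting ingest.")
    # to_epsg() returns None when the CRS has no matching EPSG code; that case
    # also takes the reprojection path.
    if gdf.crs.to_epsg() != expected_srid:
        gdf = gdf.to_crs(epsg=expected_srid)
        # Confirms only that the result is labelled with the contract SRID;
        # it does not measure the positional accuracy of the transformation.
        if gdf.crs.to_epsg() != expected_srid:
            raise RuntimeError(
                f"Reprojection did not yield the contract CRS. Expected EPSG:{expected_srid}"
            )
    return gdf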

Multi-Region & Scale Considerations

At petabyte scale and across regulatory geographies, a single central collector becomes both a bottleneck and a data-sovereignty liability. The reference topology runs regional edge collectors co-located with each spatial store: telemetry is aggregated and pre-sampled at the edge, and only the resulting metrics (not raw feature payloads) cross regional boundaries. This keeps cadastral and personally-identifying spatial data resident in its jurisdiction while still giving global SRE dashboards a unified view of freshness and quality. The cross-region topology mechanics — replication lag measured at the feature level, edge-collector failover, and split-brain detection — are detailed in Monitoring Topology for Multi-Region GIS.

Two scale-specific effects deserve explicit instrumentation. First, replication lag is spatial: a region can be globally “caught up” by row count yet missing an entire tile column because a mosaic re-fetch stalled, so gis.spatial.coverage_ratio must be evaluated per region, not just globally. Second, validation cost is non-linear in vertex count: high-vertex polygons (coastlines, administrative boundaries) dominate ST_IsValid CPU time, so per-feature validation latency should be sampled and bucketed rather than averaged. Scoping which datasets get synchronous validation versus asynchronous sampling follows the Observability Scoping Rules for Vector Data.
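The second effect is easy to demonstrate with numbers. In the sketch below, 99 simple features validate in 2 ms each and one coastline takes 2,000 ms; the bucket bounds and the function name are hypothetical. The counts are per-bucket, not cumulative, so they would need a running sum to match Prometheus `le` buckets.

```python
from bisect import bisect_left
from statistics import mean
from typing import List, Sequence

BOUNDS_MS = (1, 5, 25, 100, 500, 2500)  # hypothetical bucket upper bounds


def bucket_counts(latencies_ms: Sequence[float], bounds: Sequence[float] = BOUNDS_MS) -> List[int]:
    """Count samples per bucket; the final slot collects values above the last bound."""
    counts = [0] * (len(bounds) + 1)
    for value in latencies_ms:
        # bisect_left finds the first bound >= value, i.e. the "less than or equal" bucket.
        counts[bisect_left(bounds, value)] += 1
    return counts


latencies = [2.0] * 99 + [2000.0]
average_ms = mean(latencies)          # about 22 ms, typical of no real feature
histogram = bucket_counts(latencies)  # 99 samples in the <=5 ms bucket, 1 in <=2500 ms
```

The mean of roughly 22 ms describes neither population, while the bucketed view shows both the fast majority and the single expensive outlier.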

To avoid contention on production tables, run quality measurement on a decoupled observability plane: a lightweight metadata sidecar captures partition timestamps, bounding boxes, row counts, and SRIDs at each boundary and emits them via OTLP, while the geospatial assets themselves flow uninterrupted into PostGIS or object storage. Because the sidecar reads metadata only, freshness tracking stays off the critical path of raster tiling, topology validation, and index builds, and adds negligible compute contention.

Alerting & SLO Design

Spatial alerting rules must respect how widely layer sizes vary: 500 invalid geometries are noise on a 50-million-feature parcel layer (0.001%) but a production incident on a 10,000-feature layer (5%). Rules therefore key on ratios and quantiles rather than raw counts. Metric names appear with underscores instead of dots because the Prometheus remote-write exporter normalizes OpenTelemetry metric names by default. The following Prometheus rules express the four highest-value spatial SLOs:

groups:
  - name: spatial-freshness-quality-slo
    rules:
      # Freshness lag beyond 2x the per-dataset SLA window
      - alert: SpatialFreshnessSLABreach
        expr: |
          max by (dataset_id, jurisdiction) (spatial_freshness_lag_seconds)
            > on (dataset_id) group_left()
              (spatial_freshness_sla_window_seconds * 2)
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Freshness lag >2x SLA for {{ $labels.dataset_id }} ({{ $labels.jurisdiction }})"

      # Geometry validation failure rate, normalized to ingest volume
      - alert: GeometryValidationFailureSpike
        expr: |
          sum by (layer) (rate(gis_spatial_geometry_invalid_total[15m]))
            / sum by (layer) (rate(gis_etl_features_ingested_total[15m])) > 0.02
        for: 15m
        labels: { severity: warning }

      # Coverage gap: missing tiles / shrunken extent
      - alert: SpatialCoverageGap
        expr: gis_spatial_coverage_ratio < 0.95
        for: 30m
        labels: { severity: warning }

      # Any CRS contract violation is immediately actionable
      - alert: CrsDriftDetected
        expr: increase(gis_spatial_crs_mismatch_total[5m]) > 0
        labels: { severity: critical }

For time-series feeds, alert on the tail of the gap distribution rather than the mean, since a single long gap is what breaks change detection:

histogram_quantile(0.99,
  sum by (le, region_id) (rate(gis_spatial_temporal_gap_seconds_bucket[1h]))
) > 604800   # p99 inter-observation gap exceeds 7 days

SLO targets should be set per quality dimension and per criticality tier: a regulatory compliance layer might carry a spatial_freshness_score >= 0.98 objective with a tight error budget, while an exploratory telemetry feed tolerates 0.85. Route breached objectives to PagerDuty or Slack through Alertmanager receivers, since the rules above are evaluated by Prometheus rather than by the collector, so that a stale satellite scene or interrupted IoT feed triggers a backfill before downstream compute is consumed.
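One way to make "tight error budget" measurable is to count the evaluation intervals in which the score fell below its objective and compare that count with an allowed fraction. The sketch below assumes that interval-based definition; the function name and the 1% allowance in the example are hypothetical.

```python
from typing import Sequence


def error_budget_remaining(
    scores: Sequence[float],
    target: float,
    allowed_bad_fraction: float,
) -> float:
    """Fraction of the error budget left; negative once the budget is overspent."""
    if not scores:
        raise ValueError("need at least one evaluation interval")
    budget = allowed_bad_fraction * len(scores)
    if budget <= 0:
        raise ValueError("allowed_bad_fraction must be positive")
    bad_intervals = sum(1 for s in scores if s < target)
    return 1.0 - bad_intervals / budget


# 1,000 intervals, 5 of them below the 0.98 objective, 1% of intervals allowed to miss.
remaining = error_budget_remaining([0.99] * 995 + [0.90] * 5, target=0.98, allowed_bad_fraction=0.01)
```

Here the budget is 10 bad intervals and 5 have been spent, so half of the budget remains. An exploratory feed would use the same function with a lower target and a larger allowance.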

Operational Debugging Workflow

When a freshness or quality alert fires, work the signal back toward its source in a fixed order. Skipping straight to the database wastes time when the real fault is a dropped trace or a misrouted partition.

1. Identify the lag source. Split spatial_freshness_lag_seconds into event-time lag versus ingestion-time lag. High event-time lag with healthy ingestion means the source is late (satellite revisit, sensor outage); high ingestion lag means the pipeline is the bottleneck.
2. Validate trace propagation. Confirm the transformation job’s spans actually reached Tempo with their gis.geometry.valid and gis.crs.drift attributes intact. A “no data” panel often means broken context propagation, not a healthy pipeline, so check that the tail-sampling policy didn’t drop the trace you need.
3. Audit trust boundaries. Walk source → staging → curated and compare row counts and gis.spatial.coverage_ratio at each. The boundary where the numbers diverge is the fault domain. Cross-check gis.etl.attribute_drift_total to rule out a silent schema change.
4. Isolate the quality dimension. Determine whether the incident is freshness, geometry, CRS, coverage, attribute, or temporal; they have different remediations and must not be conflated. A CRS mismatch frequently masquerades as freshness lag because failed reprojections stall the queue.
5. Mitigate backpressure. If a downstream consumer is blocked, engage the relevant fallback tier (below) to keep serving degraded-but-correct data while the upstream fault is repaired, then trigger a targeted backfill rather than a full reprocess.
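The first and third steps of this workflow reduce to two small comparisons, sketched below. The function names, the labels returned, and the use of a single SLA window for both lags are assumptions for illustration; event lag and ingest lag are taken as "now minus latest event time" and "now minus latest ingestion time" respectively.

```python
from typing import List, Optional, Tuple


def classify_lag(event_lag_s: float, ingest_lag_s: float, sla_window_s: float) -> str:
    """Separate a late source from a stalled pipeline."""
    if ingest_lag_s > sla_window_s:
        return "pipeline_stalled"   # nothing has landed recently
    if event_lag_s > sla_window_s:
        return "source_late"        # ingestion is healthy but the events are old
    return "within_sla"


def first_divergent_boundary(
    counts: List[Tuple[str, int]],
    tolerance: float = 0.0,
) -> Optional[str]:
    """counts is an ordered list of (stage, row_count); return the first hop that loses or gains rows."""
    for (prev_stage, prev_n), (stage, n) in zip(counts, counts[1:]):
        baseline = max(prev_n, 1)  # avoid division by zero on an empty stage
        if abs(prev_n - n) / baseline > tolerance:
            return f"{prev_stage}->{stage}"
    return None
```

For example, `first_divergent_boundary([("source", 1000), ("staging", 1000), ("curated", 940)])` points at the staging-to-curated hop, which is where the audit in step 3 would then focus.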

Failure Modes & Fallback Chains

Resilient spatial serving degrades in tiers rather than failing outright. Each tier trades precision for availability and emits a distinct signal so operators always know which mode is active. The fallback chains that implement service-degradation routing are specified in Fallback Chains for Spatial API Failures; the table below maps the freshness-and-quality triggers onto those tiers.

Tier	Trigger	Degradation behavior	Observability signal
Primary	All checks pass	Serve full-resolution curated geometry	`spatial_freshness_score >= 0.98`
Bounding-box fallback	Coverage gap or partial tile set	Serve MBR / extent approximation; flag incompleteness	`gis.spatial.coverage_ratio < 0.95`
Simplified-geometry cache	Validation backlog or vertex-cost spike	Serve last-known-good simplified geometry from cache	`gis.spatial.geometry_invalid_total` rate rising
Circuit breaker	Freshness >2x SLA or CRS drift	Pause downstream jobs; return cached + `stale` header	`SpatialFreshnessSLABreach` / `CrsDriftDetected` firing
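The tier table can be read as a priority-ordered decision, sketched below. The function name, the tier identifiers, and the evaluation order between the simplified-geometry and bounding-box tiers are assumptions; the thresholds (2x the SLA window, coverage below 0.95) are taken from the table and the alert rules.

```python
def select_tier(
    lag_s: float,
    sla_window_s: float,
    crs_mismatches: int,
    coverage_ratio: float,
    validation_backlog: bool,
) -> str:
    """Pick the serving tier, checking the most severe trigger first."""
    if crs_mismatches > 0 or lag_s > 2 * sla_window_s:
        return "circuit_breaker"
    if validation_backlog:
        return "simplified_geometry_cache"
    if coverage_ratio < 0.95:
        return "bounding_box_fallback"
    return "primary"
```

Checking the circuit-breaker conditions first means a CRS violation always wins over a milder trigger, which matches the rule that any CRS contract violation is immediately actionable.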

Three failure patterns recur often enough to call out explicitly. CRS mismatch masking freshness lag: a reprojection failure stalls ingestion, so the lag alert fires while the true defect is projection drift. Always check gis.spatial.crs_mismatch_total before assuming a slow source. Topology self-intersections bypassing validation: a feature can pass a vertex-count check yet fail ST_IsValid, so never substitute cardinality checks for geometric validation. Row-count compliance hiding silent data loss: a feed can hit its row-count SLA while quietly dropping an attribute column, which is why gis.etl.attribute_drift_total runs concurrently with freshness rather than after it.

Related Guides

Tracking Spatial Data Freshness SLAs: composite freshness scoring, SLA window contracts, and breach playbooks.
Geometry Validity & Topology Checks: the validate-repair-quarantine gate and rule-based topology enforcement at ingest.
Coordinate Reference System Validation: SRID contracts, datum-shift safety, and CI/CD projection-drift gates.
Spatial Coverage & Extent Monitoring: MBR drift, tile-completeness ratios, and spatial-index health.
Automated Row Count & Attribute Sync: hash-based reconciliation and schema-contract enforcement across layers.
Temporal Baseline Alignment for Time-Series GIS: gap detection and cadence normalization for multi-frequency feeds.
Geospatial Observability Architecture & Fundamentals: the platform-level instrumentation, trust boundaries, and collector topology this measurement plane builds on.