Cloud Architecture · 15 min read · 2024-03-20

Why We Chose Cosmos DB for Freight Booking State Management

NoSQL vs SQL for freight booking workflows — the tradeoffs we evaluated, how Cosmos DB's partition strategy solved our scale problem, and the Change Feed pattern that powers real-time notifications downstream.

Cosmos DB · Azure · NoSQL · Freight Tech · Database Design

The Problem with Relational Models for Bookings

Freight bookings are inherently document-shaped. A single booking record contains:

  • Multi-leg route details (origin → transhipment → destination), each with its own sailing schedule, transit time, and carrier
  • Per-leg rate breakdowns across 8-15 charge types (ocean freight, BAF, CAF, THC, documentation fee, carrier security surcharge, port congestion surcharge, and others that vary by carrier and trade lane)
  • Carrier-specific contract terms and booking reference numbers that differ in structure across 85+ carriers
  • Real-time status events from multiple parties — carrier confirmations, customs clearance notifications, vessel departure and arrival events, delivery confirmations
  • Customer and shipper details specific to this booking instance

Normalising this into SQL tables meant 12+ JOINs for a single booking summary query. For a read-heavy API serving a UI that renders booking detail in real-time, this was unacceptable at scale. The booking list view — showing 50 active bookings with status, route, and rate summary — would require 50 × 12 = 600 JOIN operations for every page load.

Beyond the JOIN count, the SQL schema for freight bookings was genuinely difficult to design correctly. The per-charge-type breakdown for a booking does not fit neatly into relational columns because the charge types vary by carrier. If MAERSK has 12 surcharge types and WAN-HAI has 8 different ones, a normalised charges table either has to accommodate every possible charge type as a row (sparse, hard to query) or use a wide table with many nullable columns (denormalised, constraint violations across carriers). Both approaches create operational headaches.

Evaluating the Alternatives

Before committing to Cosmos DB, we evaluated three approaches:

Option 1: PostgreSQL with JSONB columns. Store the carrier-specific and charge-specific data in JSONB columns within a relational schema, preserving relational structure for well-known fields. This is a practical compromise for mixed structured/unstructured data. The concern was query performance at scale — JSONB queries require GIN indexes that add significant write overhead and do not scale as linearly as Cosmos DB at our target load.

Option 2: MongoDB on Azure Cosmos DB. The MongoDB API on Cosmos DB would give us document storage with the familiar MongoDB query language. We had MongoDB expertise on the team. The concern was that MongoDB's partition key constraints (shard key immutability) and the additional abstraction layer of the MongoDB API over Cosmos DB's native capabilities added complexity without clear benefit.

Option 3: Azure Cosmos DB (SQL API, native). Native Cosmos DB with the SQL API. Maximum performance, tightest Azure integration (Change Feed, RBAC via Azure AD, server-side functions), and the partition key flexibility we needed. The learning curve was real — Cosmos DB's partitioning model is different from anything in the SQL world — but the performance and scalability characteristics were clearly the best fit.

We chose Option 3.

Why Cosmos DB

Azure Cosmos DB with the SQL API gave us:

  • Document storage matching our booking object model 1:1 — no impedance mismatch between the domain object and the database record
  • <10ms reads at P99 globally — Cosmos DB's SLA guarantees 10ms read and 15ms write latency at the 99th percentile, backed by the infrastructure of Azure's global backbone
  • Partition key design for co-location of related data, eliminating cross-partition queries for the access patterns we cared about
  • Change Feed to stream booking state changes downstream to notification services, analytics, and integration partners — without polling
  • Schema-free storage for carrier-specific fields — adding a new carrier with unique field requirements requires no schema migration
  • Automatic multi-region replication for global teams accessing booking data from multiple continents

A typical freight booking document in Cosmos DB:

{
  "id": "BKG-2024-PKKAR-NLRTM-089234",
  "partitionKey": "tenant_logistics-corp_PKKAR_NLRTM",
  "tenantId": "logistics-corp",
  "status": "confirmed",
  "createdAt": "2024-03-15T09:23:41Z",
  "updatedAt": "2024-03-15T11:47:22Z",
  "route": {
    "legs": [
      {
        "sequence": 1,
        "origin": {"port": "PKKAR", "name": "Port Qasim", "country": "PK"},
        "destination": {"port": "SGSIN", "name": "Singapore", "country": "SG"},
        "carrier": "HAPAG-LLOYD",
        "vessel": "MILAN EXPRESS",
        "departureDate": "2024-03-28",
        "arrivalDate": "2024-04-06",
        "transitDays": 9
      },
      {
        "sequence": 2,
        "origin": {"port": "SGSIN", "name": "Singapore", "country": "SG"},
        "destination": {"port": "NLRTM", "name": "Rotterdam", "country": "NL"},
        "carrier": "HAPAG-LLOYD",
        "vessel": "CAPE CLARENCE",
        "departureDate": "2024-04-08",
        "arrivalDate": "2024-04-29",
        "transitDays": 21
      }
    ],
    "totalTransitDays": 30,
    "transshipmentPorts": ["SGSIN"]
  },
  "charges": {
    "baseFare": {"amount": 950, "currency": "USD"},
    "surcharges": [
      {"type": "BAF", "description": "Bunker Adjustment Factor", "amount": 180, "currency": "USD"},
      {"type": "CAF", "description": "Currency Adjustment Factor", "amount": 45, "currency": "USD"},
      {"type": "THC_ORIGIN", "description": "Terminal Handling Charge - Origin", "amount": 95, "currency": "USD"},
      {"type": "THC_DEST", "description": "Terminal Handling Charge - Destination", "amount": 130, "currency": "USD"},
      {"type": "CSS", "description": "Carrier Security Surcharge", "amount": 25, "currency": "USD"},
      {"type": "DOC", "description": "Documentation Fee", "amount": 75, "currency": "USD"}
    ],
    "totalUSD": 1500
  },
  "statusEvents": [
    {"status": "pending", "timestamp": "2024-03-15T09:23:41Z", "source": "platform"},
    {"status": "submitted", "timestamp": "2024-03-15T09:24:15Z", "source": "carrier-api"},
    {"status": "confirmed", "timestamp": "2024-03-15T11:47:22Z", "source": "carrier-api", "carrierRef": "HL-2024-1892347"}
  ],
  "carrierBookingRef": "HL-2024-1892347",
  "shipper": {
    "name": "Textile Exports Ltd",
    "address": "Industrial Area, Karachi 74900",
    "eoriNumber": "PK123456789"
  },
  "container": {
    "type": "40HC",
    "commodity": "Textile Goods",
    "weight": 12500,
    "weightUnit": "KG"
  },
  "ttl": -1
}

The entire booking — route, charges, status history, carrier references — is a single document. Retrieving it is a single Cosmos DB point read by id and partitionKey. No joins, no multiple round trips.
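Because the charges live inside the same document as everything else, integrity checks become trivial. A minimal sketch (Python here purely for illustration; field names follow the example document above, and it assumes all amounts are already in USD — a real implementation would normalise currencies first):

```python
def validate_charges(charges: dict) -> bool:
    """Check that totalUSD equals baseFare plus all surcharges.

    Assumes a single currency (USD), as in the example document.
    """
    expected = charges["baseFare"]["amount"] + sum(
        s["amount"] for s in charges["surcharges"]
    )
    return expected == charges["totalUSD"]


charges = {
    "baseFare": {"amount": 950, "currency": "USD"},
    "surcharges": [
        {"type": "BAF", "amount": 180},
        {"type": "CAF", "amount": 45},
        {"type": "THC_ORIGIN", "amount": 95},
        {"type": "THC_DEST", "amount": 130},
        {"type": "CSS", "amount": 25},
        {"type": "DOC", "amount": 75},
    ],
    "totalUSD": 1500,
}
assert validate_charges(charges)  # 950 + 550 in surcharges == 1500
```

With a normalised SQL schema, the same check requires joining the charges table and aggregating — here it is a single pass over one object.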

Partition Strategy

Partition key: /partitionKey = "tenant_{tenantId}_{originPort}_{destPort}" — matching the example document's tenant_logistics-corp_PKKAR_NLRTM.

This ensured all queries for a tenant's lane were co-located, avoiding cross-partition scatter-gather for the access patterns that dominated our workload:

  • Booking list for a tenant's route — all bookings for logistics-corp on the PKKAR → NLRTM lane are in the same partition. A single-partition query.
  • Single booking retrieval — point read by id and partitionKey. Single partition, <10ms.
  • Booking status update — patch operation by id and partitionKey. Single partition.
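The synthetic key can be built with a trivial helper. A sketch (Python for illustration; the tenant_ prefix follows the example documents in this article):

```python
def booking_partition_key(tenant_id: str, origin_port: str, dest_port: str) -> str:
    """Compose the synthetic partition key used for bookings:
    all bookings for one tenant on one trade lane share a partition."""
    return f"tenant_{tenant_id}_{origin_port}_{dest_port}"


# A point read then needs exactly two values: the document id and this key.
pk = booking_partition_key("logistics-corp", "PKKAR", "NLRTM")
assert pk == "tenant_logistics-corp_PKKAR_NLRTM"
```

Centralising the key construction in one function matters more than it looks: a booking written with one key format and read with another silently turns a 1 RU point read into a failed lookup.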

The access patterns we deliberately accepted as cross-partition (requiring scatter-gather):

  • Tenant-wide booking search — all bookings for a tenant across all lanes. Cross-partition, but these queries are low-frequency (triggered by explicit search, not routine list views) and are paginated.
  • Status reporting across tenants — admin/operations use case, tolerated higher latency.

Partition Key Design Decisions

The choice of {tenantId}_{originPort}_{destPort} as the partition key requires explanation. Several alternatives were considered:

tenantId only: Would group all of a tenant's bookings in a single logical partition. Hot partitions for large tenants — a tenant with 10,000 active bookings concentrates load on one logical partition, which Cosmos DB cannot split across physical partitions.

id (booking ID) only: Even distribution, but every list query requires cross-partition scatter-gather. A tenant viewing their booking list would hit every partition in the container.

{tenantId}_{originPort}_{destPort}: Distributes load across trade lanes while keeping per-lane queries co-located. The hot partition risk is manageable because even the busiest trade lane for the largest tenant generated well under Cosmos DB's per-physical-partition throughput limit of 10,000 RU/s.

The partition key design is the most consequential architectural decision in a Cosmos DB implementation. Getting it wrong means either hot partitions (performance degradation) or cross-partition queries on high-frequency access patterns (latency degradation). The decision must be driven by analysing the actual access patterns — not by guessing.
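The hot-partition analysis above can be made concrete with back-of-envelope arithmetic. A sketch with hypothetical traffic figures (the per-operation RU costs are indicative assumptions — roughly 1 RU for a small point read, around 10 RU for a small indexed write; 10,000 RU/s is Cosmos DB's documented physical-partition throughput limit):

```python
# Hypothetical peak traffic for the busiest lane of the largest tenant.
peak_reads_per_sec = 200      # booking list + detail point reads
peak_writes_per_sec = 20      # status updates, new bookings
ru_per_point_read = 1         # <=1 KB document (indicative)
ru_per_write = 10             # small indexed write (indicative)

peak_ru_per_sec = (peak_reads_per_sec * ru_per_point_read
                   + peak_writes_per_sec * ru_per_write)

partition_limit_ru = 10_000   # Cosmos DB per-physical-partition limit
headroom = partition_limit_ru / peak_ru_per_sec

assert peak_ru_per_sec == 400
assert headroom == 25.0       # 25x headroom before the partition saturates
```

Running this kind of estimate per candidate key, against measured rather than guessed traffic, is how the access-pattern analysis turns into a defensible decision.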

TTL for Rate Expiry

Carrier rates expire every 30-90 days. We used Cosmos DB TTL to auto-purge stale rate documents without any application-level cleanup job:

{
  "id": "rate_CMACGM_PKKAR_NLRTM_20240301",
  "partitionKey": "tenant_spot_PKKAR_NLRTM",
  "carrier": "CMA-CGM",
  "originPort": "PKKAR",
  "destPort": "NLRTM",
  "totalRateUSD": 1450,
  "validFrom": "2024-03-01",
  "validTo": "2024-05-31",
  "_ts": 1709251200,
  "ttl": 7776000
}

The ttl field (in seconds from _ts, the document's last modification timestamp) tells Cosmos DB to automatically delete this document 90 days after it was last updated. Rate documents are updated when new rate schedules arrive — so the TTL effectively means "delete this rate 90 days after the last rate update for this lane from this carrier."

The alternative — a scheduled cleanup job — adds operational complexity (the job needs its own monitoring, error handling, and retry logic) and creates a window where stale rates remain queryable between job executions. TTL eliminates both problems at the cost of a single configuration field per document.
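The TTL arithmetic is straightforward. A sketch computing the 90-day value used in the rate document above, and the expiry instant it implies given the document's _ts:

```python
from datetime import datetime, timedelta, timezone

ttl_seconds = 90 * 24 * 60 * 60          # 90 days
assert ttl_seconds == 7776000            # matches the rate document above

# _ts is the epoch-seconds last-modified timestamp Cosmos DB maintains.
last_modified = datetime.fromtimestamp(1709251200, tz=timezone.utc)
expires_at = last_modified + timedelta(seconds=ttl_seconds)
assert expires_at == datetime(2024, 5, 30, tzinfo=timezone.utc)
```

Note that every write to the document resets the clock: the deletion happens 90 days after the most recent update, not after creation.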

Change Feed for Downstream Integration

The Cosmos DB Change Feed is a persistent, ordered log of every document change in the container. We used it to stream booking state changes to downstream systems without polling:

// Azure Function triggered by Cosmos DB Change Feed
[FunctionName("BookingStatusChangeFeed")]
public async Task Run(
    [CosmosDBTrigger(
        databaseName: "bookings-db",
        containerName: "bookings",
        Connection = "CosmosDbConnection",
        LeaseContainerName = "leases",
        CreateLeaseContainerIfNotExists = true
    )] IReadOnlyList<BookingDocument> changedDocuments,
    ILogger log)
{
    foreach (var doc in changedDocuments)
    {
        // Change Feed delivers only the latest version of each document,
        // so the previous status is reconstructed from the statusEvents history.
        var previousStatus = doc.StatusEvents
            .OrderBy(e => e.Timestamp).SkipLast(1).LastOrDefault()?.Status;
        var currentStatus = doc.Status;
        
        if (previousStatus != currentStatus)
        {
            log.LogInformation(
                "Booking {BookingId} status changed: {Previous} → {Current}",
                doc.Id, previousStatus, currentStatus);
            
            // Publish to Event Grid for downstream consumers
            await _eventGridClient.PublishEventAsync(new BookingStatusChangedEvent
            {
                BookingId = doc.Id,
                TenantId = doc.TenantId,
                PreviousStatus = previousStatus,
                NewStatus = currentStatus,
                OccurredAt = doc.UpdatedAt
            });
        }
    }
}

The Change Feed consumer (an Azure Function here) processes every document update. The leases container tracks which changes have been processed — if the function fails, it resumes from the last processed position. This provides at-least-once delivery for state change events: after a failure, a batch may be redelivered, so downstream consumers must be idempotent.
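Because a failed batch can be redelivered from the last lease checkpoint, a cheap deduplication guard on the consumer side is a useful safety net. A minimal sketch (Python for illustration; in production the seen-set would live in a durable store such as Redis or a Cosmos container, not process memory):

```python
def make_dedup_key(event: dict) -> str:
    """A status-change event is identified by booking, new status and time."""
    return f"{event['bookingId']}|{event['newStatus']}|{event['occurredAt']}"


def handle_once(event: dict, seen: set) -> bool:
    """Process the event only if its key has not been seen.
    Returns True if processed, False if it was a duplicate delivery."""
    key = make_dedup_key(event)
    if key in seen:
        return False
    seen.add(key)
    # ... forward to email service, analytics, webhooks ...
    return True


seen: set = set()  # durable store in production
event = {"bookingId": "BKG-2024-PKKAR-NLRTM-089234",
         "newStatus": "confirmed",
         "occurredAt": "2024-03-15T11:47:22Z"}
assert handle_once(event, seen) is True    # first delivery: processed
assert handle_once(event, seen) is False   # redelivery: skipped
```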

Downstream consumers of the BookingStatusChanged event:

  • Email notification service — sends customer emails on status transitions (confirmed, shipped, arrived)
  • Analytics pipeline — feeds booking state data to our Synapse Analytics gold layer for reporting
  • Third-party integration — forwards status updates to customer ERP systems via webhook

The Change Feed eliminated polling entirely from our integration architecture. Previously, the email notification service polled the booking database every 30 seconds — creating both unnecessary database load and 30-second notification latency. Change Feed notifications arrive within 2-3 seconds of the booking update.

Cosmos DB Cost Optimisation

Cosmos DB pricing is based on Request Units (RU/s) — a normalised measure of database operation cost. Managing RU consumption is the primary lever for cost control.

Use provisioned throughput, not serverless. For a SaaS with predictable load patterns, provisioned throughput is cheaper than serverless above a certain request volume. We provisioned at the container level with autoscale enabled — Cosmos DB scales throughput between a defined minimum and maximum based on actual load, and bills each hour at the highest throughput it scaled to during that hour.

Design for point reads, not queries. A point read (by id and partitionKey) of a document up to 1 KB costs 1 RU. Cross-partition queries cost roughly in proportion to the number of partitions scanned. Our partition key design was driven as much by cost as by performance — minimising cross-partition queries minimised RU consumption.
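The cost gap is easy to quantify. A sketch with hypothetical volumes (all RU figures here are indicative assumptions, not measured values):

```python
# Hypothetical daily volume: serve the booking list view either as a
# single-partition query or as a cross-partition scatter-gather.
list_views_per_day = 100_000
partitions_in_container = 50          # assumed physical partition count
ru_single_partition_query = 5         # indicative cost per list query
ru_cross_partition_overhead = 2.5     # indicative cost per extra partition probed

ru_good = list_views_per_day * ru_single_partition_query
ru_bad = list_views_per_day * (ru_single_partition_query
                               + (partitions_in_container - 1)
                               * ru_cross_partition_overhead)

assert ru_good == 500_000
assert ru_bad == 12_750_000           # ~25x the RU spend for the same workload
```

Since provisioned throughput is priced per RU/s, a 25x difference in RU consumption for the dominant query translates almost directly into the Cosmos DB bill.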

Index only what you query. Cosmos DB's default indexing policy indexes every property in every document. For documents with large arrays (like statusEvents), this creates significant write overhead. We customised the indexing policy to index only the properties we actually query, reducing write RU consumption by approximately 35%.

{
  "indexingMode": "consistent",
  "includedPaths": [
    {"path": "/tenantId/?"},
    {"path": "/status/?"},
    {"path": "/createdAt/?"},
    {"path": "/route/legs/[]/origin/port/?"}
  ],
  "excludedPaths": [
    {"path": "/charges/surcharges/*"},
    {"path": "/statusEvents/*"},
    {"path": "/*"}
  ]
}

The explicit exclusions document intent — /charges/surcharges/* is never queried and /statusEvents/* is append-only history — while the final /* excludes everything not explicitly included.

Use TTL to manage storage. Auto-purging rate documents via TTL prevents storage accumulation. Without TTL, stale rate documents would accumulate indefinitely, increasing storage cost and degrading query performance.

Results

  • 40% faster booking retrieval vs. previous SQL implementation (single document read vs. 12-table JOIN query)
  • Single-digit millisecond P99 read latency — Cosmos DB's SLA delivered in practice
  • Simplified schema evolution — adding carrier-specific fields required no migration scripts, no ALTER TABLE, and no downtime
  • Change Feed powering real-time notification pipeline — 2-second notification latency vs. 30-second polling latency
  • Zero cross-partition queries for the access patterns that make up roughly 80% of our workload
  • 35% reduction in write RU cost from custom indexing policy

The main trade-off we accepted: Cosmos DB's relaxed consistency levels (we used Session consistency) mean that a read from a different session or client immediately after a write may not reflect the write. Session consistency guarantees read-your-own-writes within a session, but cross-session reads can be briefly stale. For freight booking state, this was acceptable — a 2-second staleness window on status display did not create business problems. For applications where global read-after-write consistency is critical, Strong consistency is available at higher latency and RU cost.


Muhammad Moid Shams is a Lead Software Engineer specialising in Azure, distributed systems, and cloud-native data architecture.