Medallion Architecture on Azure: Databricks + Synapse in Production
How we designed a Bronze → Silver → Gold medallion lake at Gamasuite to handle real-time freight data from 85+ carriers — with zero data loss guarantees, schema evolution, and sub-500ms latency from carrier API to user quote card.
What Is Medallion Architecture?
The medallion architecture (or multi-hop architecture) organises a data lake into three layers:
- Bronze — raw ingested data, exactly as received, with no transformation
- Silver — cleaned, deduplicated, conformed data in a standardised schema
- Gold — business-aggregated, query-ready datasets optimised for specific consumption patterns
The pattern solves a fundamental tension in analytics data systems: the need for fast access to processed, query-optimised data versus the need to reprocess raw data when business logic or quality rules change. By preserving raw data in Bronze, you can always re-derive Silver and Gold from scratch as requirements evolve — without needing to re-fetch from source systems.
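As a toy illustration of that re-derivation property (plain Python standing in for the actual Spark jobs, with hypothetical record shapes):

```python
# Minimal illustration of the medallion re-derivation property.
# Bronze keeps raw records untouched; Silver is a pure function of Bronze,
# so changing the cleaning rules only means re-running the derivation.

bronze = [  # raw, exactly as received (including a bad record)
    {"carrier": "MAERSK", "rate": "1250.00", "currency": "USD"},
    {"carrier": "CMA-CGM", "rate": "-1", "currency": "USD"},
]

def derive_silver(bronze_rows, min_rate=0):
    """Cleaning rule: drop non-positive rates, cast strings to float."""
    return [
        {**r, "rate": float(r["rate"])}
        for r in bronze_rows
        if float(r["rate"]) > min_rate
    ]

silver = derive_silver(bronze)  # one clean record survives
# Business rule changes? Re-derive from Bronze -- no re-fetch from carriers.
silver_v2 = derive_silver(bronze, min_rate=100)
```

Because Bronze is never mutated, both derivations read the same raw input; only the transformation logic differs.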
At Gamasuite we processed live freight rates from 85+ carriers including MAERSK, HAPAG-LLOYD, CMA-CGM, and WAN-HAI. Latency was critical — users expected sub-second quote comparisons. The rest of this post walks through how the medallion layers made that possible at scale.
Why Freight Rate Data Is Hard
Before getting to the architecture, it is worth understanding what makes freight rate data difficult enough to justify this level of infrastructure investment.
Volume: 85+ carriers, each with hundreds of lane-container type combinations, updated multiple times per day. Processing every rate update means handling tens of thousands of messages per hour during peak carrier update windows.
Heterogeneity: Each carrier provides rates in a different format — some via REST API with JSON, some via EDI (ANSI X12 or UN/EDIFACT), some via CSV file drop on SFTP, some via proprietary API with non-standard data models. Unifying these formats into a consistent schema requires substantial transformation logic.
Temporal complexity: Freight rates have complex temporal semantics — effective dates, expiry dates, booking validity windows, and surcharge schedules that stack on top of base rates. A "current" rate is not a point-in-time snapshot but a function of sailing date, booking date, and commodity type.
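To make the temporal semantics concrete, here is a minimal sketch of rate resolution; the field names and stacking rule are illustrative, not our production model:

```python
from datetime import date
from decimal import Decimal

# Hypothetical sketch: a "current" rate is resolved as a function of
# sailing date and booking date, with surcharges stacked on the base rate.

def resolve_rate(base_rate, surcharges, sailing_date, booking_date):
    """Return the total charge if the rate is bookable, else None."""
    if not (base_rate["effective"] <= sailing_date <= base_rate["expiry"]):
        return None  # rate not valid for this sailing
    if booking_date > base_rate["booking_deadline"]:
        return None  # booking validity window has closed
    total = base_rate["amount"]
    for s in surcharges:  # surcharge schedules stack on top of the base rate
        if s["effective"] <= sailing_date <= s["expiry"]:
            total += s["amount"]
    return total

rate = {"amount": Decimal("1500"), "effective": date(2024, 1, 1),
        "expiry": date(2024, 3, 31), "booking_deadline": date(2024, 3, 15)}
fuel = [{"amount": Decimal("120"), "effective": date(2024, 2, 1),
         "expiry": date(2024, 2, 29)}]

resolve_rate(rate, fuel, date(2024, 2, 10), date(2024, 2, 1))  # Decimal('1620')
```

The same base rate yields different totals for different sailing dates, and no total at all outside its validity windows; that is why "the current rate" is not a single snapshot.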
Correctness requirements: Users quote customers based on these rates, then buy space from carriers at those rates. A stale or incorrect rate in the Gold layer is not just a data quality problem — it is a financial liability. We needed correctness guarantees that went beyond "eventually consistent."
Our Azure Stack
Carrier APIs / EDI → Azure Functions → Event Hub
↓
Azure Databricks (Bronze ingestion)
↓
Delta Lake on ADLS Gen2 (Silver — cleansed)
↓
Azure Synapse Analytics (Gold — aggregated)
↓
Azure Cache for Redis (hot query cache)
↓
.NET 8 API → Angular frontend

Each component has a specific role, and the choices at each level were deliberate.
Azure Functions for ingestion: Serverless, scales automatically with carrier update event volume, and keeps the ingestion layer decoupled from the processing layer. Functions write raw events to Event Hub for buffering.
Azure Event Hub: Provides durable event buffering between ingestion and processing, handles burst ingestion from multiple carriers updating simultaneously, and gives Databricks a stable read position through consumer groups.
Azure Databricks: The processing engine for both batch and streaming transformations. Databricks' Delta Lake format is the technical foundation that makes the medallion pattern reliable.
Azure Synapse Analytics: The Gold layer SQL warehouse — optimised for analytical queries from the comparison API and BI dashboards, with dedicated SQL pools for consistent query performance.
Azure Cache for Redis: Hot cache for the most-queried Gold layer data — reducing Synapse query load for frequently-accessed routes. Covered separately in our Redis caching post.
Key Design Decisions
1. Delta Lake for ACID Transactions
Using Delta Lake on top of Azure Data Lake Storage Gen2 gave us capabilities that vanilla Parquet on blob storage cannot provide:
ACID transactions: Multiple Databricks jobs writing to the same Delta table do not corrupt each other — Delta's transaction log ensures atomicity and isolation. This matters during carrier update surges when multiple streaming jobs are writing Silver simultaneously.
Time travel: Every Delta table maintains a transaction log that enables reading the table as of any previous point in time. When we need to debug a data quality issue — "what did the Silver table look like at 14:32 yesterday?" — we can query it directly without any manual backup process.
Schema enforcement and evolution: Delta enforces schema on write, catching schema errors at ingestion time rather than at query time. When carriers change their API response schema (which happens regularly, often without notice), schema enforcement surfaces the issue immediately in the Bronze layer rather than silently corrupting Silver.
Schema evolution: When we need to add a column to a Silver table, Delta supports adding nullable columns to existing tables without rewriting historical data. Existing rows materialise the new column as null; new rows populate it. This allowed us to extend our data model as business requirements evolved without downtime.
```python
# Bronze → Silver streaming pipeline with Delta Lake
from pyspark.sql.functions import col, current_timestamp, lit

df_silver = (
    spark.readStream
    .format("delta")
    .load("/mnt/bronze/freight-rates")
    .filter("is_valid = true")
    .dropDuplicates(["carrier_id", "rate_id", "effective_date"])
    # Conform to silver schema
    .withColumn("rate_usd", col("total_charge").cast("decimal(18,4)"))
    .withColumn("processed_at", current_timestamp())
    .withColumn("data_source", lit("carrier-api"))
)

(df_silver.writeStream
    .format("delta")
    .outputMode("append")
    .trigger(processingTime="30 seconds")  # 30-second micro-batch interval
    .option("checkpointLocation", "/mnt/checkpoints/silver-rates")
    .option("mergeSchema", "true")  # Auto-merge schema evolution
    .start("/mnt/silver/freight-rates"))
```

The checkpointLocation is critical — it tracks exactly which Bronze table transactions have already been processed, so that if the streaming job fails, it resumes from where it left off rather than reprocessing or skipping events. This is the mechanism behind the zero-data-loss guarantee.
2. Structured Streaming for Near-Real-Time
Databricks Structured Streaming processes carrier rate events continuously — not in hourly or daily batches. The streaming micro-batch interval was tuned to 30 seconds: frequent enough to provide sub-minute freshness for the Silver layer, infrequent enough to not generate excessive Databricks job overhead.
The Bronze → Silver streaming job validates, deduplicates, and normalises incoming events. Deduplication is non-trivial: carriers occasionally send duplicate rate updates when retrying after perceived failures, and EDI feeds sometimes double-deliver. The dropDuplicates on ["carrier_id", "rate_id", "effective_date"] eliminates these at the Silver layer.
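The composite-key deduplication can be illustrated outside Spark with a small stand-in (first-wins here for determinism; Spark's dropDuplicates keeps an arbitrary row per key):

```python
# First-wins deduplication on the (carrier_id, rate_id, effective_date)
# composite key -- a plain-Python stand-in for Spark's dropDuplicates.

def dedupe(records):
    seen = set()
    out = []
    for r in records:
        key = (r["carrier_id"], r["rate_id"], r["effective_date"])
        if key not in seen:  # keep the first occurrence, drop retries
            seen.add(key)
            out.append(r)
    return out

events = [
    {"carrier_id": "MAERSK", "rate_id": "Q1",
     "effective_date": "2024-02-01", "rate": 1500},
    {"carrier_id": "MAERSK", "rate_id": "Q1",
     "effective_date": "2024-02-01", "rate": 1500},  # carrier retry: dropped
    {"carrier_id": "MAERSK", "rate_id": "Q1",
     "effective_date": "2024-03-01", "rate": 1550},  # new period: kept
]
dedupe(events)  # 2 records survive
```

Note that including effective_date in the key is what keeps a genuinely new rate period from being mistaken for a duplicate of the previous one.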
3. Carrier-Specific Parsers as Bronze Layer
The heterogeneity of carrier formats required carrier-specific parsing at the Bronze ingestion layer. Each carrier has its own Azure Function and its own parser class:
```python
import json
from datetime import datetime
from decimal import Decimal

class MaerskRateParser(BaseCarrierParser):
    """Maersk Spot API response parser."""

    def parse(self, raw_payload: dict) -> list[BronzeRateRecord]:
        records = []
        for quote in raw_payload.get("quotes", []):
            for schedule in quote.get("schedules", []):
                records.append(BronzeRateRecord(
                    carrier_id="MAERSK",
                    rate_id=quote["quotationId"],
                    origin_port=quote["originLocation"]["locationCode"],
                    destination_port=quote["destinationLocation"]["locationCode"],
                    container_type=quote["cargoType"],
                    effective_date=datetime.fromisoformat(schedule["departureDate"]),
                    expiry_date=datetime.fromisoformat(schedule["expiryDate"]),
                    total_charge=Decimal(str(schedule["totalPrice"]["value"])),
                    currency=schedule["totalPrice"]["currency"],
                    raw_payload=json.dumps(raw_payload),  # Preserve raw for replay
                    is_valid=True,
                    parse_error=None
                ))
        return records

    def validate(self, record: BronzeRateRecord) -> tuple[bool, str | None]:
        if not record.origin_port or not record.destination_port:
            return False, "Missing port codes"
        if record.total_charge <= 0:
            return False, "Invalid charge amount"
        if record.expiry_date <= record.effective_date:
            return False, "Expiry before effective date"
        return True, None
```

Note the raw_payload field storing the original JSON. The Bronze layer stores the raw payload alongside the parsed record — enabling replay when parsing logic changes. When we discover a parser bug that was silently corrupting data, we can fix the parser and reprocess the Bronze layer's raw payloads to regenerate clean Silver data.
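A minimal sketch of that replay idea, with hypothetical names (the real pipeline re-reads Bronze through Spark, not a Python loop):

```python
import json

# Sketch of Bronze replay: because Bronze stores raw_payload alongside the
# parsed fields, a fixed parser can regenerate Silver without re-calling
# carrier APIs. All names here are illustrative.

def reparse_bronze(bronze_rows, parser):
    """Re-run a (fixed) parser over the preserved raw payloads."""
    silver = []
    for row in bronze_rows:
        payload = json.loads(row["raw_payload"])
        silver.extend(parser(payload))
    return silver

def fixed_parser(payload):
    # Toy "fixed" parser; imagine the buggy version read the wrong field.
    return [{"rate_id": q["quotationId"], "total": q["totalPrice"]}
            for q in payload.get("quotes", [])]

bronze_rows = [{"raw_payload": json.dumps(
    {"quotes": [{"quotationId": "Q1", "totalPrice": 1500}]})}]

regenerated = reparse_bronze(bronze_rows, fixed_parser)  # clean Silver records
```

The key point is that replay needs only the Bronze table and the corrected parser; the source systems are never touched.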
4. Synapse Analytics for Gold Aggregation
Gold layer ran in Synapse for two consumption patterns: BI reporting (Power BI connected to Synapse) and the comparison API (our .NET API queried Synapse for rate comparisons).
```sql
-- Gold layer view for carrier comparison
CREATE VIEW gold.carrier_rate_comparison AS
SELECT
    origin_port,
    destination_port,
    container_type,
    carrier_name,
    MIN(total_rate_usd) AS min_rate,
    AVG(total_rate_usd) AS avg_rate,
    MAX(total_rate_usd) AS max_rate,
    COUNT(*) AS quote_count,
    MAX(updated_at) AS last_updated
FROM silver.freight_rates
WHERE effective_date >= DATEADD(day, -7, GETDATE())
  AND expiry_date >= GETDATE()  -- Only currently-valid rates
GROUP BY origin_port, destination_port, container_type, carrier_name;

-- Gold layer pre-aggregated table (refreshed every 5 minutes via Synapse pipeline)
CREATE TABLE gold.route_summary
WITH (DISTRIBUTION = HASH(route_key), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT
    CONCAT(origin_port, '-', destination_port, '-', container_type) AS route_key,
    origin_port,
    destination_port,
    container_type,
    COUNT(DISTINCT carrier_name) AS carrier_count,
    MIN(total_rate_usd) AS best_rate,
    AVG(total_rate_usd) AS avg_market_rate,
    GETDATE() AS calculated_at
FROM silver.freight_rates
WHERE is_valid = 1
  AND effective_date <= GETDATE()
  AND expiry_date >= GETDATE()
GROUP BY origin_port, destination_port, container_type;
```

The route_summary table is pre-aggregated and stored as a clustered columnstore index — Synapse's most efficient storage format for analytical queries. This table is refreshed every 5 minutes by a Synapse pipeline, trading the cost of a refresh job for dramatically faster query response times compared to computing the aggregation on every API request.
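For intuition, the route_summary aggregation corresponds to this pure-Python computation (simplified: validity filtering and currency handling are omitted):

```python
from collections import defaultdict

# Pure-Python mirror of the gold.route_summary aggregation: group rates
# by route key, then compute carrier count, best rate, and market average.

def summarise_routes(rates):
    groups = defaultdict(list)
    for r in rates:
        key = f'{r["origin"]}-{r["destination"]}-{r["container"]}'
        groups[key].append(r)
    return {
        key: {
            "carrier_count": len({r["carrier"] for r in rows}),
            "best_rate": min(r["rate"] for r in rows),
            "avg_market_rate": sum(r["rate"] for r in rows) / len(rows),
        }
        for key, rows in groups.items()
    }

rates = [
    {"origin": "NLRTM", "destination": "SGSIN", "container": "40HC",
     "carrier": "MAERSK", "rate": 1500},
    {"origin": "NLRTM", "destination": "SGSIN", "container": "40HC",
     "carrier": "CMA-CGM", "rate": 1400},
]
summary = summarise_routes(rates)
summary["NLRTM-SGSIN-40HC"]["best_rate"]  # 1400
```

Pre-computing this per route is what lets the comparison API serve "best rate on this lane" as a key lookup instead of a scan-and-aggregate query.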
5. Data Quality Framework
Data quality issues in freight rate data are not hypothetical — they are constant. Carriers submit negative charges, future-dated effective dates, impossible weight ranges, and schema variations that break naive parsers. We implemented a data quality framework that quantified quality at each layer:
```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import col

# Silver layer data quality checks — run after each streaming micro-batch.
# logger and send_alert are wired to Application Insights elsewhere.
def check_silver_quality(df: DataFrame, batch_id: int) -> None:
    total = df.count()
    quality_checks = {
        "no_negative_rates": df.filter(col("total_rate_usd") <= 0).count(),
        # Anchored regex: the port code must be exactly five uppercase letters
        "valid_port_codes": df.filter(~col("origin_port").rlike("^[A-Z]{5}$")).count(),
        "future_effective_dates": df.filter(col("effective_date") > col("created_at")).count(),
        "expired_on_arrival": df.filter(col("expiry_date") < col("effective_date")).count()
    }
    for check_name, failure_count in quality_checks.items():
        failure_rate = failure_count / total if total > 0 else 0
        # Log to Application Insights
        logger.info(f"DQ Check {check_name}: {failure_rate:.2%} failure rate in batch {batch_id}")
        # Alert if failure rate exceeds threshold
        if failure_rate > 0.05:  # 5% threshold
            send_alert(f"Data quality issue in Silver: {check_name} at {failure_rate:.1%}")
```

Quality metrics were surfaced in a Power BI dashboard visible to both the data engineering team and the business operations team — making data quality a visible metric rather than a hidden concern.
Operational Lessons
Lesson 1: Lag-based cluster autoscaling, not time-based. We initially used time-based scaling (scale up at 07:00, scale down at 22:00). Carrier update bursts did not follow business hours — HAPAG-LLOYD would push large rate updates at 03:00 UTC. Switching to lag-based autoscaling (scale up when Event Hub consumer lag exceeds a threshold, scale down when the lag clears) cut idle cluster costs by 35% and eliminated lag-induced data freshness degradation.
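That scaling decision can be sketched as a small policy function; the thresholds and step sizes here are illustrative, not our production values:

```python
# Sketch of a lag-driven autoscaling policy: size the cluster from Event
# Hub consumer lag rather than time of day. All thresholds are illustrative.

def target_workers(current_workers, lag_events,
                   scale_up_lag=50_000, scale_down_lag=1_000,
                   min_workers=2, max_workers=16):
    if lag_events > scale_up_lag:
        return min(current_workers * 2, max_workers)  # burst: double up fast
    if lag_events < scale_down_lag:
        return max(current_workers - 1, min_workers)  # drained: step down slowly
    return current_workers                            # steady state: hold

target_workers(4, 120_000)  # 8 -- e.g. a 03:00 UTC carrier update burst
target_workers(4, 200)      # 3 -- lag cleared, shed one worker
```

Doubling up but stepping down one worker at a time is a deliberate asymmetry: under-provisioning during a burst costs data freshness, while a briefly oversized cluster only costs compute.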
Lesson 2: Checkpoint locations matter more than code. Three times in production, the streaming pipeline resumed from the wrong position after a cluster restart — causing either data gaps or reprocessed duplicates. Each time, the root cause was a misconfigured checkpoint directory. Checkpoint paths are now parameterised from configuration and validated at job start, and any change to them goes through code review.
Lesson 3: Schema evolution is a business communication problem. When a carrier changes their API schema, the impact depends on how downstream Silver and Gold schemas handle it. We learned to treat carrier API schema changes as a business event requiring coordination — not purely a technical event handled by developers. A carrier changing how they represent surcharges could break a user's quote without an obvious error anywhere in the pipeline.
Lesson 4: Delta time travel is not a backup strategy. Time travel is valuable for debugging and limited-scope recovery, but Delta's default retention (VACUUM removes data files older than 7 days) means history beyond that window is not reliably accessible via time travel. Genuine backup requires Azure Storage geo-replication and periodic full snapshots exported to cold storage. We had a near-miss when investigating a data issue that turned out to have started 10 days before discovery.
Results
- 85+ carriers integrated via a single unified Bronze schema
- <500ms average latency from carrier response to UI quote card (including Silver processing and Gold refresh)
- Zero data loss during carrier API schema changes (schema evolution via Delta)
- Zero downtime during the quarterly carrier integration updates
- 60% reduction in manual data reconciliation effort — quality issues caught at Bronze and Silver layers instead of discovered in Gold by business users
- 35% infrastructure cost reduction from cluster autoscaling vs. time-based scaling
Muhammad Moid Shams is a Lead Software Engineer specialising in Azure data engineering, distributed systems, and healthcare data platforms.