Distributed Redis Caching Strategy for High-Frequency API Calls
Our approach to caching real-time freight rate data with Redis, balancing freshness against performance at 2000+ RPS: the layered L1/L2 caching architecture, cache invalidation via pub/sub, and the operational lessons from running it in production.
The Problem
Real-time freight rates change, but not that fast. Carrier APIs were being hammered with the same origin+destination+container combination thousands of times per minute, with most of those calls returning identical data.
The symptom was clear: P95 response time of 2,800ms on rate queries, carrier API rate limits being hit during peak hours, and an infrastructure bill driven primarily by carrier API call volume rather than actual compute. The root cause was equally clear: we had no caching layer, so every user request hit the carrier API directly.
The challenge was not simply adding Redis. The nuance was designing a caching strategy that:
- Kept rate data fresh enough for commercial accuracy (stale rates that have moved can cost the business real money on quoted contracts)
- Eliminated the carrier API call volume that was causing rate limits and latency
- Handled cache invalidation correctly when carriers pushed rate updates
- Maintained correctness under horizontal scaling (multiple API server instances reading and writing cache)
- Did not introduce new failure modes — a caching layer that fails under load should degrade gracefully, not bring down the rate service
Why Layered Caching
A single Redis cache would have solved most of the problem. We added an L1 in-process memory cache because of one specific observation: the same rate query was being made multiple times within the same second from different user sessions viewing the same route. Redis has low latency (~1ms within the same datacenter), but for queries happening at 2000+ RPS, even 1ms added up — and Redis itself would have become a bottleneck at sustained high throughput.
The layered architecture:
Request → In-Memory L1 (10s TTL)
↓ miss
Redis L2 (5min TTL)
↓ miss
Carrier API + write-through to Redis + L1

L1 (in-memory) provides sub-millisecond response for high-frequency identical queries within a 10-second window. L2 (Redis) provides cross-instance data sharing with a 5-minute freshness guarantee. The carrier API is called only when both cache layers miss, which is when genuinely fresh data is needed.
The TTLs were determined empirically: a 2-week analysis of carrier rate change events showed that fewer than 3% of queries would see a rate change within any 5-minute window. A 5-minute L2 TTL therefore meant accepting stale data in roughly 3% of queries in exchange for the carrier API call reduction, a trade-off that worked for our commercial model (we disclosed that quoted rates were valid for 10 minutes and refreshed them on booking).
Implementation
public class CachedRateService : IRateService
{
    private readonly IDistributedCache _redis;
    private readonly IMemoryCache _local;
    private readonly IRateService _inner;
    private readonly ILogger<CachedRateService> _logger;

    public CachedRateService(
        IDistributedCache redis,
        IMemoryCache local,
        IRateService inner,
        ILogger<CachedRateService> logger)
    {
        _redis = redis;
        _local = local;
        _inner = inner;
        _logger = logger;
    }

    public async Task<List<CarrierRate>> GetRatesAsync(RouteKey key)
    {
        var cacheKey = $"rates:{key}";

        // L1: in-process memory
        if (_local.TryGetValue(cacheKey, out List<CarrierRate>? cached))
        {
            _logger.LogDebug("L1 cache hit for {RouteKey}", key);
            return cached!;
        }

        // L2: distributed Redis
        var bytes = await _redis.GetAsync(cacheKey);
        if (bytes is not null)
        {
            var rates = MessagePackSerializer.Deserialize<List<CarrierRate>>(bytes);

            // Populate L1 from the Redis hit to warm the local cache
            _local.Set(cacheKey, rates, TimeSpan.FromSeconds(10));
            _logger.LogDebug("L2 cache hit for {RouteKey}", key);
            return rates;
        }

        // Origin: fetch and populate both layers
        _logger.LogInformation("Cache miss for {RouteKey}; fetching from carrier API", key);
        var fresh = await _inner.GetRatesAsync(key);
        var packed = MessagePackSerializer.Serialize(fresh);
        await _redis.SetAsync(cacheKey, packed, new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
        });
        _local.Set(cacheKey, fresh, TimeSpan.FromSeconds(10));
        return fresh;
    }
}
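The cache key interpolates RouteKey directly, so the type needs a stable string form. The production type is not shown in this post; a minimal shape, assumed here for illustration:

public sealed record RouteKey(string Origin, string Destination, string ContainerType)
{
    // A deterministic ToString makes $"rates:{key}" identical across instances,
    // which is what lets L1 and L2 share entries for the same route
    public override string ToString() => $"{Origin}:{Destination}:{ContainerType}";
}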
Serialisation Choice: MessagePack Over JSON
The serialisation choice mattered at 2000+ RPS. We benchmarked three options:
| Format | Serialise (µs) | Deserialise (µs) | Size (bytes) |
|--------|----------------|------------------|--------------|
| System.Text.Json | 18.2 | 22.4 | 1,284 |
| Newtonsoft.Json | 31.7 | 38.1 | 1,284 |
| MessagePack | 3.1 | 4.8 | 487 |
MessagePack's binary format is 6x faster to serialise, 5x faster to deserialise, and 62% smaller. At 2000+ RPS, this saves approximately 15ms of CPU per second on serialisation alone — and the reduced payload size decreases both Redis memory usage and network transfer time.
The trade-off: MessagePack binaries are not human-readable, which makes debugging cache content harder. We addressed this by building a diagnostic endpoint that serialises a cache entry to JSON on demand, used only for debugging.
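A minimal sketch of such an endpoint, assuming ASP.NET Core minimal APIs and MessagePack-CSharp's ConvertToJson helper (the route and names are illustrative, not the production endpoint):

// Hypothetical diagnostic endpoint; converts a MessagePack cache entry to JSON
app.MapGet("/debug/cache/{routeKey}", async (string routeKey, IDistributedCache redis) =>
{
    var bytes = await redis.GetAsync($"rates:{routeKey}");
    if (bytes is null)
        return Results.NotFound($"No cache entry for rates:{routeKey}");

    // MessagePackSerializer.ConvertToJson renders any MessagePack payload as JSON text
    return Results.Content(MessagePackSerializer.ConvertToJson(bytes), "application/json");
});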
Preventing Cache Stampede
A naive implementation has a stampede problem: when L2 expires for a popular route, dozens of concurrent requests simultaneously miss the cache and all hit the carrier API at once. This is the cache stampede (or thundering herd) problem.
Our fix uses a distributed lock:
private async Task<List<CarrierRate>> GetRatesWithLockAsync(RouteKey key, string cacheKey)
{
    var lockKey = $"lock:{cacheKey}";
    var lockToken = Guid.NewGuid().ToString();

    // Attempt to acquire the lock (SET NX with a 10s expiry prevents deadlock
    // if the holder crashes). Note: this uses the raw StackExchange.Redis
    // IDatabase (_redisDb), since IDistributedCache has no conditional SET.
    bool acquiredLock = await _redisDb.StringSetAsync(
        lockKey, lockToken, TimeSpan.FromSeconds(10), When.NotExists);

    if (acquiredLock)
    {
        try
        {
            var fresh = await _inner.GetRatesAsync(key);
            await WriteToCacheAsync(cacheKey, fresh);
            return fresh;
        }
        finally
        {
            // Release the lock only if we still own it
            await ReleaseLockAsync(lockKey, lockToken);
        }
    }
    else
    {
        // Another instance is fetching; wait briefly and read from cache
        await Task.Delay(200);
        var bytes = await _redis.GetAsync(cacheKey);
        if (bytes is not null)
            return MessagePackSerializer.Deserialize<List<CarrierRate>>(bytes);

        // If the cache is still empty after waiting, fetch directly (last resort)
        return await _inner.GetRatesAsync(key);
    }
}

This ensures that for any given route, only one server instance fetches from the carrier API when the cache expires. Other instances wait briefly and then read the freshly populated cache.
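The ReleaseLockAsync helper is not shown in the post. A common implementation, sketched here as an assumption, uses a Lua script so the ownership check and the delete happen atomically:

// Assumed implementation of the unshown ReleaseLockAsync helper.
// The Lua script makes "check token, then delete" atomic; a separate
// GET followed by DEL could delete a lock another instance has since acquired
private async Task ReleaseLockAsync(string lockKey, string lockToken)
{
    const string script = @"
        if redis.call('GET', KEYS[1]) == ARGV[1] then
            return redis.call('DEL', KEYS[1])
        else
            return 0
        end";

    await _redisDb.ScriptEvaluateAsync(
        script,
        new RedisKey[] { lockKey },
        new RedisValue[] { lockToken });
}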
Cache Invalidation via Redis Pub/Sub
TTL-based expiration works for normal operation, but carriers occasionally push proactive rate updates — price adjustments, surcharge changes, lane closures. When these events arrive, we want to invalidate the affected cache entries immediately rather than waiting for the TTL to expire.
Redis pub/sub provides the cross-instance invalidation channel:
// Publisher: runs when a carrier rate update event arrives.
// Property names match RateInvalidationEvent's casing, since
// System.Text.Json is case-sensitive by default.
await _subscriber.PublishAsync(
    "rate-invalidation",
    JsonSerializer.Serialize(new { Carrier = "MAERSK", Lanes = affectedLanes })
);

// Subscriber: registered on every API server instance
_subscriber.Subscribe("rate-invalidation", async (channel, message) =>
{
    var invalidation = JsonSerializer.Deserialize<RateInvalidationEvent>(message!);
    foreach (var lane in invalidation!.Lanes)
    {
        var cacheKey = $"rates:{lane}";

        // Remove from Redis L2
        await _redis.RemoveAsync(cacheKey);

        // Remove from in-process L1
        _local.Remove(cacheKey);

        _logger.LogInformation(
            "Invalidated cache for {Carrier} lane {Lane}",
            invalidation.Carrier, lane);
    }
});

The pub/sub pattern ensures all server instances invalidate their L1 cache simultaneously when a carrier pushes an update, not just the instance that received the carrier event. Without this, instances would serve inconsistent stale data until their L1 TTLs expired.
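The RateInvalidationEvent type is not shown in the post; a minimal shape consistent with the published payload, assumed for illustration:

// Assumed shape of the invalidation event; System.Text.Json matches property
// names case-sensitively by default, so publisher and subscriber must agree
// on casing (or set PropertyNameCaseInsensitive = true)
public sealed record RateInvalidationEvent(string Carrier, List<string> Lanes);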
The Invalidation Edge Case: What If the Invalidation Event Is Lost?
Redis pub/sub is fire-and-forget — messages sent when no subscriber is listening are lost. For our use case, a lost invalidation message means some instances serve stale rates until the 5-minute TTL expires. This was acceptable given our rate staleness SLA.
For use cases where invalidation correctness is critical, the pattern shifts to Redis Streams (persistent, durable message delivery) or a separate notification via a message bus with at-least-once delivery guarantees.
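For illustration, a sketch of the Streams variant using StackExchange.Redis; the stream, group, and field names are assumptions, and the consumer group must be created once (for example via StreamCreateConsumerGroupAsync):

// Publisher: XADD persists the event even if no consumer is connected yet
await _redisDb.StreamAddAsync("rate-invalidations", new NameValueEntry[]
{
    new("carrier", "MAERSK"),
    new("lanes", string.Join(',', affectedLanes))
});

// Consumer loop (per instance): XREADGROUP plus XACK gives at-least-once delivery
var entries = await _redisDb.StreamReadGroupAsync(
    "rate-invalidations", "invalidators", Environment.MachineName,
    StreamPosition.NewMessages, count: 10);

foreach (var entry in entries)
{
    // ... invalidate the affected L1/L2 keys using the fields in entry.Values ...
    await _redisDb.StreamAcknowledgeAsync("rate-invalidations", "invalidators", entry.Id);
}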
Redis Cluster Configuration
Production Redis configuration for this workload:
# redis.conf — production settings
maxmemory 6gb
maxmemory-policy allkeys-lru # Evict LRU keys when memory is full
activerehashing yes
# Persistence — we accept potential data loss on crash (caches regenerate)
appendonly no
save ""
# Cluster
cluster-enabled yes
cluster-node-timeout 5000
# Performance
hz 20 # Increased background task frequency for more responsive key expiry
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes

Key decisions:
- allkeys-lru eviction: when Redis memory fills (which it does with bursty traffic), we evict the least recently used cache entries. This is correct for a cache; evicted data is simply regenerated on the next miss.
- No persistence: this is a cache, not a database. A Redis restart should regenerate from the carrier API, not from potentially stale persisted data.
- 3-node cluster: primary/replica pairs for two nodes, with a third primary for distribution. Provides both read scaling and failure tolerance.
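On the application side, connecting to such a cluster with StackExchange.Redis might look like the following sketch (endpoint names are placeholders):

// Illustrative cluster connection; endpoint names are placeholders
var mux = await ConnectionMultiplexer.ConnectAsync(new ConfigurationOptions
{
    EndPoints = { "redis-1:6379", "redis-2:6379", "redis-3:6379" },
    AbortOnConnectFail = false, // keep retrying instead of failing at startup
    ConnectTimeout = 2000
});

// The cluster-aware client routes each key to the owning node automatically
IDatabase redisDb = mux.GetDatabase();

This is also where the raw IDatabase used by the lock and pub/sub code would come from, alongside the IDistributedCache registration.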
Operational Monitoring
Metrics we tracked in production:
// Custom metrics via OpenTelemetry.
// _hitRate and _requestCount are Counter<long> instruments created once
// from a Meter at startup (setup not shown).
private void RecordCacheMetrics(string cacheLevel, bool hit, string routeKey)
{
    // Guard against route keys shorter than the 6-character prefix
    var prefix = routeKey.Length >= 6 ? routeKey[..6] : routeKey;

    _hitRate.Add(hit ? 1 : 0,
        new KeyValuePair<string, object?>("level", cacheLevel),
        new KeyValuePair<string, object?>("route_prefix", prefix));

    _requestCount.Add(1,
        new KeyValuePair<string, object?>("level", cacheLevel));
}

Dashboards we built:
- L1 hit rate by route prefix (identifies routes that do not cache well)
- L2 hit rate by route prefix
- Carrier API call rate and error rate (the signal that matters most commercially)
- Redis memory utilisation per node
- Cache key count by TTL bucket (identifies unexpected cache accumulation)
- Invalidation event processing latency (how quickly pub/sub messages reach all instances)
Alert thresholds:
- Carrier API error rate > 1% for 2 minutes → page on-call (indicates carrier API degradation)
- L2 hit rate < 85% → warning (suggests TTL tuning opportunity or new traffic pattern)
- Redis memory > 75% per node → warning (approaching eviction pressure)
Results
- API cache hit rate: 94% (split: ~65% L1, ~29% L2)
- Carrier API calls reduced by ~90%
- P95 response time: 2,800ms → 85ms
- Redis cluster: 3-node, 6GB total — handling 2,000+ RPS comfortably
- Infrastructure cost: 40% reduction in carrier API spend from reduced call volume
- Zero production incidents attributable to the caching layer in 18 months of operation
The 85ms P95 response time breaks down approximately as 2ms for the L1 lookup, 8ms for the L2 lookup, and 75ms for the carrier API fetch and parsing (on requests that fall through to L2 or the origin). The roughly 65% of requests that hit L1 respond in under 5ms end-to-end.
When Not to Use This Pattern
Layered distributed caching is appropriate when:
- Data is read far more frequently than it is written
- Acceptable staleness window exists (seconds to minutes)
- Cache invalidation events can be reliably detected and published
- Multiple service instances share the same data requirements
It is not appropriate when:
- Data must be current to the millisecond (use direct reads or WebSocket subscriptions)
- Write volume is comparable to read volume (cache becomes a synchronisation problem)
- Data is user-specific and not shared across sessions (personalised caches are usually better served by client-side caching or session-scoped server state)
- Correctness is more important than performance and the data is complex to invalidate reliably
The pattern looks simple but the operational complexity — stampede prevention, cross-instance invalidation correctness, eviction strategy, monitoring — only becomes visible in production. Getting it right requires planning these concerns up front, not discovering them as incidents.
Muhammad Moid Shams is a Lead Software Engineer specialising in .NET, Azure, and distributed systems performance engineering.