How I Trained 100+ Engineers to Code with AI Agents (Cursor + Claude)
The exact playbooks, workflows, and cultural shifts I used at Octdaily to get 100+ engineers shipping faster with Cursor IDE, Claude, and GitHub Copilot — without sacrificing code quality.
Why Agentic AI, Not Just Autocomplete
Most engineers think "AI coding" means autocomplete on steroids. That's table stakes. What actually moves the needle is agentic AI — where the AI understands your codebase, plans multi-file changes, runs commands, reads test output, and iterates until it's done.
At Octdaily, I set out to train 100+ engineers not just to use AI tools, but to build entire daily workflows around them. This is the complete playbook — the actual process, the actual configuration, and the measured outcomes.
The distinction matters: autocomplete saves keystrokes. Agentic AI changes how you think about programming. When the AI can plan a multi-file implementation, generate tests, run them, observe failures, diagnose the cause, and fix the code — all without you writing a single line — the nature of your work shifts from typing code to specifying intent and reviewing output.
This shift is uncomfortable for engineers trained to take pride in writing every line. Overcoming that discomfort, and redirecting that pride toward system-level thinking and rigorous review, is as much a cultural challenge as a technical one.
The Three-Layer AI Stack We Use
Layer 1 — Cursor IDE (Primary Workspace)
Cursor is our standard IDE. Every engineer uses it because:
- The codebase is indexed and semantically searchable by the AI
- Agent mode can make changes across dozens of files, run builds, and fix its own errors
- Project-level rules (.cursorrules) enforce our coding standards automatically
The indexing is underrated. When an engineer asks Cursor "where does FHIR patient data get normalised in this codebase?", the AI answers from the actual code — not from generic knowledge. This eliminates a significant category of ramp-up time for new engineers and investigation time for experienced ones.
Agent mode is the real differentiator. Unlike autocomplete or chat interfaces, Cursor Agent Mode can execute a sequence of actions: read files, understand structure, create new files, edit existing ones, run build commands, observe failures, and iterate. For well-scoped tasks — "implement the FHIR R4 Observation resource mapping for vital signs, following our existing resource patterns and including unit tests" — it completes the task end-to-end.
Layer 2 — Claude (Anthropic) for Complex Reasoning
Claude handles tasks that require multi-document reasoning:
- Breaking a Jira epic into technical sub-tasks with code stubs
- Reviewing architecture decisions against HIPAA and FHIR compliance
- Generating comprehensive test plans from acceptance criteria
- Evaluating trade-offs between technical approaches with domain context
The choice of Claude over GPT-4o for this layer is deliberate. Claude's extended context window (200K tokens) allows us to paste entire technical specification documents, multiple source files, and detailed instructions in a single prompt. Claude's tendency to express uncertainty rather than fabricate is also valuable for compliance-sensitive work — we want the AI to flag when it is unsure about HIPAA implications, not confidently provide incorrect guidance.
We use Claude Sonnet 4.5 (model ID `claude-sonnet-4-5`) with extended thinking enabled for architecture and compliance review tasks. The thinking mode produces significantly more careful analysis for complex trade-off questions.
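As a concrete illustration, here is a minimal sketch of how such a request might be assembled for the Anthropic Messages API with extended thinking enabled. The model ID matches the one we use; the prompt wording, token budgets, and the `build_review_request` helper are illustrative, not our production values.

```python
# Sketch: assembling an extended-thinking request for a compliance review.
# The prompt text and token budgets are illustrative placeholders.

def build_review_request(spec_text: str, question: str) -> dict:
    """Assemble a Messages API request with extended thinking enabled."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 8000,
        # Extended thinking reserves a token budget for internal reasoning
        # before the final answer; max_tokens must exceed this budget.
        "thinking": {"type": "enabled", "budget_tokens": 4000},
        "messages": [
            {
                "role": "user",
                "content": (
                    f"<specification>\n{spec_text}\n</specification>\n\n"
                    f"{question}\n"
                    "Flag anything you are unsure about explicitly; "
                    "do not guess on HIPAA implications."
                ),
            }
        ],
    }

# With the official SDK, this payload would be sent roughly as:
#   from anthropic import Anthropic
#   response = Anthropic().messages.create(**build_review_request(spec, q))
request = build_review_request(
    "...FHIR R4 spec excerpt...",
    "Does this design meet our audit-logging requirements?",
)
```

Note the explicit instruction to flag uncertainty: that is how we lean on Claude's tendency to hedge rather than fabricate for compliance-sensitive questions.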
Layer 3 — GitHub Copilot for Line-by-Line Assistance
Still valuable for fast completions within a file — especially for boilerplate-heavy areas like FHIR resource mapping and Angular form generation. Copilot's IDE integration for line-level suggestions complements Cursor Agent Mode for longer tasks. We treat it as a secondary suggestion layer that does not require explicit prompting — it is always present, and engineers accept or reject suggestions fluidly.
Why All Three?
Different AI tools have different latency, context window, and capability profiles. Using all three lets engineers select the right tool for the task:
- Need a five-line completion? Copilot.
- Need to implement a feature across ten files? Cursor Agent.
- Need to analyse a complex architecture decision against 50 pages of FHIR specification? Claude.
Requiring engineers to use only one tool because it is administratively simpler sacrifices significant productivity.
The Daily Workflow Playbook
Morning: AI-Assisted Sprint Planning
1. Paste Jira story into Claude:
"Break this user story into .NET 8 API + Angular 17 tasks.
Output: file paths to create, interfaces to define, test cases."
2. Claude outputs a structured task list with:
- Files to create/modify
- Interface contracts
- Edge cases to handle
- FHIR resources involved
3. Engineer reviews and turns this into a Cursor session

This step saves 30-45 minutes per story compared to engineers individually working through implementation planning. More importantly, it surfaces questions about ambiguous acceptance criteria before development starts — when they are cheap to resolve — rather than mid-implementation.
The Claude prompt is templated with project context: technology stack, FHIR version, relevant compliance requirements, and a reference to similar completed stories for stylistic consistency. Building this template library is one of the most impactful investments a team can make.
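A templated prompt of this kind can be sketched as follows. The field names, default values, and the `render_breakdown_prompt` helper are illustrative assumptions about how such a template might be structured, not our exact production template.

```python
# Sketch of a templated story-breakdown prompt. Placeholder fields
# (stack, fhir_version, etc.) are illustrative; adapt to your project.

BREAKDOWN_TEMPLATE = """\
Context: {stack}, FHIR {fhir_version}. Compliance: {compliance}.
Reference story for style: {reference_story}

Break this user story into tasks.
Output: file paths to create, interfaces to define, test cases.

Story:
{story}
"""

def render_breakdown_prompt(
    story: str,
    *,
    stack: str = ".NET 8 API + Angular 17",
    fhir_version: str = "R4",
    compliance: str = "HIPAA",
    reference_story: str = "(link to a similar merged story)",
) -> str:
    """Fill the shared template so every engineer pastes identical context."""
    return BREAKDOWN_TEMPLATE.format(
        stack=stack,
        fhir_version=fhir_version,
        compliance=compliance,
        reference_story=reference_story,
        story=story,
    )

prompt = render_breakdown_prompt("As a nurse, I want to record vital signs...")
```

The point of templating is consistency: every engineer supplies the same project context, so output quality does not depend on who happened to write the prompt that morning.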
During Development: Cursor Agent Mode
Our .cursorrules file includes FHIR-specific context:
# .cursorrules
- All patient data models must implement IFhirResource
- Use Azure.Health.DataServices SDK for FHIR operations
- Every API endpoint must have a corresponding FHIR capability statement entry
- PHI fields must be annotated with [SensitiveData] attribute
- Tests use xUnit + NSubstitute, follow AAA pattern
- Angular components use OnPush change detection by default
- No direct Cosmos DB access from API controllers — use repositories
- HIPAA audit logging required for all PHI read operations

The agent follows these rules automatically across every generated file. This is the equivalent of having your most experienced engineer's coding standards baked into every line of AI-generated code — without requiring that engineer to be in every code review.
We iterate on the .cursorrules file as we discover rules that the agent violates. Each violation becomes a new rule. After six months of iteration, the agent's output on greenfield files requires minimal correction.
Mid-Task: The Review Discipline
The most important engineering discipline we teach is the review discipline. AI-generated code is fast, but it is not infallible. Engineers who treat AI output as finished code introduce bugs as often as they save time.
The review discipline:
- Read every line of AI-generated code before accepting it
- Run the tests before declaring a task complete — even if the AI says they pass
- Ask the AI to explain any line you do not understand
- Be particularly vigilant about error handling, null checks, and edge cases — these are where AI generation is weakest
The counter-intuitive finding: junior engineers tend to be better at the review discipline than senior engineers. Junior engineers approach AI output with appropriate skepticism because they are already accustomed to verifying their understanding. Senior engineers sometimes skip verification because the code looks right at a glance — which is exactly when the subtle bugs hide.
PR Review: AI Pre-Review Before Human Review
Before any PR is assigned to a human reviewer, our CI pipeline runs:
# .github/workflows/ai-review.yml
- name: Claude PR Review
  run: |
    gh pr diff $PR_NUMBER | \
      claude review \
        --check-hipaa-compliance \
        --check-fhir-standards \
        --suggest-tests \
        --output pr-review.md
    gh pr comment $PR_NUMBER --body-file pr-review.md
  env:
    PR_NUMBER: ${{ github.event.pull_request.number }}

This catches 70% of issues before a human even opens the diff. The remaining 30% — architectural concerns, domain logic correctness, business requirement alignment — are the issues that genuinely require human expertise to identify.
The impact on code review time is significant. Reviewers spend less time on mechanical issues (missed null checks, inconsistent error handling, style violations) and more time on the concerns that actually require their expertise.
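To show what gating on the pre-review output can look like, here is a hypothetical sketch of the triage logic: scan the generated review for severity markers and decide whether the PR can go straight to a human or needs fixes first. The `[BLOCKER]`/`[WARN]` marker format and the `triage_review` function are assumptions for illustration, not the actual tool output.

```python
# Hypothetical sketch: triage an AI pre-review before assigning a human.
# The severity-marker format is an assumed convention, not real tool output.

def triage_review(review_markdown: str) -> dict:
    """Count blocker/warning markers in an AI review and summarise."""
    lines = review_markdown.splitlines()
    blockers = [ln for ln in lines if ln.startswith("[BLOCKER]")]
    warnings = [ln for ln in lines if ln.startswith("[WARN]")]
    return {
        # Only hand the PR to a human reviewer once blockers are resolved.
        "ready_for_human": not blockers,
        "blockers": blockers,
        "warnings": warnings,
    }

sample = """[BLOCKER] PHI field `ssn` logged without [SensitiveData] handling
[WARN] Missing null check on Observation.valueQuantity
Looks consistent with existing FHIR mappers otherwise."""

result = triage_review(sample)
```

The design choice worth copying is the split itself: mechanical findings are resolved before a human opens the diff, so reviewer attention is spent only on the 30% that needs it.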
Training Program Structure
The biggest mistake organisations make with AI coding tools is not training engineers to use them effectively. Cursor and Claude do not produce good results out of the box for complex, domain-specific codebases. They require engineers to:
- Write effective prompts that include necessary context
- Know when to break a task into smaller agent sessions rather than attempting everything in one session
- Recognise when AI output is subtly wrong in domain-specific ways
- Build the habit of verification before acceptance
Our 3-week training program addresses these skills:
Week 1 — Foundations
- Prompt engineering for code tasks: specificity, context provision, output format specification
- How LLMs "see" a codebase: context windows, embedding search, what the AI can and cannot access
- Cursor agent mode capabilities and limitations: what tasks it handles well vs. where it struggles
- Live demo: implementing a complete feature from Jira story to merged PR with AI assistance
Week 2 — Daily Workflow Integration
- AI-first story breakdown: the Claude template workflow
- Pair programming with AI: you drive, AI co-pilots — maintaining the decision-making lead
- When NOT to use AI: security-sensitive code, PHI-handling logic, and code paths where correctness is critical and AI verification is unreliable
- Code review with AI pre-review: interpreting AI review output, when to act on suggestions, when to override
Week 3 — Advanced Patterns
- Multi-agent orchestration: AI generates tests, AI fixes failures, AI documents — chaining agent sessions
- Custom .cursorrules design: identifying your team's most impactful custom rules
- Building a personal AI prompt library: templating prompts for recurring task types
- Team knowledge sharing: establishing a shared prompt library and .cursorrules version control
Follow-up: Monthly AI workflow retrospectives
We hold monthly 30-minute retrospectives specifically on AI workflow effectiveness: what worked, what did not, and what rules to add to .cursorrules. These sessions maintain the collective learning that would otherwise stay siloed in individual engineers' personal habits.
Common Anti-Patterns We Corrected
Anti-pattern 1: Accepting AI output without review. Fix: Mandatory review discipline as a team norm, with senior engineers modelling review behaviour publicly.
Anti-pattern 2: Overly broad agent sessions. Engineers would ask the agent to "implement the entire patient intake feature" and get confused, low-quality output. Fix: Train engineers to scope agent sessions to single components or well-defined file sets.
Anti-pattern 3: Treating AI-generated tests as comprehensive. AI generates tests for the happy path and obvious edge cases. It misses domain-specific edge cases that require clinical knowledge. Fix: AI tests as a starting point, with engineers responsible for adding domain-specific test cases.
Anti-pattern 4: Prompting in AI-tool-specific ways that do not transfer. Engineers learned Cursor-specific tricks without developing transferable AI collaboration skills. Fix: Train the underlying skill (prompt engineering, context provision) rather than tool-specific behaviours.
Anti-pattern 5: Not sharing effective prompts. Every engineer was independently discovering effective prompts for the same recurring task types. Fix: Shared prompt library in the team repository with attribution and context for why each prompt works.
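A shared prompt-library entry can be sketched as below. In practice this might live as YAML or Markdown files in the team repository; a plain dict keeps the sketch self-contained. The entry keys, the author name, and the `get_prompt` helper are hypothetical, illustrating the attribution-plus-rationale structure described above.

```python
# Sketch of a shared prompt-library entry with attribution and a note on
# why the prompt works. Keys and the author name are hypothetical.

PROMPT_LIBRARY = {
    "fhir-mapping-task": {
        "author": "a.engineer",  # attribution: whom to ask about this prompt
        "why_it_works": (
            "Names an existing mapper file as the pattern to follow, "
            "which anchors the agent's style to the codebase."
        ),
        "prompt": (
            "Implement the FHIR R4 {resource} mapping, following the "
            "pattern in {example_file}, including xUnit tests."
        ),
    },
}

def get_prompt(key: str, **fields: str) -> str:
    """Look up a shared prompt and fill in task-specific fields."""
    return PROMPT_LIBRARY[key]["prompt"].format(**fields)

task = get_prompt(
    "fhir-mapping-task",
    resource="Observation",
    example_file="PatientMapper.cs",
)
```

Recording *why* each prompt works matters as much as the prompt text: it turns an individual discovery into a transferable skill, which is the fix for anti-patterns 4 and 5 at once.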
Measured Outcomes
After six months of the full program:
- 40% reduction in average feature delivery time — measured from story start to merge
- 60% fewer back-and-forth PR comments — AI pre-review catches issues earlier
- 3x faster onboarding for new engineers — AI explains the codebase context that would otherwise require senior engineer time
- Zero HIPAA/FHIR standard violations slipping into production since the program started
- Net Promoter Score from engineers: 8.4/10 — engineers find the workflow genuinely better, not just faster
The most surprising outcome: experienced engineers benefited more than junior engineers in absolute terms, not less. AI handles more of their mechanical tasks, freeing them to spend more time on the architectural and domain challenges where their experience creates the most value. Junior engineers are faster but still need human mentorship. Senior engineers are not just faster — their work composition improved qualitatively.
Starting Your Own Program
If you are implementing this at your organisation, prioritise in this order:
1. Cursor configuration first — Get .cursorrules right for your codebase and team standards. This creates a consistent baseline that makes everything else easier.
2. The review discipline — Establish the norm that AI output is reviewed, not accepted blindly, before tackling advanced workflows.
3. The morning planning workflow — This is the highest-leverage change for sprint velocity and has the lowest risk. Start here.
4. AI PR pre-review — Implement this when your team is consistently reviewing AI output and knows what good AI-generated code looks like.
5. Advanced multi-agent patterns — Only after the foundations are solid. Multi-agent workflows amplify both good and bad base practices.
The cultural work — overcoming resistance, building the review discipline, establishing prompt sharing norms — takes longer than the technical work. Plan for three months of deliberate change management alongside the technical implementation.
Muhammad Moid Shams is a Lead Software Engineer who built and trained the AI-augmented engineering programme at Octdaily, a healthcare quality platform serving 20,000+ skilled nursing facilities.