
What I learned writing 80+ user stories per sprint on a US state government program

May 2, 2026
Sultan Siddiqui

For ~1.5 years I was the sole Business Analyst on a US state CCWIS program at CITI-US, and a contributing BA on CCMS and HHS programs at the same firm. Two-month sprint cycles. 80+ user stories per sprint. State government stakeholders in one time zone, engineers in another, QA in a third, and a defect tracker that was the single source of truth.

The output of that work was 99 percent defect resolution before sign-off and 35 percent less rework after the discipline took hold. The input was a specific way of writing requirements that I have not seen replicated anywhere else, and that I now use directly when shipping AI implementations for mid-market clients.

This post is the short version of what survived contact with production AI work.

A user story is a contract, not a description

The first thing state government program work teaches you is that a user story is not a sentence. It is a contract.

As a [role], I want to [action], so that [outcome].

The reason this template exists is not to make stories sound consistent. It is to force three things into the open before a single line of code is written:

  1. Who the actor is. Is it a caseworker, a parent, an admin, an automated system?
  2. What the action is. Is it a one-time submission, a recurring review, an exception?
  3. What the outcome is. What does success look like, in language the actor would use?

When any of those three is missing or fuzzy, the story is not ready. Not later, not in clarification, not in a follow-up Slack thread. Now. The cost of writing a story without all three is rework after sprint three when QA discovers the actor was actually two different roles.

This habit transfers cleanly to AI work. When a mid-market COO says, "I want our team to spend less time on data entry," that is not a user story. It is a wish. The story might be:

As a customer service rep, I want incoming inquiry emails to be automatically tagged with category, urgency, and customer history, so that I can triage in 30 seconds instead of three minutes.

That is testable. That is buildable. That has acceptance criteria.
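
If you want to make the contract mechanical, here is a minimal sketch in Python. The dataclass and its validation are illustrative, not a tool from the program; the point is that a story missing any of the three parts fails loudly instead of slipping into the sprint.

```python
from dataclasses import dataclass, field

@dataclass
class UserStory:
    """A story as a contract: all three parts must be present and specific."""
    role: str     # who the actor is: a caseworker, a parent, an automated system
    action: str   # what they do: a one-time submission, a recurring review
    outcome: str  # what success looks like, in the actor's own language
    acceptance_criteria: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A story with a missing or placeholder part is not ready. Not later. Now.
        for name, value in (("role", self.role), ("action", self.action),
                            ("outcome", self.outcome)):
            if not value.strip() or value.strip().upper() in {"TBD", "N/A"}:
                raise ValueError(f"Story not ready: '{name}' is missing or fuzzy")

    def __str__(self) -> str:
        return f"As a {self.role}, I want {self.action}, so that {self.outcome}."

story = UserStory(
    role="customer service rep",
    action="incoming inquiry emails to be automatically tagged with category, "
           "urgency, and customer history",
    outcome="I can triage in 30 seconds instead of three minutes",
)
print(story)
```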

Acceptance criteria do the work the story cannot

A user story tells you the shape of the feature. Acceptance criteria tell you when it is done.

The format that worked on CCWIS was:

Given [precondition], when [action], then [observable outcome].

Three or four of these per story. No more. If you need ten acceptance criteria, the story is too big and needs to be split.

The discipline is in the word "observable." Every criterion has to be something a tester can demonstrate without asking a developer how the code works. Internal state changes do not count. Database flag flips do not count. The criterion is what the user, or an integrating system, can see.

For AI work this is even more important than for traditional software. AI features fail in subtle, probabilistic ways. If your acceptance criteria are vague ("the AI should give good answers"), you have no defense against drift. If they are observable ("Given a CSV of 5000 sales rows, when a user asks 'show revenue by region for Q3', then the response includes a bar chart with regions on the x-axis and revenue on the y-axis, filtered to July through September of the current fiscal year"), you can test the AI like any other system.
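
Concretely, that criterion can be written down as an automated check. A sketch, assuming a hypothetical run_analysis entry point and response shape; every assertion is observable from the outside, which is the whole point.

```python
# `run_analysis` and the response shape are hypothetical stand-ins for
# whatever your AI feature actually returns. Every assertion is something
# a tester can demonstrate without asking how the code works.
def check_revenue_by_region_q3(run_analysis) -> None:
    response = run_analysis(
        file="sales_5000_rows.csv",
        question="show revenue by region for Q3",
    )
    chart = response["chart"]
    assert chart["type"] == "bar"       # a bar chart, not "a visualization"
    assert chart["x_axis"] == "region"
    assert chart["y_axis"] == "revenue"
    # Filtered to July through September of the current fiscal year.
    assert chart["months"] == ["July", "August", "September"]
```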

AS-IS before TO-BE, every time

Every CCWIS feature started with two diagrams. One was the AS-IS process, mapped from interviewing the actual caseworker who did the job today. The other was the TO-BE process, the redesigned flow once the new system was in place.

You cannot skip the AS-IS step. I have watched teams skip it and pay for it later, every time.

The reason is that operational reality is always more complicated than anyone admits in a meeting. The caseworker who has been doing the job for ten years has nine workarounds for things the official process does not handle. If you design the new system based on the official process, you are designing for a workflow that does not exist.

When I do an AI Operations Audit for a mid-market client, the first deliverable is the AS-IS process for one operational area. Not a slide. A diagram and a written narrative, accurate enough that the person doing the work today can read it and say "yes, that is what I do." Only then do we talk about which steps an AI implementation should change.

The audit fee is partly for the audit itself. It is also partly for the discipline of forcing the AS-IS to exist, because mid-market companies almost never have one written down.
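
For what it is worth, the skeleton behind the diagram can be this simple. A sketch with illustrative field names; the deliverable is still the diagram and the narrative, but flagging official versus workaround explicitly is the part teams skip.

```python
from dataclasses import dataclass

@dataclass
class AsIsStep:
    actor: str
    action: str
    official: bool  # False = a workaround the official process does not cover
    notes: str = ""

# Hypothetical example: one official step plus the workaround that never
# shows up in the official process document.
intake = [
    AsIsStep("caseworker", "receives referral in the intake queue",
             official=True),
    AsIsStep("caseworker", "re-keys the referral into a spreadsheet for "
             "supervisor review",
             official=False, notes="the intake queue has no supervisor view"),
]
```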

The "why" line that changes everything

State government program work is requirements-heavy. You are writing for stakeholders who will sign in blood that the system does what the document says it does. You are also writing for engineers who, six months later, will need to know why a feature exists when they are tempted to refactor it away.

The single most useful sentence I learned to add to every user story was a "rationale" or "why" line.

Why: This rule exists because [statute reference / policy decision / observed user behavior]. If this rule is changed, [downstream consequence].

That sentence does three things:

  1. It documents the reason in the same place as the requirement, so future-you does not have to dig through Confluence to find it.
  2. It exposes weak rationale. If you cannot write a sensible "why" line, the requirement may not actually be needed.
  3. It surfaces dependencies. If the why is "because Section 5103.1.b mandates ten-day notice," then a future developer cannot quietly change the notice window without realizing they are creating a compliance issue.

For AI implementations, the why line is even more important because the AI behavior often depends on prompt engineering that looks arbitrary in code. If a prompt instructs Claude to "respond in 130 words or less," and the why line says "because the response is rendered inside a chat bubble with a max height of 240px on mobile," then a developer who later changes the limit knows what UI testing they need to do.
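
In code, that means the rationale travels with the value it justifies. A sketch using the chat-bubble example above; the prompt text and pixel values are illustrative.

```python
RESPONSE_WORD_LIMIT = 130

SYSTEM_PROMPT = (
    "You are a support assistant. "
    f"Respond in {RESPONSE_WORD_LIMIT} words or less."
)
# Why: responses render inside a chat bubble with a max height of 240px on
# mobile. If this limit changes, re-test bubble overflow on mobile viewports.
```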

Defect triage is a writing exercise

Toward the end of every sprint we triaged defects. The job was deceptively simple: read the defect, decide if it is a real bug, a misunderstood requirement, or out of scope, and assign next steps.

Here is what I learned in two and a half years of doing this:

A defect that is hard to triage is almost always a defect in the original requirement, not a defect in the code.

If the tester and the developer cannot agree on whether the system is behaving correctly, the requirement was ambiguous. The triage conversation is forensic evidence of where the requirements failed.

I keep a defect log on every AI engagement now. Not because I expect a lot of bugs in a four-week sprint, but because the bugs I do see tell me exactly where the requirements were too thin. The next engagement gets sharper requirements in those areas.
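
The log itself can be tiny. A sketch of the structure I mean, with illustrative names; the enum is the triage decision from above, written down so it can be counted later.

```python
from dataclasses import dataclass
from enum import Enum

class Triage(Enum):
    REAL_BUG = "real bug"
    MISUNDERSTOOD_REQUIREMENT = "misunderstood requirement"
    OUT_OF_SCOPE = "out of scope"

@dataclass
class DefectLogEntry:
    defect_id: str
    summary: str
    triage: Triage
    story_ref: str        # which story's requirement this traces back to
    requirement_gap: str  # what the requirement failed to say, if anything

# Hypothetical entry, echoing the ten-day-notice example above.
entry = DefectLogEntry(
    defect_id="D-1042",
    summary="Notice letter generated at 7 days instead of 10",
    triage=Triage.MISUNDERSTOOD_REQUIREMENT,
    story_ref="US-301",
    requirement_gap="story never said whether 'ten-day notice' meant "
                    "calendar or business days",
)
```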

What this means for AI implementations in 2026

The pattern in mid-market AI implementations right now is the opposite of what state government program work taught me. Vendors arrive with a tool and try to retrofit a process around it. Strategy decks describe transformative outcomes without specifying acceptance criteria. Buyers sign engagements with no AS-IS process documented.

When the result is "the AI is not working," nobody can prove the assertion either way, because there is no contract to test against.

The discipline that produces 99 percent defect resolution on a state government program is the same discipline that produces a working AI implementation in a mid-market firm. Roles, actions, outcomes. Observable acceptance criteria. AS-IS before TO-BE. The why line. Defect triage as a writing exercise.

That is the senior BA work. The AI implementation is what you do once the requirements are right.

If you are a mid-market operations leader looking at AI vendors and feeling like the proposals are vague, that is not your imagination. The proposals are vague. You can either work with someone who fills in the BA layer, or you can hire one yourself. The AI Operations Audit is the productized version of that BA layer, delivered in two weeks at a fixed price.

Sultan Siddiqui
Founder, Denver AI Tech
Free 30-minute discovery call

Have an operational problem you think AI could solve?

The 30-minute discovery call is not a sales pitch. We talk about what your team is actually struggling with, whether AI is the right solution, and what the right next step might be (which is sometimes not hiring me).

No agency overhead. Direct work with the founder.