QA or the Highway 2026: Quality Engineering in the Age of AI Agents

QA or the Highway 2026 — 7 Talks. 4 Takeaways. One New Job Description.

The Setup

Spent Friday at QA or the Highway 2026 in Columbus. It is one of those single-day, single-track conferences where you can hold a conversation with a speaker between sessions instead of sprinting across hotel floors. Seven talks, one hallway, a lot of notes.

What I did not expect was how tightly the sessions converged on the same thesis from different angles. By afternoon it was less a collection of individual talks and more a single argument that happened to have seven parts.

The argument: AI is rewriting what quality means, who owns it, and what it costs to get it wrong.

Takeaway 1: The Failure Mode Has Changed

Matt Eland opened the conference with what I thought was the most important reframe of the day.

We have spent years thinking about software quality through the lens of preventing mistakes — catching the null pointer, the off-by-one, the untested edge case. The mental model is: humans make mistakes, tests catch them, CI enforces the gate. It is a defensive posture built around the assumption that human error is the primary threat.

AI agents break that model in a specific and consequential way. An agent given a loose paragraph of requirements can produce a pull request that looks production-ready in minutes. Linting passes. Tests pass. The PR description is coherent. The code structure is clean. And it quietly does something the author did not quite mean.

The failure mode is no longer buggy code. It is well-formatted code that quietly misreads intent.

This matters because our entire defensive posture — coverage, static analysis, code review — was built to catch implementation errors. It was not built to validate whether the implementation matches what the product owner actually intended. That is a different problem, and it requires different tools.

Matt's push was to write test plans that validate intent rather than implementation. That sounds abstract until you try it: instead of testing that a function returns a specific value, you test that it upholds a business invariant. Instead of testing a UI flow step by step, you test that the user can accomplish a goal. The spec becomes the thing you are defending, not the code.
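To make that concrete for myself, here is a minimal sketch of an intent-first check. It is my illustration, not something shown in the talk. `apply_discount` is a hypothetical pricing function, and the business rule (a discount never raises the price and never drives it below zero) is an assumed invariant.

```python
from decimal import Decimal
from itertools import product


def apply_discount(subtotal: Decimal, percent: Decimal) -> Decimal:
    """Hypothetical pricing function standing in for real business logic."""
    # Clamp the percentage so out-of-range input cannot invert the price.
    percent = min(max(percent, Decimal("0")), Decimal("100"))
    discounted = subtotal * (Decimal("100") - percent) / Decimal("100")
    return discounted.quantize(Decimal("0.01"))


def check_discount_invariant() -> int:
    """Assert the business rule over a grid of inputs; return how many cases ran."""
    subtotals = [Decimal("0"), Decimal("0.01"), Decimal("19.99"), Decimal("1000000")]
    percents = [Decimal("-5"), Decimal("0"), Decimal("12.5"), Decimal("100"), Decimal("250")]
    checked = 0
    for subtotal, percent in product(subtotals, percents):
        total = apply_discount(subtotal, percent)
        # Intent, not implementation: the customer never pays more than the
        # subtotal and is never owed money, whatever the percentage is.
        assert Decimal("0") <= total <= subtotal, (subtotal, percent, total)
        checked += 1
    return checked
```

Nothing here pins a specific return value. If an agent rewrites the rounding or the clamping, the check still passes as long as the rule holds, and fails as soon as it does not.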

Jeff Van Fleet and Scott Boyd made a closely related argument in their AI maturity session. The teams getting real, measurable value from AI are not the ones who deployed the most tools. They are the ones who built reciprocal feedback loops between human domain expertise and the model. The model generates; the human evaluates; the evaluation informs the next generation. That loop, applied consistently, is what separates teams where AI creates leverage from teams where AI creates noise.

Both talks were saying the same thing from different positions: the human's job is moving up the stack, from implementation to intent, from execution to evaluation.


Takeaway 2: Tooling Moved Faster Than I Gave It Credit For

Andrew Knight's Playwright and AI session was a useful gut check. I thought I had a reasonable read on where AI-assisted testing was. I did not.

Context engineering and MCP servers have changed test authoring in ways I had not tracked closely enough. Agentic test generation is no longer a demo. Teams are running it against production specs. The bottleneck in test authoring is shifting from "writing the test" to "describing what matters clearly enough for the agent to test it correctly." Which loops directly back to Matt Eland's point about intent: the teams winning at agentic testing are the teams who can articulate what they are protecting.

Mohini Agarwal and Rachana Menon's session on API-to-UI confidence was the most immediately actionable talk of the day for me. Three tools I am bringing back.

oasdiff — Breaking OpenAPI Changes in CI

If you maintain an API with multiple consumers, you have probably experienced an apparently safe schema change silently breaking a downstream client. oasdiff gives you a diff of your OpenAPI spec that flags backward-incompatible changes — new required fields, removed endpoints, changed response structures — before they reach a merge. Threading this into CI turns a reactive problem (client breaks in staging) into a proactive gate (the PR cannot land without a review of the breaking change).
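To show the kind of rule such a gate enforces, here is a toy checker. It is not oasdiff's implementation, which covers far more cases. It only looks at removed paths, removed operations, and newly required JSON request fields, and it assumes inline schemas with no `$ref` resolution. The function names are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def required_request_fields(operation: dict) -> set[str]:
    """Required JSON body fields for one operation (inline schemas only, no $ref)."""
    schema = (
        operation.get("requestBody", {})
        .get("content", {})
        .get("application/json", {})
        .get("schema", {})
    )
    return set(schema.get("required", []))


def find_breaking_changes(old_spec: dict, new_spec: dict) -> list[str]:
    """Compare two parsed OpenAPI documents and list a few breaking changes."""
    problems: list[str] = []
    new_paths = new_spec.get("paths", {})
    for path, old_item in old_spec.get("paths", {}).items():
        if path not in new_paths:
            problems.append(f"removed endpoint: {path}")
            continue
        for method, old_op in old_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            new_op = new_paths[path].get(method)
            if new_op is None:
                problems.append(f"removed operation: {method.upper()} {path}")
                continue
            added = required_request_fields(new_op) - required_request_fields(old_op)
            for field in sorted(added):
                problems.append(f"new required field: {method.upper()} {path} {field}")
    return problems
```

In CI, the gate amounts to failing the job when the returned list is non-empty, so the breaking change gets an explicit review instead of slipping through.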

Schemathesis — Edge Cases From the Spec

Instead of writing property-based tests by hand, you point Schemathesis at an OpenAPI spec and it generates test cases that exercise the contract you have actually published. The edge cases it surfaces are the ones your spec allows but your implementation does not handle, which are precisely the cases that show up in production. If your spec says a field can be any string, Schemathesis will send you an empty string, a 10,000-character string, and Unicode characters your input handling was not expecting. A null is not a valid string under that schema, so it only appears if the field is nullable or you turn on negative testing, which deliberately sends schema-violating data.
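A rough sketch of the idea, to show why spec-driven generation finds things hand-written tests miss. This is not how Schemathesis generates data (it builds on Hypothesis and handles the whole schema). It is a toy that derives boundary values for a single string schema. The function name and the 10,000 default cap are my assumptions.

```python
def string_edge_cases(schema: dict) -> list[str]:
    """Derive schema-valid boundary strings from minLength/maxLength."""
    min_len = schema.get("minLength", 0)
    max_len = schema.get("maxLength", 10_000)  # assumed cap when the spec sets none
    # Plain ASCII, whitespace, an accented letter, an emoji, and a
    # right-to-left override character that often trips up input handling.
    fillers = ["a", " ", "\u00e9", "\U0001F600", "\u202e"]
    lengths = sorted({min_len, min(min_len + 1, max_len), max_len})
    cases: list[str] = []
    for length in lengths:
        for filler in fillers:
            value = filler * length
            if value not in cases:  # the empty string would otherwise repeat
                cases.append(value)
    return cases
```

Every value it returns is allowed by the published schema, which is the point: if the handler rejects or mangles one of them, either the spec or the implementation is wrong.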

AI-Assisted Log Triage

Pipeline failures explained in plain English. You pipe the failure output through a model, give it context about what the failing step does, and get back a summary that a non-specialist can act on. The time saved on "what does this error even mean" compounds fast in a large team with rotating on-call.
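A minimal sketch of that flow, under my own assumptions: `ask_model` is a hypothetical callable wrapping whatever approved model endpoint you have, and the prompt wording and the 40-line tail are placeholders rather than anything the speakers prescribed.

```python
from typing import Callable


def build_triage_prompt(step_description: str, log_text: str, max_lines: int = 40) -> str:
    """Build a plain-English triage prompt from the tail of a failing step's log."""
    # Failures usually surface at the end of the log, so keep only the tail.
    tail = log_text.strip().splitlines()[-max_lines:]
    return "\n".join(
        [
            "You are helping an on-call engineer who is not a specialist in this pipeline.",
            f"The failing step does the following: {step_description}",
            "Explain the most likely cause in plain English and suggest one next action.",
            "--- log tail ---",
            *tail,
        ]
    )


def triage(step_description: str, log_text: str, ask_model: Callable[[str], str]) -> str:
    """Send the prompt to the model and return its summary."""
    return ask_model(build_triage_prompt(step_description, log_text))
```

Keeping the model behind a plain callable makes the prompt logic testable without a network call, and makes it easy to swap in whichever endpoint security has signed off on.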

Their broader reminder about data residency is worth flagging separately.

AI in CI is also a data residency conversation

The moment you route pipeline logs or API payloads through a model, you have to know where that data goes and whether it is allowed to go there. "AI in CI" is not only a quality conversation. It is a data residency conversation. Keep model calls inside enterprise-bounded, sandboxed platforms. Known boundaries. Known data path. No surprises at a security review six months later.
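One way to make the boundary explicit in code, sketched under assumptions: the allowlisted hostname is a made-up example, and the two redaction patterns are illustrative only. They are nowhere near a complete secret scanner.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: only model endpoints inside the enterprise boundary.
ALLOWED_MODEL_HOSTS = {"llm.internal.example.com"}

# Illustrative patterns only; real pipelines need a proper secret scanner.
REDACTIONS = [
    (re.compile(r"(?i)(authorization:\s*bearer)\s+\S+"), r"\1 [REDACTED]"),
    (re.compile(r"(?i)(api[_-]?key|token|password|secret)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
]


def check_endpoint(url: str) -> None:
    """Refuse to send data to any host that is not on the allowlist."""
    host = urlparse(url).hostname
    if host not in ALLOWED_MODEL_HOSTS:
        raise PermissionError(f"model endpoint not allowed: {host}")


def redact(text: str) -> str:
    """Mask obvious credentials before the text leaves the pipeline."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Calling `check_endpoint` and `redact` before every model call gives you a single, reviewable place where the data path is decided, which is the thing a security review will ask about.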

Takeaway 3: Safety Nets Matter More, Not Less

Chris Harbert's ephemeral environments session made an argument I want to bring to the next sprint planning.

AI-generated code couples quickly to other in-flight work. If three teams are each using AI assistance to move fast, their code does not just change faster than before — it changes in ways that are harder to predict from reading a diff, because the code that got generated was optimized for the requirements visible at generation time, not for what other branches were doing in parallel.

In that environment, the merge button is more dangerous than it used to be. Not because the code is lower quality, but because the integration surface is larger and the implicit assumptions are less visible.

Ephemeral environments (full-stack, short-lived, branch-isolated environments with real test data) are one of the few tools that give you ground truth on integration before it becomes someone's Sunday incident. Each feature branch gets its own environment. The full test suite, integration tests included, runs there. The environment disappears when the branch merges.

The operational cost has dropped significantly. Standing up a fresh stack per branch used to be prohibitive. Container orchestration and infrastructure-as-code have made it manageable for teams that have invested in deployability. That investment is now worth prioritizing on its own terms, not just as an operational nicety.


The principle Chris was pushing: validate before merge, not after. Make the integration problem visible when you can still do something about it.

Takeaway 4: The Builder

The two talks I am still thinking about most are Mudassar Syed's on agentic testing and Tatyana Arbouzova's closing keynote. They came from different angles and landed in the same place.

Mudassar's framing was about the trust chain in an AI-native development workflow. If the code was written by an agent, tested by an agent, and reviewed with AI assistance, who is accountable for trust in what ships? His answer was uncomfortable in the way useful answers sometimes are: it is the person who owns the validation design. Not the person who wrote the most code. Not the person who approved the PR. The person who designed the layers of validation that give you confidence before the deploy.

Tatyana's closing keynote took the wider view. The historic split between developer, tester, designer, and analyst has been a feature of our industry for decades because the knowledge required to do each job well was genuinely different and genuinely deep. AI is collapsing that split. Not by making any one role unnecessary, but by making the tools of each role accessible to someone who understands the goal well enough to direct them.

She called the emerging role the Builder. The Builder owns the outcome end to end. They use AI as leverage across every stage — generation, testing, design, analysis — and they are accountable for trust in what ships. The competency that differentiates the Builder is not the ability to write code or write tests. It is the ability to reason about trust: what does it mean for this thing to be correct, and how do I know?

That framing is a step up, not a step out. It is not "AI is doing the job now so your job is disappearing." It is "the job is becoming more about owning the outcome and less about performing the individual tasks." That is a harder job. But it is a more important one.

What I'm Doing This Week

These are the four things I wrote down at the end of the day and am committing to before the conference clarity wears off.

Action Items

Sketch an oasdiff CI gate paired with a Schemathesis pilot against one of our OpenAPI specs. Find the edge cases we are silently shipping.
Push for an ephemeral environment story on the next AI-heavy change. Validate before merge, not after.
Draft a short ADR for how we review AI-assisted pull requests: layered, intent-first, with named failure modes. What are we checking for, in what order, and what does an approver need to confirm?
Map our team against the Van Fleet / Boyd AI maturity model and pick the one next step that actually moves us forward. Thirty minutes. One real move.

Sessions Attended

QA or the Highway 2026 · Columbus, OH

Ensuring Software Quality in the world of AI Developers — Matt Eland
Awesome Web Testing with Playwright and AI — Andrew Knight
If AI is Writing the Code, Who's Guarding the Quality? — Jeff Van Fleet, Scott Boyd
Agentic Testing & The Future of Trust — Mudassar Syed
Deploying and Testing Ephemeral Environments — Chris Harbert
From API Contracts to UI Confidence: AI-Driven Quality in CI/CD — Mohini Agarwal, Rachana Menon
The Day Testing Died — And Quality Evolved — Tatyana Arbouzova

Thanks to the organizers, the speakers, and everyone I traded notes with in the hallway. Already thinking about what to submit for next year.
