
Can AI Replace Knowledge Workers? The Verifiability Test

AI is reaching beyond coding into marketing, law, and finance. A new framework predicts which professions are next — find out where yours falls.

Can Robots Take My Job Team

Opinion and analysis. This article reflects the editorial team's interpretation of emerging trends and should not be treated as prediction. Individual career decisions should consider personal circumstances.

You've Seen This Movie Before — In Someone Else's Industry

If you're a lawyer, product manager, financial analyst, or marketer, you've been watching software engineers deal with AI for two years now. Maybe with a mix of sympathy and relief: that's their problem, not mine.

Here's what changed: the same multi-agent architectures that transformed coding are now demonstrably generalizing beyond code. And a framework from the AI engineering world predicts exactly which professions are next in line.

The question isn't whether AI reaches your field. It's whether you can pass the verifiability test when it does.

The Smooth Frontier: What Actually Happened in Coding

To understand what's coming for knowledge work, you need to understand what already happened in software engineering — not the hype version, but the mechanical reality.

AI models have what researchers call a "jagged frontier" — brilliant at some tasks, comically bad at others, with no predictable pattern. That unpredictability made raw model output unreliable for serious work. Then multi-agent architectures changed the equation.

Cursor, the AI coding tool, built a harness: multiple AI agents that plan, execute, review, and correct each other's work. Instead of one AI attempt at a problem, you get a structured team of agents that iterate. The jagged frontier got smoothed — not by making smarter AI, but by making smarter deployment.
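The article doesn't publish Cursor's internals, but the plan–execute–review–correct loop it describes can be sketched in miniature. Everything below is hypothetical: the three "agents" are stubbed as plain functions standing in for model calls, and all names (`planner`, `executor`, `reviewer`, `run_harness`) are illustrative, not any real product's API.

```python
# Minimal sketch of a multi-agent harness: a planner decomposes the
# task, an executor attempts each step, and a reviewer either accepts
# the result or sends it back for another attempt. In a real system,
# each of these functions would be a separate model call.

def planner(task: str) -> list[str]:
    # Decompose the task into ordered subtasks (stubbed).
    return [f"{task}: step {i}" for i in range(1, 4)]

def executor(step: str, attempt: int) -> str:
    # Produce a candidate result; later attempts are revisions.
    return f"{step} (draft {attempt})"

def reviewer(result: str) -> bool:
    # Toy acceptance criterion: only a revised draft passes review.
    return "draft 2" in result

def run_harness(task: str, max_attempts: int = 3) -> list[str]:
    accepted = []
    for step in planner(task):
        for attempt in range(1, max_attempts + 1):
            result = executor(step, attempt)
            if reviewer(result):  # review gate: iterate until it passes
                accepted.append(result)
                break
        else:
            raise RuntimeError(f"step failed review: {step}")
    return accepted

results = run_harness("write parser")
```

The point of the structure is the inner loop: no single attempt has to be right, because the review gate forces iteration before anything is accepted.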

In March 2026, this system did something that caught the attention of researchers outside software: Cursor's coding harness solved an unpublished research-grade math problem — a "First Proof" benchmark created by Fields Medalists and MacArthur Fellows. CEO Michael Truell's framing: "This suggests that our technique for scaling agent coordination might generalize beyond coding."

That's the signal. Not "AI got smarter." The deployment method works in domains it wasn't designed for.

The Verifiability Framework: Which Professions Are Next?

So which domains will multi-agent AI reach next? Here's the framework that predicts it — and it comes down to one question: can an experienced practitioner evaluate the output?

Tier 1: Machine-Checkable Output

Who: Software engineers, data analysts, accountants, QA engineers

What makes it Tier 1: Output quality is verifiable by objective criteria. Code compiles or it doesn't. Numbers add up or they don't. Tests pass or fail.
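What "verifiable by objective criteria" means in practice: a program, not a person, renders the verdict. A toy sketch of two such checks — balanced numbers and syntactically valid code — using only the standard library (the function names here are illustrative):

```python
# Tier 1 verification is mechanical: software checks the output with
# no human judgment involved.

import ast

def ledger_balances(entries: list[float]) -> bool:
    # "Numbers add up or they don't": debits and credits must net to zero.
    return abs(sum(entries)) < 1e-9

def code_parses(source: str) -> bool:
    # "Code compiles or it doesn't": here, a syntax check via ast.parse.
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```

Because the check is cheap and unambiguous, an AI agent can run it thousands of times and iterate until it passes — which is exactly why Tier 1 work was automated first.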

AI status: Already in the smooth frontier zone. This is where Copilot, Cursor, and coding agents already operate. The transformation is underway.

If this is you: You already know. The restructuring is happening. Your value has shifted from producing output to directing and evaluating AI output at scale.

Tier 2: Expert-Checkable Output

Who: Product managers, marketing strategists, lawyers, financial analysts, UX designers, customer success leads, clinical trial designers

What makes it Tier 2: Output quality can be evaluated by experienced practitioners with reasonable consensus. Three senior lawyers read an AI-drafted brief and largely agree on whether it's good. Four experienced PMs assess a product strategy and converge on what's missing.

AI status: Entering the smooth frontier zone in 2026-2027 as multi-agent systems generalize from code to other structured knowledge work.

If this is you: This article is mostly for you. Keep reading.

Outside Tiers: Relationship and Accountability-Dependent

Who: Salespeople (trust-based), therapists (relationship-dependent), executive leaders (accountability-bearing), surgeons (physical + judgment)

What makes them outside: The core value isn't the output — it's the relationship, the physical presence, or the willingness to bear responsibility. People don't pay for tasks; they pay for trust, judgment, and responsibility.

AI status: AI augments supporting tasks (research, scheduling, documentation) but doesn't replace the core value proposition. Slower to enter the smooth frontier — but not immune.

The Sniff-Check Test: Where Do You Fall?

Here's the practical question from the Anthropic 2026 Agentic Coding Trends Report, adapted across professions: practitioners are delegating tasks they can "sniff-check on correctness" to AI agents. Can you do the same in your field?

The test is domain-specific:

  • Product Manager: Can you read an AI-generated product strategy and immediately spot the missing competitive insight, the unrealistic timeline, the feature that won't serve users?
  • Marketing Strategist: Can you look at an AI-generated campaign and know whether it will resonate — not because you can articulate every principle, but because you've seen 500 campaigns succeed and fail?
  • Lawyer: Can you scan an AI-drafted brief and feel where the reasoning is thin or the precedent is misapplied?
  • Financial Analyst: Can you review an AI-built model and catch the assumption that doesn't match market reality?
  • Designer: Can you evaluate an AI-generated user flow and spot the friction points that will cause drop-off?

If you can pass the sniff-check: Your role shifts from producing this output to evaluating and directing AI output at scale. Your domain expertise becomes a force multiplier. You get more valuable, not less — as long as you adapt to the evaluator role.

If you can't yet: Either you're early in your career and need to build the domain expertise that makes sniff-checking possible, or your current work is primarily execution. Both have implications for what you should do next.

Why People Underestimate What's Coming

There's a systematic miscalibration happening across every Tier 2 profession. Most professionals judge AI capability based on ChatGPT conversations — single-turn interactions where you type a prompt, get a response, and evaluate it.

Single-turn AI has four failure modes that multi-agent systems eliminate:

  1. Error propagation — one mistake contaminates everything downstream (multi-agent: agents review each other)
  2. No retry — if the approach is wrong, there's no correction (multi-agent: iterate and backtrack)
  3. Context ceiling — can't accumulate information across sessions (multi-agent: shared memory and context)
  4. One-shot constraint — every problem must be solved in a single pass (multi-agent: decompose into subtasks)
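The four mitigations above compose into one control loop. A minimal sketch, with all agent behavior stubbed and every name (`solve`, `attempt_subtask`, `review`) hypothetical — the numbered comments map back to the list:

```python
def solve(subtasks, attempt_subtask, review, max_retries=2):
    context = {}                                 # (3) shared memory across subtasks
    for name in subtasks:                        # (4) problem decomposed into subtasks
        for attempt in range(max_retries + 1):   # (2) retry when an approach fails
            result = attempt_subtask(name, context, attempt)
            if review(result):                   # (1) review gate blocks bad output
                context[name] = result           #     from contaminating later steps
                break
        else:
            return None                          # give up rather than propagate errors
    return context

# Stub agents: the first attempt at each subtask deliberately fails review,
# and each subtask can see what the earlier ones produced.
def attempt_subtask(name, context, attempt):
    return {"name": name, "quality": attempt, "seen": sorted(context)}

def review(result):
    return result["quality"] >= 1

ctx = solve(["research", "draft", "edit"], attempt_subtask, review)
```

Each constraint of single-turn AI is structural, not a model limitation — which is why wrapping the same model in this loop produces such different results.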

When you test ChatGPT by asking it to write a legal brief and it produces something mediocre, you're testing single-turn AI. You're calibrating your career strategy against a capability snapshot that's already obsolete.

The structured multi-agent systems arriving in your profession's tooling won't have these constraints. And they're arriving faster than most professionals expect — because the deployment architecture already works. It just needs to be aimed at your domain.

What This Means For Your Career (By Tier)

If You're Tier 2 (Expert-Checkable Work)

The restructuring that hit software engineering is coming to your field within 12-24 months. Here's the playbook:

Your value shifts from production to evaluation. The analysts who build models will matter less than the analysts who can evaluate AI-built models and catch the wrong assumptions. The lawyers who draft briefs will matter less than the lawyers who can review AI-drafted briefs and spot the thin reasoning. The marketers who create campaigns will matter less than the marketers who can evaluate AI campaigns against 15 years of pattern recognition.

Depth beats breadth. The sniff-check only works if you have genuine expertise. Generalists who can produce adequate work across many areas are more replaceable than specialists who can evaluate excellent work in one area. This is the opposite of the "learn everything" advice that dominated the 2010s.

Start evaluating AI output now. Don't wait for your company to deploy AI tools. Use ChatGPT, Claude, or domain-specific AI to generate work products in your area. Practice evaluating them. Build the muscle of spotting what's wrong, what's missing, what's mediocre but passable. This is the skill that will define your value in 18 months.

If You're Outside Tiers (Relationship/Accountability Work)

Your core value is more durable, but don't get complacent. AI is already handling the supporting tasks around your work — research, scheduling, documentation, data analysis. The professionals who thrive will be the ones who offload routine work to AI and reinvest that time into deeper relationships, harder judgments, and higher-stakes accountability.

If You're Tier 1 (Machine-Checkable Work)

You're already in it. The question isn't "will this affect me" but "have I adapted?" If your primary value is still producing machine-verifiable output, you're competing directly with tools that work 24/7 at near-zero marginal cost. The exit ramp is becoming a Tier 2 evaluator — developing the domain expertise to direct AI rather than compete with it.

One Move This Week

Pick one work product you created recently — a report, a strategy doc, a campaign brief, a financial model, whatever represents your typical output.

Ask an AI tool (ChatGPT, Claude) to create the same thing from the same inputs.

Then evaluate: what did it get right? What did it miss? What would you have caught that a junior colleague might not?

That gap — between what AI produces and what your experience catches — is your career moat. If the gap is wide, you're positioned well. If it's narrow, you know what to work on.


How This Framework Applies To Specific Professions

We've applied the verifiability framework across our profession analyses; see how it plays out for your specific role.


Sources

  • Cursor CEO Michael Truell on multi-agent system generalization, March 2026
  • Anthropic 2026 Agentic Coding Trends Report — "sniff-check" delegation framework
  • Nate B Jones analysis of four-lab convergence (Anthropic, Google DeepMind, OpenAI, Cursor), March 11, 2026
  • Fields Medal "First Proof" benchmark results, March 2026