May 28, 2026 · playwright · ai-testing · ci

AI-authored Playwright tests, without an AI in your CI

When a tool says "AI testing," ask one question: is the model running when the test runs?

Some tools answer yes. A language model reads the steps on every run, looks at the page, decides what to click, and recovers when something moved. The test is a prompt that re-resolves itself each time. Hover answers no. A model explores your app once, works out the flow, and writes ordinary test code. After that the model is gone, and a deterministic script runs the same way every time.

That yes-or-no decides what your test suite costs you for the rest of its life.

Runtime AI is a bill that arrives forever

Put a model on the path that runs when the test runs, and you take on three costs.

You pay on every run. Each PR, each nightly, each merge-queue retry makes LLM calls. A suite that runs hundreds of times a day across dozens of branches multiplies that bill, and it grows with how much you test. You wanted more coverage to be cheap; this makes it expensive.

Runs get slower. A model that looks at a screenshot and decides what to click takes longer than a selector resolving against the DOM, and a vision model is slower still.

Your green build depends on someone else's uptime. A rate limit, a regional outage, or a model version your provider sunsets can turn a passing suite red while nobody touches your code.

None of that is a worst case. That is what keeping inference on the hot path costs you.

Authoring-time AI spends the model once

Hover is an MCP server you add to the coding agent you already run. You ask it to test a flow, "log in, then add a todo named verify hover," and the agent drives your real Chrome through Hover's grounded tools over CDP. Once the run is verified, the agent calls crystallize_spec and Hover writes a standard @playwright/test file:

import { test, expect } from '@playwright/test';

test('login then add todo', async ({ page }) => {
  await page.goto('http://localhost:5173/');
  await page.getByLabel('Email').fill('claude@example.com');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.getByPlaceholder('New todo').fill('verify hover');
  await expect(page.getByText('verify hover')).toBeVisible();
});

No agent lives in that file. It runs with npx playwright test, in your CI, with zero tokens and no call to any provider. The model did its work at the one moment it helped: figuring out the flow. What it left behind is ordinary Playwright.
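For CI, the spec needs nothing beyond a normal Playwright setup. The following is a minimal sketch of a playwright.config.ts, not something Hover is stated to generate. The `tests` directory and the `npm run dev` command are assumptions about your project; the port matches the URL in the spec above.

```typescript
// playwright.config.ts (illustrative sketch; adjust paths and commands to your project)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Assumed location of the crystallized spec files.
  testDir: './tests',

  // Start the app before the tests run, so CI needs no model and no extra service.
  webServer: {
    // Assumed dev-server command; replace with whatever starts your app.
    command: 'npm run dev',
    // Playwright waits until this URL responds before running tests.
    url: 'http://localhost:5173',
    // Reuse a server that is already running locally, but always start a fresh one in CI.
    reuseExistingServer: !process.env.CI,
  },
});
```

With a config like this in place, the CI step is just `npx playwright test`.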

"But selectors break when the UI changes"

Fair, and it is an objection worth raising against any AI-authored test. Hover handles it without putting a model on the hot path.

Most churn never reaches the spec. Hover writes semantic locators (getByRole, getByLabel, getByText), so a layout refactor leaves "the Sign in button" as "the Sign in button."
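To see why a layout refactor leaves a semantic locator alone, here is a small sketch that loads two versions of the same button with `page.setContent`. The markup, class names, and the `#login-submit` id are made up for illustration; only the locator calls are standard Playwright.

```typescript
import { test, expect } from '@playwright/test';

test('role-based locator survives a markup refactor', async ({ page }) => {
  // Before the refactor: the button has an id and utility classes (hypothetical markup).
  await page.setContent(`
    <form>
      <button id="login-submit" class="btn btn-primary">Sign in</button>
    </form>
  `);
  await expect(page.getByRole('button', { name: 'Sign in' })).toBeVisible();
  await expect(page.locator('#login-submit')).toBeVisible();

  // After the refactor: new wrapper, new class, id removed. The accessible name is unchanged.
  await page.setContent(`
    <form>
      <div class="actions">
        <button class="cta">Sign in</button>
      </div>
    </form>
  `);

  // The semantic locator still finds "the Sign in button".
  await expect(page.getByRole('button', { name: 'Sign in' })).toBeVisible();
  // The id-based selector now matches nothing.
  await expect(page.locator('#login-submit')).toHaveCount(0);
});
```

The role-based locator only breaks if the button's role or visible name changes, which is the case the next paragraph covers.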

When a name changes, you re-record on purpose. One click replays the original prompt against the current UI and rewrites the file. Thirty seconds, ten cents, and you read the diff before committing.

When the flow itself broke, the red build is the point. The test caught a regression. Fix the app.

The difference from runtime self-healing is when you pay and how often. You pay once, on purpose, when something changed, not on every green run for the life of the suite.
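A rough way to see the gap is to put both models into the same arithmetic. In this sketch the ten-cent re-record is the figure quoted above; the run volume, the number of re-records, and the five-cent per-run inference cost are assumptions chosen only to illustrate the shape of the curve. Amounts are kept in whole cents to avoid floating-point rounding.

```typescript
// Usage of a test suite over some period. All values are illustrative assumptions.
interface SuiteUsage {
  runsPerDay: number; // CI runs per day across all branches
  days: number;       // length of the period in days
  reRecords: number;  // deliberate re-records in the period
}

// The article's figure for one re-record, in cents.
const RERECORD_COST_CENTS = 10;

// Runtime AI: every run pays for inference.
function runtimeAiCostCents(usage: SuiteUsage, costPerRunCents: number): number {
  return usage.runsPerDay * usage.days * costPerRunCents;
}

// Authoring-time AI: only re-records pay; ordinary runs use no tokens.
function authoringTimeCostCents(usage: SuiteUsage): number {
  return usage.reRecords * RERECORD_COST_CENTS;
}

// Hypothetical month: 300 runs a day, 5 re-records, 5 cents of inference per run.
const month: SuiteUsage = { runsPerDay: 300, days: 30, reRecords: 5 };

const runtimeCents = runtimeAiCostCents(month, 5);
const authoringCents = authoringTimeCostCents(month);

console.log(`runtime AI:     $${(runtimeCents / 100).toFixed(2)}`);   // $450.00
console.log(`authoring-time: $${(authoringCents / 100).toFixed(2)}`); // $0.50
```

The first number scales with how often you run the suite; the second scales only with how often the UI's names change.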

When you do want the model at runtime

If your DOM is unstable, or you're testing something you don't control and can't re-record, a model in the loop buys resilience a static script can't. That's a real trade, worth making sometimes.

For your own app, on your own dev server, with a flow you can re-record in seconds, paying for inference on every CI run buys you nothing. Author with AI. Run plain Playwright.

See how Hover crystallizes a session into a spec →

Try Hover on your own app.

Add Hover’s MCP to the coding agent you already run. It explores your app and crystallizes plain Playwright specs you own.

npm i -g @hover-dev/mcp && claude mcp add hover -- hover-mcp

Read the quick start →