AI-authored Playwright tests, without an AI in your CI
When a tool says "AI testing," ask one question: is the model running when the test runs?
Some tools answer yes. A language model reads the steps on every run, looks at the page, decides what to click, and recovers when something moved. The test is a prompt that re-resolves itself each time. Other tools, Hover among them, answer no. A model explores your app once, works out the flow, and writes ordinary test code. After that the model is gone, and a deterministic script runs the same way every time.
That single yes-or-no decides what your test suite costs you for the rest of its life.
Runtime AI is a bill that arrives forever
Put a model on the path that runs when the test runs, and three things follow you around.
You pay on every run. Each PR, each nightly, each merge-queue retry makes LLM calls. A suite that runs hundreds of times a day across dozens of branches multiplies that bill, and it grows with how much you test. You wanted more coverage to be cheap; this makes it expensive.
Runs get slower. A model looking at a screenshot and deciding what to click takes longer than a selector resolving against the DOM. Vision models take longer still.
Your green build now depends on someone else's uptime. A rate limit, a regional outage, or a model version your provider sunsets can turn a passing suite red without anyone touching your code.
None of that is a worst case. It's what keeping inference on the hot path means.
Authoring-time AI spends the model once
Hover puts a chat widget in your dev server. You type a flow in plain English, such as "log in, then add a todo named 'verify hover'," and the agent drives your real Chrome through it over CDP (the Chrome DevTools Protocol) while you watch. When the run looks right, you click "Save as spec," and Hover writes a standard @playwright/test file:
import { test, expect } from '@playwright/test';

test('login then add todo', async ({ page }) => {
  await page.goto('http://localhost:5173/');
  await page.getByLabel('Email').fill('claude@example.com');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.getByPlaceholder('New todo').fill('verify hover');
  await expect(page.getByText('verify hover')).toBeVisible();
});
No agent lives in that file. It runs with npx playwright test, in your CI, with zero tokens and no call to any provider. The model did its work at the one moment intelligence helped: figuring out the flow. The artifact it left behind is ordinary Playwright.
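Because the saved spec is ordinary Playwright, the CI setup can be ordinary too. The following is a minimal sketch of a playwright.config.ts, not something Hover is stated to generate. It assumes a dev script named npm run dev and specs stored under ./tests (both hypothetical), and takes the port from the spec above. Note what is absent: no API key, no provider endpoint, no model name.

```typescript
// playwright.config.ts (illustrative sketch; paths and commands are assumptions)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Assumed location of the saved specs.
  testDir: './tests',

  // Fail the CI build if a stray test.only was committed.
  forbidOnly: !!process.env.CI,

  // One retry in CI to absorb ordinary flakiness; none locally.
  retries: process.env.CI ? 1 : 0,

  // Start the app before the tests run. The command is a placeholder;
  // the URL matches the one the spec navigates to.
  webServer: {
    command: 'npm run dev',
    url: 'http://localhost:5173/',
    reuseExistingServer: !process.env.CI,
  },
});
```

With a config like this in place, npx playwright test starts the dev server, runs the spec, and exits, with no network call to a model provider anywhere in the run.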
"But selectors break when the UI changes"
Fair, and you should ask it of any AI-authored test. Hover handles it without putting a model on the hot path.
Most churn never reaches the spec. Hover writes semantic locators (getByRole, getByLabel, getByText), so a layout refactor leaves "the Sign in button" as "the Sign in button."
When a name changes, you re-record on purpose. One click replays the original prompt against the current UI and rewrites the file. That costs roughly thirty seconds and ten cents, and you read the diff before committing.
When the flow itself broke, the red build is the point. That's a regression the test caught. Fix the app.
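The first point above, that semantic locators survive a layout refactor, can be checked directly. The sketch below is a hedged illustration: both HTML snippets are invented markup, not output from Hover or from any real app. It loads an original and a refactored version of the same sign-in form and runs the same locators against each. For contrast, it also counts matches for a structure-dependent CSS selector (form > button), which stops matching once the button is wrapped in a div.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical markup: the same form before and after a layout refactor.
const ORIGINAL = `
  <form>
    <label>Email <input type="email"></label>
    <button>Sign in</button>
  </form>`;

const REFACTORED = `
  <form class="card">
    <div class="field">
      <label for="email">Email</label>
      <input id="email" type="email">
    </div>
    <div class="actions">
      <button class="btn btn-primary"><span>Sign in</span></button>
    </div>
  </form>`;

const layouts = [
  { name: 'original layout', html: ORIGINAL, structuralMatches: 1 },
  { name: 'refactored layout', html: REFACTORED, structuralMatches: 0 },
];

for (const layout of layouts) {
  test(`locators against the ${layout.name}`, async ({ page }) => {
    await page.setContent(layout.html);

    // Semantic locators key on the label text and the accessible name,
    // so they resolve in both versions of the markup.
    await expect(page.getByLabel('Email')).toBeVisible();
    await expect(page.getByRole('button', { name: 'Sign in' })).toBeVisible();

    // The structural selector depends on the button being a direct
    // child of the form, which is only true before the refactor.
    await expect(page.locator('form > button')).toHaveCount(layout.structuralMatches);
  });
}
```

Renaming the button to something other than "Sign in" would break the role locator in both layouts, which is the case the re-record step is for.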
The difference from runtime self-healing is when you pay and how often. You pay once, on purpose, when something changed. You don't pay on every green run for the life of the suite.
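To put rough numbers on that difference, here is a small back-of-the-envelope calculation. Every input is a made-up assumption chosen for illustration (300 suite runs a day, 50 cents of inference per run, 20 re-records a month), except the ten cents per re-record, which is the figure quoted above. Amounts are kept in whole cents to avoid floating-point rounding.

```typescript
// Illustrative cost model; all inputs below are assumptions, not measurements.
interface SuiteProfile {
  runsPerDay: number;           // suite executions per day across all branches
  inferenceCentsPerRun: number; // runtime-AI model cost for one suite run
  rerecordsPerMonth: number;    // deliberate re-records after UI changes
  centsPerRerecord: number;     // cost of one re-record
}

interface MonthlyCost {
  runtimeAiCents: number;
  authoringAiCents: number;
}

// Runtime AI pays on every run; authoring-time AI pays only per re-record.
function monthlyCost(profile: SuiteProfile, daysPerMonth = 30): MonthlyCost {
  return {
    runtimeAiCents: profile.runsPerDay * daysPerMonth * profile.inferenceCentsPerRun,
    authoringAiCents: profile.rerecordsPerMonth * profile.centsPerRerecord,
  };
}

// Format a whole number of cents as a dollar string.
function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

const exampleProfile: SuiteProfile = {
  runsPerDay: 300,          // hypothetical
  inferenceCentsPerRun: 50, // hypothetical
  rerecordsPerMonth: 20,    // hypothetical
  centsPerRerecord: 10,     // the "ten cents" quoted above
};

const cost = monthlyCost(exampleProfile);
console.log(`runtime AI:   ${formatUsd(cost.runtimeAiCents)} per month`);
console.log(`authoring AI: ${formatUsd(cost.authoringAiCents)} per month`);
```

The exact totals matter less than the shape: the first figure scales with runsPerDay, so it grows as you test more, while the second scales only with how often the UI changes enough to need a re-record.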
When you do want the model at runtime
If your DOM is unstable, or you're testing something you don't control and can't re-record, a model in the loop buys resilience a static script can't. That's a real trade, worth making sometimes.
For your own app, on your own dev server, with a flow you can re-record in seconds, paying for inference on every CI run buys you nothing. Author with AI. Run plain Playwright.
Try Hover on your own app
One command adds the widget to your dev server. Author tests with AI, ship plain Playwright.
npx @hover-dev/cli setup