Hover is an open-source VS Code extension that turns plain-English chat into end-to-end tests. AI drives your real Chrome once to explore a flow, then Hover crystallizes the verified run into a standard @playwright/test spec that runs in CI with no AI in the loop.

How is Hover different from other AI testing tools?

Other AI test tools keep a model in the loop at runtime and re-generate the test on every run, so CI keeps paying for tokens and results drift. Hover spends the model once, at authoring time, and the artifact it leaves behind is deterministic, human-readable @playwright/test code. Green builds never pay a recurring AI tax.

What does Hover cost to run?

Hover is free and open source. It bundles no model SDK and no API keys — it spawns the coding-agent CLI (Claude Code or OpenAI Codex) already on your PATH, running on your own subscription or API key. There is no per-token resale.

Can Hover do security testing?

Yes. The same chat flips into an API-testing mode (IDOR / authz probing that crystallizes confirmed findings into .api-test.spec.ts CI gates) and a pentest mode (offensive, white-box, own-app-only — SQLi / XSS / SSTI / SSRF — writing a findings report).


Jun 16, 2026 · vibe-coding, testing, playwright, ai

You vibe-coded the feature. Is it actually tested?

You typed a sentence. Twenty minutes later there's a working checkout flow, a settings page, a new dashboard widget. The AI wrote it, you read enough of the diff to trust it, and you shipped. This is the part of vibe-coding that genuinely delivers. The friction of building has mostly gone away.

Then you open the app to check it. You click through the new flow once, it does the thing, and you move on. That single pass is the entire test plan. It felt fine, so it's done.

Shipping got easy. Confidence didn't.

The problem shows up the third or fourth time you do this. Every feature you add is one more path a user can take, and every change you make can quietly break a path you added last week. The flow you clicked through on Monday is now downstream of three things you changed on Thursday. You didn't re-click it, because there are eleven other flows and you'd be clicking all day.

So you stop checking. Not as a decision, just as the natural result of the math: each new feature can interact with every earlier one, so the re-checking you owe grows roughly with the square of your feature count while your patience stays flat. The first regression you hear about comes from a user, and it's in a flow that worked the day you built it.

Manual clicking and hand-written specs both fail here

The obvious answer is automated tests. Write Playwright specs, run them in CI, catch the regression before the user does. This is the right idea and it runs straight into a wall.

You did not hand-write the feature. You are not going to hand-write forty specs to cover it. Authoring a Playwright test means learning the selectors, the waits, the assertions, the setup, and doing that carefully for code you never typed in the first place. The economics are upside down. The whole reason you reached for vibe-coding was to skip the slow part, and writing tests by hand puts the slow part right back, except now it's the boring slow part.

The two options on the table are manual clicking, which doesn't scale, and hand-written specs, which you won't do. So most vibe-coded apps end up with neither. They ship on vibes and break on contact.

Vibe-test it the same way you vibe-coded it

The fix is to test the way you built. You described the feature in plain English to get it. Describe the flow in plain English to verify it.

That's what Hover does. It's a free, open-source VS Code extension. You write something like "log in, add a product to the cart, check out with a saved card," and the agent drives your real Chrome to do exactly that. It uses the claude or codex CLI you already have on your machine, connected to your actual browser over CDP. No new account, no separate test harness, no headless approximation of your app. It runs the flow in the same browser you'd use to click through it yourself, so what it verifies is what a user would hit.
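To make "connected to your actual browser over CDP" a little more concrete: a Chrome started with --remote-debugging-port exposes a small HTTP endpoint, /json/version, whose JSON response includes the WebSocket URL a CDP client attaches to. The sketch below only shows that discovery step. The port 9222 and the helper names versionEndpoint and parseDebuggerUrl are assumptions made for illustration, not Hover's actual code.

```typescript
// The fields this sketch reads from Chrome's /json/version response.
interface CdpVersionInfo {
  Browser: string;
  webSocketDebuggerUrl: string;
}

// Build the discovery URL for a Chrome started with --remote-debugging-port=<port>.
// The default host and the port you pass in are assumptions about your local setup.
function versionEndpoint(port: number, host: string = "127.0.0.1"): string {
  return `http://${host}:${port}/json/version`;
}

// Parse the response body and return the WebSocket URL a CDP client would attach to.
// Throws if the field is missing, which usually means remote debugging is not enabled.
function parseDebuggerUrl(body: string): string {
  const info = JSON.parse(body) as Partial<CdpVersionInfo>;
  if (typeof info.webSocketDebuggerUrl !== "string") {
    throw new Error("no webSocketDebuggerUrl in response; is remote debugging enabled?");
  }
  return info.webSocketDebuggerUrl;
}

// Example with a hand-written response body (the browser id "abc" is made up).
const sampleBody = JSON.stringify({
  Browser: "Chrome/125.0.0.0",
  webSocketDebuggerUrl: "ws://127.0.0.1:9222/devtools/browser/abc",
});
console.log(versionEndpoint(9222));
console.log(parseDebuggerUrl(sampleBody));
```

In practice the body would come from an HTTP GET against versionEndpoint(9222), and the returned URL is what an automation client uses to drive the browser that is already open on your screen.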

You watch it go. If the flow works, you keep it. If it doesn't, you've found the bug before a user did, which was the entire point.

What you end up with

Here's the part that matters. When the run succeeds, Hover crystallizes it into a plain @playwright/test spec file. Not a recording tied to Hover, not a proprietary format, not something that needs an AI to replay. Standard Playwright code that you can read, edit, and commit.
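One way to picture "crystallizing" is as a pure function from a log of verified steps to spec source text. The sketch below is only that picture. The Step type and the renderStep and renderSpec helpers are hypothetical, the URL and labels are invented, and Hover's real recording format and generator may look nothing like this. The Playwright calls that appear in the generated text (page.goto, getByLabel, getByRole, getByText, toBeVisible) are standard @playwright/test API.

```typescript
// Hypothetical step log. These type and field names are illustrative only.
type Step =
  | { kind: "goto"; url: string }
  | { kind: "fill"; label: string; value: string }
  | { kind: "click"; role: "button" | "link"; name: string }
  | { kind: "expectVisible"; text: string };

// Quote a value as a string literal so the generated code stays valid.
function lit(value: string): string {
  return JSON.stringify(value);
}

// Turn one verified step into one line of @playwright/test code.
function renderStep(step: Step): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${lit(step.url)});`;
    case "fill":
      return `await page.getByLabel(${lit(step.label)}).fill(${lit(step.value)});`;
    case "click":
      return `await page.getByRole(${lit(step.role)}, { name: ${lit(step.name)} }).click();`;
    case "expectVisible":
      return `await expect(page.getByText(${lit(step.text)})).toBeVisible();`;
  }
}

// Assemble a complete spec file from a test title and its ordered steps.
function renderSpec(title: string, steps: Step[]): string {
  const body = steps.map((s) => "  " + renderStep(s)).join("\n");
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${lit(title)}, async ({ page }) => {`,
    body,
    `});`,
    ``,
  ].join("\n");
}

const spec = renderSpec("user can log in", [
  { kind: "goto", url: "http://localhost:3000/login" },
  { kind: "fill", label: "Email", value: "demo@example.com" },
  { kind: "click", role: "button", name: "Log in" },
  { kind: "expectVisible", text: "Welcome back" },
]);
console.log(spec);
// Prints:
// import { test, expect } from "@playwright/test";
//
// test("user can log in", async ({ page }) => {
//   await page.goto("http://localhost:3000/login");
//   await page.getByLabel("Email").fill("demo@example.com");
//   await page.getByRole("button", { name: "Log in" }).click();
//   await expect(page.getByText("Welcome back")).toBeVisible();
// });
```

The printed text is an ordinary spec: nothing in it refers back to the tool or the model that produced it, which is what lets it run under the stock Playwright test runner.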

That spec runs in CI forever with zero AI in the loop. The agent did the authoring once, the same way you'd author a test by hand if authoring tests by hand were fast. After that it's just a Playwright suite. No API calls, no tokens burned per run, no nondeterminism from a model deciding what to click this time. We've written more about why AI should author the test but never run it in CI, and that line is the whole design.

So the loop closes. You vibe-code a feature, you vibe-test it, and you walk away with a real spec that guards the flow on every push. Adding the eleventh feature no longer means re-clicking the other ten. CI does that now.

Testing is the first cost vibe-coding hides. The next two are keeping the app working as it grows (why vibe-coded apps keep breaking) and knowing it's safe to ship (the security holes vibe-coding leaves behind). Building fast is solved. The next question is whether you can trust what you built, and that's where the rest of the work is.

Try Hover on your own app.

Install the VS Code extension. Author tests with AI, ship plain Playwright.
