vibe-coding · testing · playwright · ai

You vibe-coded the feature. Is it actually tested?

You typed a sentence. Twenty minutes later there's a working checkout flow, a settings page, a new dashboard widget. The AI wrote it, you read enough of the diff to trust it, and you shipped. This is the part of vibe-coding that genuinely delivers. The friction of building has mostly gone away.

Then you open the app to check it. You click through the new flow once, it does the thing, and you move on. That single pass is the entire test plan. It felt fine, so it's done.

Shipping got easy. Confidence didn't.

The problem shows up the third or fourth time you do this. Every feature you add is one more path a user can take, and every change you make can quietly break a path you added last week. The flow you clicked through on Monday is now downstream of three things you changed on Thursday. You didn't re-click it, because there are eleven other flows and you'd be clicking all day.

So you stop checking. Not as a decision, just as the natural result of manual verification growing roughly with the square of your feature count (every new feature means re-clicking every flow you already shipped) while your patience stays flat. The first regression you hear about comes from a user, and it's in a flow that worked the day you built it.
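To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes one flow per feature and one full click-through of every existing flow each time a feature ships. Both are simplifications, and `totalManualChecks` is a made-up name for illustration.

```typescript
// Total manual click-throughs if every existing flow is re-checked
// each time a new feature ships.
// Assumption: one flow per feature, one full pass per shipped feature.
function totalManualChecks(featureCount: number): number {
  let total = 0;
  for (let shipped = 1; shipped <= featureCount; shipped++) {
    // After shipping feature number `shipped`, there are `shipped` flows to click.
    total += shipped;
  }
  return total; // same as featureCount * (featureCount + 1) / 2
}

for (const n of [4, 12, 40]) {
  console.log(`${n} features -> ${totalManualChecks(n)} click-throughs`);
}
// 4 features -> 10 click-throughs
// 12 features -> 78 click-throughs
// 40 features -> 820 click-throughs
```

Twelve features is already 78 passes through the app if you are honest about re-checking. Nobody is honest about re-checking.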

Manual clicking and hand-written specs both fail here

The obvious answer is automated tests. Write Playwright specs, run them in CI, catch the regression before the user does. This is the right idea and it runs straight into a wall.

You did not hand-write the feature. You are not going to hand-write forty specs to cover it. Authoring a Playwright test means learning the selectors, the waits, the assertions, the setup, and doing that carefully for code you never typed in the first place. The economics are upside down. The whole reason you reached for vibe-coding was to skip the slow part, and writing tests by hand puts the slow part right back, except now it's the boring slow part.

The two options on the table are manual clicking, which doesn't scale, and hand-written specs, which you won't do. So most vibe-coded apps end up with neither. They ship on vibes and break on contact.

Vibe-test it the same way you vibe-coded it

The fix is to test the way you built. You described the feature in plain English to get it. Describe the flow in plain English to verify it.

That's what Hover does. It's a free, open-source VS Code extension. You write something like "log in, add a product to the cart, check out with a saved card," and the agent drives your real Chrome to do exactly that. It uses the claude or codex CLI you already have on your machine, connected to your actual browser over CDP. No new account, no separate test harness, no headless approximation of your app. It runs the flow in the same browser you'd use to click through it yourself, so what it verifies is what a user would hit.
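If "over CDP" is unfamiliar, this is roughly what attaching to a running Chrome involves. The sketch below is not Hover's code. It assumes Chrome was started with `--remote-debugging-port=9222` (the port is an arbitrary choice here) and a Node version with a global `fetch`. The names `versionEndpoint` and `findDebuggerUrl` are hypothetical.

```typescript
// Shape of the fields we care about from Chrome's /json/version response.
interface CdpVersionInfo {
  Browser: string;
  webSocketDebuggerUrl: string;
}

// Build the HTTP discovery URL that Chrome exposes when remote debugging is on.
function versionEndpoint(port: number, host = "127.0.0.1"): string {
  return `http://${host}:${port}/json/version`;
}

// Ask the running Chrome for the WebSocket URL a CDP client would attach to.
// Assumption: port 9222, which is a common convention, not a Hover default.
async function findDebuggerUrl(port = 9222): Promise<string> {
  const response = await fetch(versionEndpoint(port));
  if (!response.ok) {
    throw new Error(`Chrome did not answer on port ${port} (HTTP ${response.status})`);
  }
  const info = (await response.json()) as CdpVersionInfo;
  return info.webSocketDebuggerUrl;
}
```

Calling `findDebuggerUrl()` against a Chrome launched that way resolves to a `ws://` URL. A CDP client, such as Playwright's `chromium.connectOverCDP`, can attach to that endpoint and drive the tabs you already have open, which is why there is no separate headless copy of your app involved.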

You watch it go. If the flow works, you keep it. If it doesn't, you've found the bug before a user did, which was the entire point.

What you end up with

Here's the part that matters. When the run succeeds, Hover crystallizes it into a plain @playwright/test spec file. Not a recording tied to Hover, not a proprietary format, not something that needs an AI to replay. Standard Playwright code that you can read, edit, and commit.
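For a sense of what that file looks like, here is a hand-written sketch of a spec for the "log in, add a product to the cart, check out with a saved card" flow. It is illustrative, not actual Hover output: every URL, label, button name, and credential below is a placeholder, and your app's will differ.

```typescript
import { test, expect } from '@playwright/test';

// All URLs, labels, button names, and credentials are placeholders for illustration.
test('log in, add a product to the cart, check out with a saved card', async ({ page }) => {
  // Log in
  await page.goto('http://localhost:3000/login');
  await page.getByLabel('Email').fill('shopper@example.com');
  await page.getByLabel('Password').fill('not-a-real-password');
  await page.getByRole('button', { name: 'Log in' }).click();

  // Add a product to the cart
  await page.goto('http://localhost:3000/products');
  await page.getByRole('button', { name: 'Add to cart' }).first().click();

  // Check out with the saved card
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByRole('button', { name: 'Check out' }).click();
  await page.getByLabel('Use saved card').check();
  await page.getByRole('button', { name: 'Place order' }).click();

  // The flow only passes if the confirmation actually appears
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
```

A file like this runs with `npx playwright test`, locally or in CI, and needs nothing beyond Playwright itself. If a label changes next month, you edit one string in a file you can read.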

That spec runs in CI forever with zero AI in the loop. The agent did the authoring once, the same way you'd author a test by hand if authoring tests by hand were fast. After that it's just a Playwright suite. No API calls, no tokens burned per run, no nondeterminism from a model deciding what to click this time. We've written more about why AI should author the test but never run it in CI, and that line is the whole design.

So the loop closes. You vibe-code a feature, you vibe-test it, and you walk away with a real spec that guards the flow on every push. Adding the eleventh feature no longer means re-clicking the other ten. CI does that now.

Testing is the first cost vibe-coding hides. The next two are keeping the app working as it grows (why vibe-coded apps keep breaking) and knowing it's safe to ship (the security holes vibe-coding leaves behind). Building fast is solved. The next question is whether you can trust what you built, and that's where the rest of the work is.

Try Hover on your own app.

Install the VS Code extension. Author tests with AI, ship plain Playwright.
