FAQ

My UI changed and my saved spec breaks. What now?

This is the central question for any AI-authored e2e test. Hover's answer is three-layered.

1. Most UI churn doesn't break the spec

Hover generates getByRole / getByLabel / getByTestId semantic selectors — never CSS classes or XPath. "Submit button" stays "Submit button" after a layout pass; the spec keeps running. We make this choice in packages/core/src/specs/writeSpec.ts and reinforce it in the system prompt the agent reads on every run.
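The selector preference can be illustrated with a minimal sketch that turns a recorded element description into a locator expression. This is not the code in writeSpec.ts. The ElementTarget shape, the locatorFor name and the role, then label, then test-id order are assumptions made for illustration. The point of the sketch is that there is no CSS or XPath branch: a target with no semantic handle is rejected rather than falling back.

```typescript
// Hypothetical shape of what a recorder knows about the element it acted on.
interface ElementTarget {
  role?: string;   // ARIA role, e.g. 'button'
  name?: string;   // accessible name, e.g. 'Submit'
  label?: string;  // associated <label> text, for form fields
  testId?: string; // data-testid value
}

// Quote a string as a single-quoted TypeScript literal.
const quote = (s: string): string =>
  `'${s.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;

// Sketch: prefer role, then label, then test id. There is deliberately no
// CSS-class or XPath fallback; an element without a semantic handle is an error.
function locatorFor(t: ElementTarget): string {
  if (t.role) {
    return t.name !== undefined
      ? `page.getByRole(${quote(t.role)}, { name: ${quote(t.name)} })`
      : `page.getByRole(${quote(t.role)})`;
  }
  if (t.label) return `page.getByLabel(${quote(t.label)})`;
  if (t.testId) return `page.getByTestId(${quote(t.testId)})`;
  throw new Error('no semantic selector available; refusing to emit CSS/XPath');
}
```

For example, locatorFor({ role: 'button', name: 'Submit' }) yields page.getByRole('button', { name: 'Submit' }), the selector used throughout this page. A layout pass that moves that button leaves the string unchanged.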

2. When the semantics shift — button renamed, label changed, role swapped — the spec turns red

You have three options, listed from cheapest to most explicit:

  • Re-record it. Open the widget's 📜 Saved sessions overlay → Specs tab → click ⟳ Re-record. Or from a terminal: pnpm hover re-record <spec>. The agent reads the spec's JSDoc Original prompt: header ("log in then add a todo") and replays it against the current UI, then overwrites the .spec.ts with new selectors. About 30 seconds, about $0.10 per spec. Review the git diff before committing. See the Re-record a spec page for the full walkthrough.
  • Edit by hand. The spec is plain @playwright/test, so change getByRole('button', { name: 'Submit' }) to match the new label, e.g. { name: 'Sign in' }. Faster if you know exactly what changed.
  • Treat it as a regression. If the test fails because the flow broke (not just the selector), that's the test catching a real bug — fix the app, not the spec.
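Re-record starts from the Original prompt: line in the spec's JSDoc header. The sketch below shows how such a line can be read back out of a saved spec. The header layout shown in the comment and the originalPrompt helper are assumptions for illustration; check a spec Hover generated for the exact format.

```typescript
// Assumed header layout (illustrative; Hover's exact format may differ):
//
//   /**
//    * Original prompt: log in then add a todo
//    */

// Return the text after "Original prompt:" in the first JSDoc block, or null.
function originalPrompt(specSource: string): string | null {
  const block = specSource.match(/\/\*\*([\s\S]*?)\*\//);
  if (!block) return null;
  for (const raw of (block[1] ?? '').split('\n')) {
    // Drop the leading " * " gutter that JSDoc lines carry
    const line = raw.replace(/^\s*\*\s?/, '').trim();
    const m = line.match(/^Original prompt:\s*(.+)$/);
    if (m) return (m[1] ?? '').trim();
  }
  return null;
}
```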

3. Why we don't auto-heal at CI time

The Stagehand / Midscene model: tests "self-heal" by calling an LLM mid-run, retrying with new selectors until they pass. It works, but it builds a permanent runtime dependency on a hosted AI provider — every CI run pays an LLM call, every PR, every nightly. Across a year of CI cycles that's measurable money and a fragility surface (provider rate limits, regional outages, model deprecations).

Hover takes the opposite position: AI is for authoring tests, not running them. The saved .spec.ts is plain Playwright — pnpm test:e2e is deterministic and free. When the UI changes enough that selectors break, you trigger Re-record once, deliberately, and the new spec is again deterministic and free. The token cost concentrates at the moment you actually need a model, not amortised across thousands of regression runs.

My button is still in the DOM but moved behind a kebab menu — does the spec catch that?

Yes — and the how is more nuanced than "we added a check that wasn't there." Playwright's .click() / .fill() / .hover() / .selectOption() / .dblclick() all auto-wait on actionability, which includes visibility — so even the old emit (await page.getByRole(...).click()) wouldn't have silently fired on a hidden element. It would have timed out after 30 seconds with a generic actionability error that reads like a flake.

So the visibility prelude Hover now emits doesn't add detection; it makes the same failure faster and clearer:

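// Hover emits this:
{
  const el = page.getByRole('button', { name: 'Submit' });
  await expect(el).toBeVisible(); // fails fast (~5 s) if the button is hidden
  await el.click();               // still auto-waits for the remaining actionability checks
}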

When the button drifts into a closed <details> / kebab / drawer, the toBeVisible() line fails in ~5 s (Playwright's default expect timeout) with "Locator expected to be visible", a failure category triage engineers immediately recognise as a UI regression rather than dismiss as a network blip. The same drift on the old emit would have stalled for 30 s and produced a "Timeout 30000ms exceeded ... element is not visible" actionability error that's easy to misclassify as flake.

Net change:

|  | Before (just .click()) | After (prelude + .click()) |
|---|---|---|
| Detection | Caught (Playwright actionability) | Caught (toBeVisible) |
| Time to fail | ~30 s | ~5 s |
| Error category | "Timeout" (reads like flake) | "Locator expected to be visible" (reads like UI regression) |
| Spec self-documents intent | Implicit in .click() | Explicit in code |

The change applies to click / dblclick / hover / fill / selectOption. page.goto and page.keyboard.press are page-level (no element) and stay one-liners.
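That rule can be sketched as a small code emitter. The Step type and emitStep function are hypothetical (Hover's recorder format is not shown here). Element actions get the const el / toBeVisible() prelude, while page.goto and page.keyboard.press stay one-liners.

```typescript
// Element-level actions that get the visibility prelude.
type ElementAction = 'click' | 'dblclick' | 'hover' | 'fill' | 'selectOption';

// Hypothetical recorded-step shape, for illustration only.
type Step =
  | { kind: 'element'; action: ElementAction; locator: string; value?: string }
  | { kind: 'goto'; url: string }
  | { kind: 'press'; key: string };

// Quote a string as a single-quoted TypeScript literal.
const lit = (s: string): string =>
  `'${s.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;

function emitStep(step: Step): string {
  switch (step.kind) {
    case 'goto': // page-level: no element, so no prelude
      return `await page.goto(${lit(step.url)});`;
    case 'press': // page-level keyboard input
      return `await page.keyboard.press(${lit(step.key)});`;
    case 'element': {
      // fill / selectOption carry a value; click / dblclick / hover do not
      const arg = step.value !== undefined ? lit(step.value) : '';
      return [
        '{',
        `  const el = ${step.locator};`,
        '  await expect(el).toBeVisible();',
        `  await el.${step.action}(${arg});`,
        '}',
      ].join('\n');
    }
    default:
      throw new Error('unknown step kind');
  }
}
```

Keeping the prelude in the emitter rather than in a style guide is what makes it on by default: every element step goes through the same branch.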

Credit: the gap was pointed out on X (and the framing sharpened in follow-up) — role-based locators catch this case via auto-wait, but the failure shape is poor. The prelude is on by default; spec authors don't have to remember to add it.

What this still doesn't fix:

  • Disabled buttons. Playwright's actionability auto-wait covers them too, so .click() on a disabled control still times out at 30 s with a generic message: the same failure shape as before. Where it matters, tighten by hand with await expect(el).toBeEnabled().
  • An intermediate step quietly disappearing from the flow — each step still passes individually; only by reading the JSDoc Steps: block can you tell something changed. The Re-record path handles this case (agent regenerates the full sequence against current UI).

Why no re-record --all or --failed?

Both rejected on purpose.

--all would re-record every spec under __vibe_tests__/. It sounds convenient, but it burns LLM tokens on specs that were perfectly fine. With 20 specs in the project and 3 actually broken, --all pays for 17 unnecessary agent runs (at about $0.10 per spec, roughly $2.00 instead of $0.30). It also produces git-diff noise across those 17: same intent, different agent-chosen selector style, and still a diff you have to review.

--failed is the right shape of the answer — only re-record specs that Playwright reports as failing — and remains on the roadmap for a future release. It needs a first-class run-Playwright-and-collect-failures step the CLI doesn't yet ship.

The pattern is: CI tells you which specs are red, you re-record them one at a time and review each diff. Slightly slower, much cleaner history.
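Until --failed ships, the list of red specs can come from Playwright itself. A minimal sketch, assuming a report written by Playwright's built-in JSON reporter (e.g. PLAYWRIGHT_JSON_OUTPUT_NAME=report.json npx playwright test --reporter=json) and relying only on the fields declared below: nested suites, and specs carrying file and ok. The failingSpecFiles helper is hypothetical and not part of Hover's CLI.

```typescript
// Minimal subset of Playwright's JSON report that this sketch relies on.
interface JsonSpec { title: string; file: string; ok: boolean }
interface JsonSuite { title: string; file?: string; specs?: JsonSpec[]; suites?: JsonSuite[] }
interface JsonReport { suites: JsonSuite[] }

// Collect each spec file that has at least one failing test, deduplicated.
function failingSpecFiles(report: JsonReport): string[] {
  const failing = new Set<string>();
  const walk = (suite: JsonSuite): void => {
    for (const spec of suite.specs ?? []) {
      if (!spec.ok) failing.add(spec.file);
    }
    // describe() blocks show up as nested suites
    for (const child of suite.suites ?? []) walk(child);
  };
  report.suites.forEach(walk);
  return Array.from(failing).sort();
}
```

Each returned path is a candidate for pnpm hover re-record <spec>, run one at a time so each diff gets its own review.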

Security spec auth setup — how do I run a security spec in CI when the auth cookies live in my debug Chrome?

The agent recorded the IDOR / authz probes with the cookies from your logged-in debug-Chrome session. Playwright in CI is a fresh process, so it doesn't have those cookies. Plug them in via Playwright's storageState mechanism:

  1. Add an auth-setup step to your playwright.config.ts:

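    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      projects: [
        // Logs in once and saves the session to .auth/user.json
        { name: 'setup', testMatch: /global\.setup\.ts/ },
        {
          name: 'security',
          testMatch: /\.security\.spec\.ts/,
          // Runs only after the setup project has passed
          dependencies: ['setup'],
          // Tests in this project start from the saved session
          use: { storageState: '.auth/user.json' },
        },
      ],
    });
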
  2. In global.setup.ts, log in once (via API or UI) and write the resulting cookies to .auth/user.json with await context.storageState({ path: '.auth/user.json' }).

  3. CI now runs your security spec with the same effective auth as Hover recorded.

This is the same pattern Playwright uses for UI-level e2e auth; see Playwright's Authentication guide for the full reference. The Hover spec works as long as the request fixture has the storageState; the generated spec doesn't try to authenticate on its own.
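Step 2 can be sketched as follows, using Playwright's setup-project pattern. The /login route, the Email / Password / Sign in / Log out names and the E2E_USER / E2E_PASSWORD environment variables are placeholders for your own app, and page.goto('/login') assumes baseURL is set in playwright.config.ts.

```typescript
// global.setup.ts: log in once, save the session for dependent projects.
// Placeholders: the /login route, the field and button names, and the
// E2E_USER / E2E_PASSWORD environment variables stand in for your own app.
import { test as setup, expect } from '@playwright/test';

const authFile = '.auth/user.json';

setup('authenticate', async ({ page }) => {
  // Relative URL: assumes baseURL is set in playwright.config.ts
  await page.goto('/login');
  await page.getByLabel('Email').fill(process.env.E2E_USER ?? '');
  await page.getByLabel('Password').fill(process.env.E2E_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();
  // Wait for a signed-in marker so the cookies exist before saving them
  await expect(page.getByRole('button', { name: 'Log out' })).toBeVisible();
  // Writes cookies and localStorage to the file the security project loads
  await page.context().storageState({ path: authFile });
});
```

Add .auth/ to .gitignore so the saved session is never committed.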

Will Hover spawn another headless Chromium? My CI is already busy.

No. @hover-dev/core launches one isolated debug Chrome under <tmpdir>/hover-chrome and connects via CDP. It never spawns a fresh Chromium per command, and it doesn't touch your CI's Playwright browsers — those are configured entirely in playwright.config.ts and unrelated to Hover's debug Chrome.

Does Hover send my source code or DOM to a hosted service?

No. Hover spawns the coding-agent CLI on your local PATH (claude, codex, cursor-agent, etc.) and that CLI talks to its own provider (Anthropic, OpenAI, Cursor). @hover-dev/core itself has no LLM SDK code, no telemetry, no upload path. The Node service binds to 127.0.0.1 only and refuses connections from any other interface.
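Binding to the loopback address is what keeps a local service off the network. Here is a generic Node sketch of that technique. It is not Hover's service code, and the isLoopback and startLocalService names are hypothetical. The service listens on 127.0.0.1 rather than 0.0.0.0 and, as defence in depth, rejects any request whose peer address isn't a loopback address.

```typescript
import * as http from 'node:http';

// Peer addresses a socket reports for clients on the same machine.
const LOOPBACK = new Set(['127.0.0.1', '::1', '::ffff:127.0.0.1']);

function isLoopback(remoteAddress: string | undefined): boolean {
  return remoteAddress !== undefined && LOOPBACK.has(remoteAddress);
}

// Hypothetical local service: bound to 127.0.0.1, so no other interface
// can reach the socket at all.
function startLocalService(port: number): http.Server {
  const server = http.createServer((req, res) => {
    // Defence in depth: refuse anything that isn't from this machine
    if (!isLoopback(req.socket.remoteAddress)) {
      res.writeHead(403);
      res.end();
      return;
    }
    res.writeHead(200, { 'content-type': 'text/plain' });
    res.end('ok');
  });
  server.listen(port, '127.0.0.1');
  return server;
}
```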

Why doesn't the widget show up in astro build / next build / vite build output?

All bundler integrations are dev-only (apply: 'serve' for Vite, command === 'dev' for Astro, nuxt.options.dev for Nuxt, etc.). Production builds are no-ops by design. The Shadow-DOM widget is also marked data-hover="true" so any Playwright run against production HTML can filter it out with one selector.
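For Vite, the dev-only switch is the plugin's apply: 'serve' property. Below is a minimal sketch. The plugin name and widget script path are hypothetical (@hover-dev/core's real plugin is not reproduced here), and a structural type stands in for Vite's own Plugin type so the block stands alone.

```typescript
// Structural stand-in for Vite's Plugin type, so this sketch needs no import.
interface DevOnlyPlugin {
  name: string;
  apply: 'serve' | 'build';
  transformIndexHtml: (html: string) => string;
}

// Hypothetical plugin name and widget path; not @hover-dev/core's real plugin.
function hoverWidgetSketch(): DevOnlyPlugin {
  return {
    name: 'hover-widget-sketch',
    // Vite runs this plugin only for the dev server; `vite build` skips it,
    // so production HTML never gets the tag.
    apply: 'serve',
    transformIndexHtml(html) {
      // The injected script would mount the Shadow-DOM host with data-hover="true"
      const tag = '<script type="module" src="/@hover-sketch/widget.js"></script>';
      return html.replace('</body>', `${tag}</body>`);
    },
  };
}
```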

Can I run Hover in CI to author new tests automatically?

You can — set HOVER_AGENT=claude --max-budget-usd 0.50 and write a CI job that POSTs a prompt — but it's an anti-pattern most of the time. Hover is built around the assumption that a human reviews each generated spec before committing it. Automated authoring without review tends to produce specs that pass once and then accumulate selector debt no one notices until they break.

The supported workflow is: a human runs Hover during development, saves verified sessions, commits the resulting deterministic specs. CI just runs Playwright.