The Screenshot Loop

This is the third tool I’ve reached for trying to make this workflow click. I started with the Playwright MCP server, where Claude could see the page but every interaction came back as a full accessibility tree. A single “make this look closer to the mockup” turn would burn a small mountain of tokens before any code got edited. Great for testing behavior, wrong tool for visual iteration. From there I moved to Simon Willison’s rodney, a little CLI built for exactly this kind of thing, and that’s what convinced me the loop was even possible. The last few days I’ve landed on Microsoft’s playwright-cli — same browser automation, but it adds snapshots, refs, named sessions, and a live screencast dashboard that’s quietly become my main view.

The skill I built on top of it is screenshot-loop. The whole idea is in the name: take a screenshot, compare to the brief, fix the biggest gap, repeat. I drop in a mockup or write a one-liner like “make this dashboard look like mockup.png, keep iterating until it matches”, and Claude runs the loop. Opens the page, screenshots it, holds it up against the brief. Picks the biggest visual gap (usually something structural, before anything cosmetic) and edits the CSS. Reloads, screenshots again. Keeps going until things either match or stop improving, then it stops on its own and shows me where it landed.

Stopping is the hard part. There’s a milestone system and a “three iterations without visible progress” cap baked in, so the loop doesn’t spin on a 2px shadow for ten turns. The skill pauses at structural checkpoints — say, layout’s done, or responsive’s working — to show me a screenshot and ask if we’re headed somewhere good. If it gets stuck before hitting one, it bails and asks for help.

Each loop picks a session ID up front, which keeps agents from stepping on each other when there’s more than one running in the same repo. playwright-cli show opens a dashboard where I can watch what’s happening live and click in to take over the browser if I want. When I’m running two or three agents on different branches at once, that view is the only thing keeping me sane. (This happens more than I’d like to admit. The dashboard is the only reason I haven’t yet sent two agents to fight each other over the same dev server port.)

A nice side effect: the snapshot output includes roles and names, so Claude reviews the a11y of the thing it just built every iteration. I used to do that as a final pass and find five problems stacked up at once. Now they get caught two seconds after they get introduced.

The loop is wrong for behavioral work. If the brief is “this form should validate on blur and show errors inline,” screenshots won’t tell you whether it’s working. It also doesn’t help without a brief, because the whole approach is comparison. Nothing to converge on, nothing for the agent to do.

I keep coming back to the framing from an earlier post: LLMs automate typing, not thinking. The screenshot loop is the version of that I want for UI work — Claude handles the screenshots, the reloads, the what’s-different-now observations. I handle the taste.

The skill itself

If you want to drop this into your own setup, here’s the full SKILL.md (with my project-specific bits stripped out). Save it to ~/.claude/skills/screenshot-loop/SKILL.md and Claude will pick it up.


---
name: screenshot-loop
description: Use when building or modifying visual interfaces and you want to verify by screenshotting the running app, comparing against a brief (image, text, Figma URL, or live URL), and iterating until it matches. Triggers on UI implementation tasks where visual accuracy matters.
---

# Screenshot Loop

Take a screenshot, compare to the brief, fix the biggest gap, repeat — with milestone check-ins and a11y review baked in.

## When to Trigger

Use when there's a visual target (mockup, screenshot, description, live reference) and visual verification matters. Skip for backend-only work, purely behavioral briefs (use tests), or trivial copy changes.

## Setup

If `command -v playwright-cli` fails:

```bash
npm install -g @playwright/cli@latest
playwright-cli install --skills
```

## Briefs

| Format | Handling |
|---|---|
| **Image file** | Read as visual reference. Cache to `brief.png` if reused. |
| **Text description** | Use as evaluation criteria each iteration. |
| **Figma URL** | `goto` the frame URL, screenshot to `brief.png`, then run the loop against your dev URL with that as the target. (If Figma asks for auth, see the auth phase — the same `state-save`/`state-load` pattern works on figma.com.) |
| **Live URL** | `tab-new <url>`, screenshot to `brief.png`, then loop your dev URL against it. |

Briefs can combine (e.g., screenshot + text describing behavior).

## Sessions

Multiple agents may share a repo. Pick a unique session ID at loop start and pass `-s=$SESSION` to every command:

```bash
SESSION="$(basename "$PWD")-$(openssl rand -hex 2)"
```
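
A minimal sketch of one way to make the flag hard to forget: wrap the CLI in a small shell function. `pw` and `PW_BIN` are hypothetical names introduced here for illustration, not part of playwright-cli.

```bash
# Same session ID recipe as above.
SESSION="$(basename "$PWD")-$(openssl rand -hex 2)"

# PW_BIN and pw are hypothetical names, not part of playwright-cli.
# Point PW_BIN at `echo` to dry-run a command without touching a browser.
PW_BIN="${PW_BIN:-playwright-cli}"

# Forward every argument, always injecting the session flag first.
pw() {
  "$PW_BIN" -s="$SESSION" "$@"
}
```

With that in place, `pw goto <target-url>` stands in for `playwright-cli -s=$SESSION goto <target-url>`.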

## Directories

```
/tmp/screenshot-loop/$SESSION/    # ephemeral: current.png, snap.yml, milestone-*.png, brief.png
~/.cache/screenshot-loop/auth/    # persistent auth state per repo
```

Create at loop start:

```bash
mkdir -p /tmp/screenshot-loop/$SESSION ~/.cache/screenshot-loop/auth
```

Never save screenshots into the project directory.

## The Process

### Phase 1: Authentication

1. Skim `CLAUDE.md` for dev-login endpoints, seeded test users, or env vars.
2. Try existing auth state. If `~/.cache/screenshot-loop/auth/$(basename "$PWD").json` exists:

   ```bash
   playwright-cli -s=$SESSION state-load ~/.cache/screenshot-loop/auth/$(basename "$PWD").json
   playwright-cli -s=$SESSION goto <target-url>
   playwright-cli -s=$SESSION screenshot --filename=/tmp/screenshot-loop/$SESSION/current.png
   ```

   If past the login gate, skip to the loop.

3. Authenticate fresh, in order: dev-login endpoint > form fill via snapshot refs > ask the user after 2 failed attempts.
4. Save state once authenticated:

   ```bash
   playwright-cli -s=$SESSION state-save ~/.cache/screenshot-loop/auth/$(basename "$PWD").json
   ```
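
The "try existing state first" step can be sketched as a guard around `state-load`. `AUTH_FILE` and `try_saved_auth` are hypothetical names used only for illustration, and the sketch assumes `SESSION` is already set as described under Sessions.

```bash
# One auth-state file per repo, keyed by the directory name (path from this skill).
AUTH_FILE="$HOME/.cache/screenshot-loop/auth/$(basename "$PWD").json"

# Hypothetical helper: returns non-zero when no saved state exists,
# so the caller knows to authenticate fresh and state-save afterwards.
try_saved_auth() {
  [ -f "$AUTH_FILE" ] || return 1
  playwright-cli -s="$SESSION" state-load "$AUTH_FILE"
}
```

A successful load is not proof the saved state is still valid, so still screenshot the target and check that you are past the login gate.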

### Phase 2: The Loop

Each iteration:

1. Screenshot to `/tmp/screenshot-loop/$SESSION/current.png`.
2. Snapshot to `/tmp/screenshot-loop/$SESSION/snap.yml` for refs and a11y info.
3. Compare to brief. Fold in a11y review: labels, semantics, focus order, and alt text from the snapshot's roles/names; contrast from the screenshot.
4. Fix the biggest gap. Structural before cosmetic.
5. Reload, screenshot, repeat.

**Refs, not selectors.** `snapshot` first, then act on refs (`click e21`, `fill e21 "text"`). Don't reach for CSS.

### Phase 3: Milestones

Check in after each: layout in place, colors and typography applied, responsive working, interactive states handled, final polish. Save a milestone screenshot:

```bash
playwright-cli -s=$SESSION screenshot --filename=/tmp/screenshot-loop/$SESSION/milestone-layout.png
```

Show it alongside the brief and ask if it's on track.

### Phase 4: Convergence Cap

If 3 iterations pass without visible progress, stop and check in. Don't spin 8 times on a 2px shadow.
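
One way to keep the count honest is a small stall counter. This is a sketch, not part of the skill's tooling: `STALL_LIMIT`, `stalls`, and `record_iteration` are hypothetical names, and the "progress" or "stalled" verdict still comes from looking at the screenshots.

```bash
STALL_LIMIT=3   # mirrors the "3 iterations" rule above
stalls=0

# Call once per iteration with "progress" or "stalled".
# Returns non-zero once the cap trips, meaning: stop and check in.
record_iteration() {
  case "$1" in
    progress) stalls=0 ;;                 # any visible progress resets the count
    *)        stalls=$((stalls + 1)) ;;   # anything else counts as a stall
  esac
  [ "$stalls" -lt "$STALL_LIMIT" ]
}
```

Call it directly rather than inside `$(...)`, since a subshell would lose the updated count.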

### Phase 5: Pushback

Stop and raise concerns when the brief has a11y problems, conflicts with codebase patterns, would work better implemented differently, or is ambiguous enough to waste a cycle.

## Debugging

If the screenshot is broken (blank, missing assets, JS error) rather than off-spec:

```bash
playwright-cli -s=$SESSION console            # JS errors
playwright-cli -s=$SESSION requests           # network log
playwright-cli -s=$SESSION request <index>    # request detail
```

Common cause for Vite frontends: `npm run dev` isn't running.

## Responsive

`playwright-cli -s=$SESSION resize <w> <h>` switches viewport. Capture a milestone at each breakpoint.
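
A sketch of capturing one milestone per breakpoint. The breakpoint names and sizes are assumptions, so swap in the project's own. `BREAKPOINTS`, `PW_BIN`, `milestone_path`, and `capture_breakpoints` are hypothetical names, and `SESSION` is assumed to be set as described under Sessions.

```bash
# Assumed breakpoints as name:width:height; replace with the project's own.
BREAKPOINTS="mobile:375:812 tablet:768:1024 desktop:1440:900"

# Hypothetical override: point PW_BIN at `echo` to preview the commands.
PW_BIN="${PW_BIN:-playwright-cli}"

# Print the milestone screenshot path for a breakpoint name.
milestone_path() {
  printf '/tmp/screenshot-loop/%s/milestone-%s.png\n' "$SESSION" "$1"
}

# Resize to each breakpoint in turn and save a milestone screenshot.
capture_breakpoints() {
  for bp in $BREAKPOINTS; do
    name=${bp%%:*}       # text before the first colon
    rest=${bp#*:}        # width:height
    width=${rest%%:*}
    height=${rest#*:}
    "$PW_BIN" -s="$SESSION" resize "$width" "$height"
    "$PW_BIN" -s="$SESSION" screenshot --filename="$(milestone_path "$name")"
  done
}
```

Running `PW_BIN=echo capture_breakpoints` prints the commands instead of executing them, which is a cheap way to check the paths first.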

## Stop

End the loop when the screenshot matches the brief, the user says good enough, the convergence cap trips, or a blocking question needs an answer.

## Cleanup

```bash
playwright-cli -s=$SESSION close
rm -rf "/tmp/screenshot-loop/${SESSION:?}"   # :? aborts if SESSION is empty, so the parent dir is never wiped
```

The auth file stays.

## Common Mistakes

| Mistake | Fix |
|---|---|
| Missing `-s=$SESSION` | Every invocation needs it, or you join another agent's session. |
| Using CSS selectors | Refs come from `snapshot`. |
| Re-authing every loop | Use `state-save`/`state-load` once. |
| Fixing details before layout | Biggest gap first. |
| Screenshots in cwd | Always `/tmp/screenshot-loop/$SESSION/`. |
| Skipping milestones | This is where misunderstandings get caught early. |