Open Source · MIT Licensed

AI agents that demo your web app like a human

Describe what to test in plain English. Vigil launches a browser, reasons through your UI with an LLM, and produces video recordings with structured pass/fail reports.

vigil — ~/project
# Describe what to test — the AI agent does the rest
$ vigil demo "test the dark mode toggle and verify all elements update" \
    --url http://localhost:3737 --headed
▸ Launching browser...
▸ Agent observing page (accessibility tree + DOM snapshot)
▸ Step 1/5: Clicking dark mode toggle button
▸ Step 2/5: Verifying background color changed to dark theme
▸ Step 3/5: Checking navigation bar updated
▸ Step 4/5: Verifying text contrast meets expectations
▸ Step 5/5: Toggling back to light mode and confirming
✓ All checks passed (5/5 steps, 12.3s)
▸ Video: recordings/run-a1b2c3/video/
▸ Report: results/run-a1b2c3/results.md
Works with Claude, GPT-4, Gemini, Groq, Ollama, and more
How It Works

Three steps. Zero test scripts.

No selectors to maintain. No brittle page objects. Describe the task, point at a URL, and let the agent figure it out.

1. Describe the task

Write a natural-language task like "log in and add a high-priority todo", or define YAML steps for repeatable scenarios.

2. Agent reasons & acts

Vigil launches a browser, reads the DOM and accessibility tree, and uses an LLM to decide what to click, type, and verify.

3. Get recordings & reports

Every run produces video recordings, Playwright traces, screenshots, and structured JSON/Markdown reports with pass/fail status.
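
To pick up the newest report from a script, a small shell helper may be enough. The sketch below assumes reports are written to results/run-<id>/results.md, as the sample terminal output suggests. The function name latest_report is hypothetical and not part of Vigil.

```shell
# Hypothetical helper (not part of Vigil): print the path of the most
# recently modified report.
# Assumption: reports live at <root>/run-<id>/results.md.
latest_report() {
  _root="${1:-results}"
  # ls -t lists newest first; head keeps only the newest match.
  # Errors are silenced so that "no runs yet" prints nothing.
  ls -t "$_root"/run-*/results.md 2>/dev/null | head -n 1
}
```

Calling latest_report with no argument looks under ./results, so something like cat "$(latest_report)" would print the latest Markdown report once at least one run exists.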

Execution Modes

Freeform or structured. Your call.

Autonomous

Freeform Mode

Describe what you want in plain English. The agent figures out the steps, navigates pages, fills forms, and validates results on its own.

$ vigil demo "search for wireless headphones and add the first result to cart" \
    --url https://shop.example.com --headed
Deterministic

Scenario Mode

Define explicit, step-by-step YAML workflows for repeatable demos. Ideal for CI pipelines, regression testing, and stakeholder reviews.

name: Login Flow Demo
baseUrl: http://localhost:3737
steps:
  - action: click
    target: "the login button"
  - action: type
    value: "user@example.com"
Features

Everything you need to ship with confidence

Multi-LLM Support

Claude, GPT-4, Gemini, Groq, OpenRouter, Azure, Ollama, or any OpenAI-compatible endpoint. Use what you already pay for.

Rich Recordings

Video recordings, Playwright traces, per-step screenshots, and animated cursor effects. Show stakeholders exactly what happened.

Structured Reports

JSON and Markdown reports with step-by-step results, agent reasoning, timings, and error details. Machine-readable and human-friendly.

CI/CD Ready

Ships with a GitHub Actions workflow. Run demos on every PR, upload artifacts, and post summary comments automatically.
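
If you prefer to wire a CI step by hand instead of using the bundled workflow, a wrapper along these lines is one option. This is a sketch under two loud assumptions that the page does not confirm: that vigil demo exits with a non-zero status when a run fails, and that omitting --headed runs the browser headless. The function name run_tasks is hypothetical.

```shell
# Hypothetical CI wrapper (not part of Vigil): run several freeform
# tasks against one URL and report which ones passed.
# Assumption: `vigil demo` exits non-zero when a run fails.
# Assumption: leaving out --headed means the browser runs headless.
run_tasks() {
  _url="$1"
  shift
  _failed=0
  for _task in "$@"; do
    if vigil demo "$_task" --url "$_url"; then
      echo "PASS: $_task"
    else
      echo "FAIL: $_task"
      _failed=$((_failed + 1))
    fi
  done
  # Return non-zero if any task failed so the CI step can fail the build.
  [ "$_failed" -eq 0 ]
}
```

A call such as run_tasks http://localhost:3737 "test the login flow" "test the dark mode toggle" would then print one PASS or FAIL line per task.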

Observe-First Agent

The agent reads the DOM, accessibility tree, and page context before acting. Surgical, reasoning-driven interaction — not blind clicking.

Web UI Dashboard

Built-in web interface to browse scenarios, run demos, watch real-time progress via SSE, and inspect results — no terminal needed.

Quick Start

Up and running in four steps

1. Install and set up

$ git clone https://github.com/madxjack/QArmy && cd QArmy
$ npm install && npx playwright install chromium
$ npm run build
2. Set your LLM API key

$ export ANTHROPIC_API_KEY=sk-ant-...
# or OPENAI_API_KEY, GEMINI_API_KEY, GROQ_API_KEY
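
Before starting a run, it can help to check that at least one of these keys is actually set. The helper below is a minimal sketch: the variable names come from the step above, while the function name has_llm_key is hypothetical and not part of Vigil.

```shell
# Hypothetical preflight check (not part of Vigil): succeed and print
# the name of the first provider key that is set to a non-empty value.
has_llm_key() {
  for _name in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY GROQ_API_KEY; do
    # eval reads the variable whose name is stored in $_name.
    # The names are fixed literals, so eval is safe here.
    eval "_value=\${$_name:-}"
    if [ -n "$_value" ]; then
      echo "$_name"
      return 0
    fi
  done
  echo "no LLM API key set" >&2
  return 1
}
```

A guard such as has_llm_key >/dev/null || exit 1 at the top of a script stops early with a readable message instead of failing partway through a run.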
3. Run the built-in demo

$ npx vigil run-all scenarios/ --serve --headed
# Starts the demo app + runs all 17 scenarios
4. Or just describe what to test

$ npx vigil demo "test the login flow" \
    --url https://your-app.com --headed

Start watching.

Vigil is free, open-source, and works with the LLM you already use.
