Open Source · MIT Licensed

AI agents that demo
your web app like a human

Describe what to test in plain English. Vigil launches a browser, reasons through your UI with an LLM, and produces video recordings with structured pass/fail reports.


vigil — ~/project

# Describe what to test — the AI agent does the rest
$ vigil demo "test the dark mode toggle and verify all elements update" \
    --url http://localhost:3737 --headed
▸ Launching browser...
▸ Agent observing page (accessibility tree + DOM snapshot)
▸ Step 1/5: Clicking dark mode toggle button
▸ Step 2/5: Verifying background color changed to dark theme
▸ Step 3/5: Checking navigation bar updated
▸ Step 4/5: Verifying text contrast meets expectations
▸ Step 5/5: Toggling back to light mode and confirming
✓ All checks passed (5/5 steps, 12.3s)
▸ Video: recordings/run-a1b2c3/video/
▸ Report: results/run-a1b2c3/results.md

How It Works

Three steps. Zero test scripts.

No selectors to maintain. No brittle page objects. Describe the task, point at a URL, and let the agent figure it out.

Describe the task

Write a natural-language task such as "log in and add a high-priority todo", or define YAML steps for repeatable scenarios.

Agent reasons & acts

Vigil launches a browser, reads the DOM and accessibility tree, and uses an LLM to decide what to click, type, and verify.
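
The observe, reason, act cycle described here can be sketched roughly as follows. This is an illustration under stated assumptions, not Vigil's actual implementation: `Observation`, `Action`, `BrowserLike`, `Decide`, and `runAgent` are hypothetical names, and both the browser and the LLM call are replaced by stubs so the example runs on its own.

```typescript
// Hypothetical types: none of these names come from Vigil's codebase.
type Observation = { url: string; elements: string[] };
type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; value: string }
  | { kind: "done"; passed: boolean };

// Stand-in for the LLM call. A real agent would send the task and the
// observation to a model and parse the reply into an Action.
type Decide = (task: string, obs: Observation, history: Action[]) => Action;

// Stand-in for the browser: reports page state and applies actions.
type BrowserLike = {
  observe: () => Observation;
  apply: (action: Action) => void;
};

function runAgent(
  task: string,
  browser: BrowserLike,
  decide: Decide,
  maxSteps = 10
): { passed: boolean; history: Action[] } {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const obs = browser.observe();               // observe first
    const action = decide(task, obs, history);   // then reason
    history.push(action);
    if (action.kind === "done") return { passed: action.passed, history };
    browser.apply(action);                       // then act
  }
  // Step budget exhausted without a verdict: treat the run as failed.
  return { passed: false, history };
}

// Usage with a fake page that has a dark mode toggle.
let dark = false;
const fakeBrowser: BrowserLike = {
  observe: () => ({
    url: "http://localhost:3737",
    elements: [dark ? "theme:dark" : "theme:light", "button:dark mode toggle"],
  }),
  apply: (a) => {
    if (a.kind === "click" && a.target === "dark mode toggle") dark = !dark;
  },
};
const scriptedDecide: Decide = (_task, obs) =>
  obs.elements.indexOf("theme:dark") !== -1
    ? { kind: "done", passed: true }
    : { kind: "click", target: "dark mode toggle" };

const result = runAgent("test the dark mode toggle", fakeBrowser, scriptedDecide);
console.log(result.passed, result.history.length); // prints: true 2
```

The step budget (`maxSteps`) is a design choice in this sketch: it bounds cost and guarantees the loop ends even if the model never reports a verdict.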

Get recordings & reports

Every run produces video recordings, Playwright traces, screenshots, and structured JSON/Markdown reports with pass/fail status.

Execution Modes

Freeform or structured. Your call.

Autonomous

Freeform Mode

Describe what you want in plain English. The agent figures out the steps, navigates pages, fills forms, and validates results on its own.

$ vigil demo "search for wireless headphones and add the first result to cart" \
    --url https://shop.example.com --headed

Deterministic

Scenario Mode

Define explicit YAML step-by-step workflows for repeatable demos. Ideal for CI pipelines, regression testing, and stakeholder reviews.

name: Login Flow Demo
baseUrl: http://localhost:3737
steps:
  - action: click
    target: "the login button"
  - action: type
    value: "user@example.com"
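
Once parsed, a scenario file like the one above is a plain object. The sketch below shows one way such an object might be typed and checked before a run. The field names (`name`, `baseUrl`, `steps`, `action`, `target`, `value`) come from the example; the type names, the `validateScenario` helper, and the validation rules themselves are assumptions, not Vigil's documented schema.

```typescript
// Shape inferred from the YAML example; optional fields are an assumption.
type Step = { action: string; target?: string; value?: string };
type Scenario = { name: string; baseUrl: string; steps: Step[] };

// Hypothetical helper: returns a list of problems, empty when the scenario looks usable.
function validateScenario(s: Scenario): string[] {
  const errors: string[] = [];
  if (s.name.trim() === "") errors.push("name is empty");
  if (!/^https?:\/\//.test(s.baseUrl)) {
    errors.push("baseUrl must start with http:// or https://");
  }
  s.steps.forEach((step, i) => {
    // Assumed rules: a click needs something to click, a type needs text to enter.
    if (step.action === "click" && !step.target) {
      errors.push(`step ${i + 1}: click needs a target`);
    }
    if (step.action === "type" && step.value === undefined) {
      errors.push(`step ${i + 1}: type needs a value`);
    }
  });
  return errors;
}

// The Login Flow Demo scenario, as it would look after YAML parsing.
const loginFlow: Scenario = {
  name: "Login Flow Demo",
  baseUrl: "http://localhost:3737",
  steps: [
    { action: "click", target: "the login button" },
    { action: "type", value: "user@example.com" },
  ],
};
const loginFlowErrors = validateScenario(loginFlow);
console.log(loginFlowErrors.length); // prints: 0

// A deliberately broken scenario to show the error messages.
const brokenErrors = validateScenario({
  name: "",
  baseUrl: "localhost",
  steps: [{ action: "click" }],
});
console.log(brokenErrors.join("; "));
```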

Features

Everything you need to ship with confidence

Multi-LLM Support

Claude, GPT-4, Gemini, Groq, OpenRouter, Azure, Ollama, or any OpenAI-compatible endpoint. Use what you already pay for.
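
As an illustration of how a tool can support several providers from environment variables alone, here is a minimal sketch. The variable names `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, and `GROQ_API_KEY` are the ones listed in the Quick Start; the `pickProvider` function, the provider labels, and the priority order are assumptions and may not match how Vigil resolves providers.

```typescript
// Assumed priority order: the first key that is set wins.
const KEY_TO_PROVIDER: [string, string][] = [
  ["ANTHROPIC_API_KEY", "anthropic"],
  ["OPENAI_API_KEY", "openai"],
  ["GEMINI_API_KEY", "gemini"],
  ["GROQ_API_KEY", "groq"],
];

// Hypothetical helper. Takes the environment as a plain object so it can be
// tested without touching the real process environment.
function pickProvider(env: Record<string, string | undefined>): string | null {
  for (const [key, provider] of KEY_TO_PROVIDER) {
    if (env[key]) return provider;
  }
  return null; // no key configured
}

const chosen = pickProvider({ OPENAI_API_KEY: "sk-test", GROQ_API_KEY: "gsk-test" });
console.log(chosen); // prints: openai

const none = pickProvider({});
console.log(none); // prints: null
```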

Rich Recordings

Video recordings, Playwright traces, per-step screenshots, and animated cursor effects. Show stakeholders exactly what happened.

Structured Reports

JSON and Markdown reports with step-by-step results, agent reasoning, timings, and error details. Machine-readable and human-friendly.
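
To make the idea of one result feeding both a machine-readable and a human-friendly report concrete, here is a small sketch. The `RunReport` and `StepResult` shapes, their field names, and the `toMarkdown` helper are hypothetical; Vigil's real report schema may differ, and the sample data is invented purely for illustration.

```typescript
// Hypothetical report shape, not Vigil's actual schema.
type StepResult = {
  description: string;
  passed: boolean;
  durationMs: number;
  error?: string;
};
type RunReport = { task: string; steps: StepResult[] };

// Renders the same data as a short Markdown summary.
function toMarkdown(report: RunReport): string {
  const total = report.steps.length;
  const passed = report.steps.filter((s) => s.passed).length;
  const lines = [
    `# ${report.task}`,
    "",
    `Result: ${passed === total ? "PASS" : "FAIL"} (${passed}/${total} steps)`,
    "",
  ];
  report.steps.forEach((s, i) => {
    const mark = s.passed ? "✓" : "✗";
    const suffix = s.error ? ` (${s.error})` : "";
    lines.push(`${i + 1}. ${mark} ${s.description}, ${s.durationMs} ms${suffix}`);
  });
  return lines.join("\n");
}

// Illustrative sample data only.
const sampleReport: RunReport = {
  task: "test the login flow",
  steps: [
    { description: "Click the login button", passed: true, durationMs: 420 },
    { description: "Verify dashboard is shown", passed: false, durationMs: 1300, error: "heading not found" },
  ],
};

const asJson = JSON.stringify(sampleReport, null, 2); // machine-readable form
const asMarkdown = toMarkdown(sampleReport);          // human-friendly form
console.log(asMarkdown);
```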

CI/CD Ready

Ships with a GitHub Actions workflow. Run demos on every PR, upload artifacts, and post summary comments automatically.

Observe-First Agent

The agent reads the DOM, accessibility tree, and page context before acting. Surgical, reasoning-driven interaction — not blind clicking.

Web UI Dashboard

Built-in web interface to browse scenarios, run demos, watch real-time progress via SSE, and inspect results — no terminal needed.
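
Server-Sent Events arrive as plain text: each event is a block of `event:` and `data:` lines, and blocks are separated by a blank line. The sketch below parses such a stream into objects. The wire format is standard SSE; the event names (`step`, `done`) and the JSON payload fields are assumptions for illustration, since the page does not document what Vigil's dashboard actually sends.

```typescript
type SseEvent = { event: string; data: string };

// Minimal SSE parser: handles `event:` and `data:` fields only.
function parseSse(text: string): SseEvent[] {
  const parsed: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // default event type in the SSE format
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.indexOf("event:") === 0) {
        eventName = line.slice(6).trim();
      } else if (line.indexOf("data:") === 0) {
        // A single leading space after the colon is not part of the value.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    // Blocks without data (including the trailing empty block) are skipped.
    if (dataLines.length > 0) {
      parsed.push({ event: eventName, data: dataLines.join("\n") });
    }
  }
  return parsed;
}

// Hypothetical progress stream with two events.
const stream = [
  "event: step",
  'data: {"index":1,"status":"passed"}',
  "",
  "event: done",
  'data: {"passed":true}',
  "",
  "",
].join("\n");

const progressEvents = parseSse(stream);
console.log(progressEvents.length);                     // prints: 2
console.log(JSON.parse(progressEvents[0].data).status); // prints: passed
```

In a browser, the built-in `EventSource` API does this parsing for you; a hand-written parser like this one is mainly useful for scripts or tests that read the raw stream.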

Quick Start

Up and running in four steps

Install and set up

$ git clone https://github.com/madxjack/QArmy && cd QArmy
$ npm install && npx playwright install chromium
$ npm run build

Set your LLM API key

$ export ANTHROPIC_API_KEY=sk-ant-...
# or OPENAI_API_KEY, GEMINI_API_KEY, GROQ_API_KEY

Run the built-in demo

$ npx vigil run-all scenarios/ --serve --headed
# Starts the demo app + runs all 17 scenarios

Or just describe what to test

$ npx vigil demo "test the login flow" \
--url https://your-app.com --headed
