Documentation

Everything you need to install Ghost, understand its tools, and integrate it with your AI agent. Ghost turns any website into structured, callable MCP tools — no code, no cloud, no screenshots.

Quick Start

Get Ghost running in under 30 seconds. Three steps, no account required.

1. Install dependencies

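Clone the repo and install packages. Node.js 20+ is required. The commands in later steps assume the clone lives at ~/ghost, so run this from your home directory or adjust the paths.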

git clone https://github.com/ajsai47/ghost.git
cd ghost && npm install

2. Build the Chrome extension (optional)

The extension enables visible browsing with login sessions. Ghost also works headlessly via Playwright without it.

cd packages/extension && node build.mjs

Then load in Chrome: chrome://extensions → Developer Mode → Load unpacked → packages/extension/dist/

3. Register with your MCP client

Works with Claude Code, Codex, Cursor, Roo Code, or any MCP-compatible client.

Claude Code
bash
claude mcp add ghost -- npx tsx ~/ghost/packages/mcp-server/src/index.ts

Set your Anthropic API key (required for tool generation):

echo "ANTHROPIC_API_KEY=sk-ant-..." > ~/ghost/.env

You are ready. Open your MCP client and try:

ghost_go("extract the top stories from hacker news")

Ghost navigates to the site, analyzes the DOM, generates typed extraction tools, executes them, and returns structured JSON — all in a single tool call.

Core Tools

Ghost ships with 7 primary tools that are always available. These are the building blocks for all web automation.

ghost_go

The primary tool. Tell it what you want in plain English and it executes immediately — navigating, extracting, clicking, searching, whatever the task requires. This is the only tool most users will ever need.

| Parameter | Type | Description |
|-----------|------|-------------|
| instruction (required) | string | What you want to do, e.g. "go to hacker news and extract the top stories" |
| preview | boolean | If true, shows the execution plan without running it. Default: false (executes immediately). |

Examples
text
// Navigate and extract
ghost_go("extract the top 30 stories from Hacker News with scores")

// Search the web
ghost_go("search for AI startups and extract the top results")

// Fill a form
ghost_go("go to example.com/contact and fill the form with name 'Alex', email 'alex@co.com'")

// Preview without executing
ghost_go("extract repos from github.com/trending", preview: true)

How it works: ghost_go uses fast-path pattern matching for common commands (navigate, extract, search) and falls back to an LLM decomposer for complex multi-step instructions. Steps execute sequentially with error recovery.
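The fast-path idea can be sketched as a small matcher that either returns a plan or signals a fallback. This is a minimal illustration, not Ghost's actual code: the `PlanStep` shape, the `fastPath` function, and the regular expressions are all assumptions made for the example.

```typescript
// Hypothetical plan step; the real Ghost types may differ.
type PlanStep = { tool: string; args: Record<string, string> };

// Returns a plan for simple single-step instructions, or null to signal
// that the instruction should be handed to the LLM decomposer instead.
function fastPath(instruction: string): PlanStep[] | null {
  const text = instruction.trim();

  // Anything that looks multi-step is left to the decomposer.
  if (/\b(?:and|then)\b/i.test(text)) return null;

  // "go to <url>", "navigate to <url>", "open <url>"
  const nav = /^(?:go to|navigate to|open)\s+(\S+)$/i.exec(text);
  if (nav) {
    const [, url = ""] = nav;
    return [{ tool: "ghost_navigate", args: { url } }];
  }

  // "search for <query>"
  const search = /^search(?: the web)? for\s+(.+)$/i.exec(text);
  if (search) {
    const [, query = ""] = search;
    return [{ tool: "ghost_search", args: { query } }];
  }

  return null; // no pattern matched: fall back to the decomposer
}

console.log(fastPath("go to https://news.ycombinator.com"));
console.log(fastPath("log in, then export my invoices as CSV")); // null
```

Returning null rather than throwing keeps the fallback decision in one place: the caller only has to check whether a plan came back.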

ghost_navigate

Navigate the browser to a specific URL. Automatically checks the tool registry cache and generates new tools if the site has not been visited before. Use this when you know the exact URL.

| Parameter | Type | Description |
|-----------|------|-------------|
| url (required) | string | The URL to navigate to. |
| tab_id | string | Target a specific tab by ID (from ghost_tab_list). Optional. |

Example
text
ghost_navigate("https://news.ycombinator.com")
// Returns: Navigated to news.ycombinator.com — 6 tools available (cache).

ghost_search

Search the web via the Exa API and return results with optional content snippets. Can auto-navigate to the top result and generate extraction tools in a single call.

| Parameter | Type | Description |
|-----------|------|-------------|
| query (required) | string | The search query. |
| num_results | number | Number of results to return. Default: 10. |
| type | "auto", "neural", or "instant" | Search type. Default: auto. |
| contents | boolean | Include text content snippets in results. Default: false. |
| category | "news", "company", "research paper", "tweet", or "personal site" | Filter results by content category. |
| auto_navigate | boolean | Automatically navigate to the top result and generate tools. Default: false. |

Example
text
ghost_search("MCP server implementations", auto_navigate: true, num_results: 5)
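For callers who prefer typed arguments, the parameter table can be written down as a TypeScript interface. The sketch below is an assumption about how a client might model the arguments; `GhostSearchArgs`, `ResolvedSearchArgs`, and `resolveSearchArgs` are hypothetical names, and only the field names and defaults come from the table.

```typescript
type SearchType = "auto" | "neural" | "instant";
type SearchCategory = "news" | "company" | "research paper" | "tweet" | "personal site";

// Arguments as a caller would pass them: only query is required.
interface GhostSearchArgs {
  query: string;
  num_results?: number;
  type?: SearchType;
  contents?: boolean;
  category?: SearchCategory;
  auto_navigate?: boolean;
}

// The same arguments after the documented defaults have been applied.
interface ResolvedSearchArgs {
  query: string;
  num_results: number;
  type: SearchType;
  contents: boolean;
  category: SearchCategory | undefined;
  auto_navigate: boolean;
}

function resolveSearchArgs(args: GhostSearchArgs): ResolvedSearchArgs {
  return {
    query: args.query,
    num_results: args.num_results ?? 10,
    type: args.type ?? "auto",
    contents: args.contents ?? false,
    category: args.category, // no default: results are unfiltered when omitted
    auto_navigate: args.auto_navigate ?? false,
  };
}

console.log(resolveSearchArgs({ query: "MCP server implementations", num_results: 5 }));
```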

ghost_analyze

Manually trigger page analysis on the current browser tab. Generates typed MCP tools for the current page. Useful when you have already navigated somewhere and want to regenerate tools (for example, after scrolling to load more content).

Parameters: none.

Example
text
ghost_analyze()
// Returns: Generated 8 tools for github.com:
// - github_extract_repository_stats (extract)
// - github_extract_files (extract)
// - github_click_tab (click)
// ...

ghost_do

Execute a browser automation task from a natural language instruction. Unlike ghost_go, this tool gives you a plan preview before execution by default — useful when you want to review what Ghost will do before it does it.

| Parameter | Type | Description |
|-----------|------|-------------|
| instruction (required) | string | Natural language instruction, e.g. "go to hacker news and extract the top stories". |
| confirm | boolean | If true, execute the plan immediately. If false/omitted, return the plan for review. |

Example
text
// Step 1: Preview the plan
ghost_do("navigate to github trending and extract repos")
// Returns: plan with steps [ghost_navigate, ghost_analyze, auto_extract]

// Step 2: Execute
ghost_do("navigate to github trending and extract repos", confirm: true)

ghost_setup

Diagnostic tool that checks your Ghost environment and provides step-by-step instructions to fix any issues. Run this if something is not working.

Parameters: none.

Checks: Node.js version (≥20), ANTHROPIC_API_KEY, ~/.ghost/ directory, registry cache, Chrome extension connection, Playwright availability.
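Two of these checks are easy to reproduce by hand when debugging an environment. The following is a rough sketch rather than what ghost_setup actually runs; the `Check` shape and both function names are made up for illustration.

```typescript
// Hypothetical result shape for a single environment check.
interface Check {
  name: string;
  ok: boolean;
  hint: string;
}

// Accepts a version string such as "v20.11.0" (the format of process.version).
function checkNodeVersion(version: string, minimumMajor = 20): Check {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  return {
    name: "Node.js version",
    ok: Number.isInteger(major) && major >= minimumMajor,
    hint: `Install Node.js ${minimumMajor} or newer (found ${version}).`,
  };
}

function checkApiKey(env: Record<string, string | undefined>): Check {
  const key = env["ANTHROPIC_API_KEY"];
  return {
    name: "ANTHROPIC_API_KEY",
    ok: typeof key === "string" && key.length > 0,
    hint: "Add ANTHROPIC_API_KEY to ~/ghost/.env or export it in your shell.",
  };
}

const checks = [checkNodeVersion(process.version), checkApiKey(process.env)];
for (const check of checks) {
  console.log(`${check.ok ? "PASS" : "FAIL"} ${check.name}${check.ok ? "" : ": " + check.hint}`);
}
```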

ghost_status

Get Ghost connection status, registered tools, and registry stats. On first use, shows a getting-started guide. For returning users, shows a compact status table.

Parameters: none.

Output
text
## Ghost Status
| Component       | Status                           |
|-----------------|----------------------------------|
| Extension       | Connected                        |
| API Key         | Set                              |
| Registry        | 12 domains, 87 tools cached      |
| Session tools   | 6 site tools active              |
| Auth            | 3 saved sessions                 |

Dynamic Tools

When Ghost visits a website, it analyzes the DOM and auto-generates site-specific tools. These are typed MCP tools with CSS selectors, input schemas, and descriptions — all cached locally in ~/.ghost/registry/ for instant reuse.

How tools are generated

1. DOM Analysis

Ghost reads the live DOM, identifying interactive elements (buttons, links, forms), data blocks (tables, lists, cards), and navigation patterns.

2. Heuristic Generation

Initial tools are generated using fast heuristics based on element types, ARIA roles, and data attributes. Available in milliseconds.

3. LLM Refinement (background)

Claude Opus 4.6 refines the heuristic tools in the background using extended thinking — improving selectors, adding descriptions, and merging duplicates. Updates arrive automatically.

4. Registry Cache

Tools are saved as JSON in ~/.ghost/registry/{domain}/{pattern}.json. Every future visit loads from cache — zero generation cost.
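The cache step amounts to a file lookup keyed by domain and URL pattern. Here is a minimal sketch of that lookup, assuming a simplified `CachedTool` shape and hypothetical helper names; the demo writes to a temporary directory instead of ~/.ghost/registry/ so it can be run safely.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical minimal tool record; real registry entries carry more fields.
interface CachedTool {
  name: string;
  type: string;
  selector: string;
}

// Builds {root}/{domain}/{pattern}.json, defaulting to ~/.ghost/registry.
function registryPath(
  domain: string,
  pattern: string,
  root: string = path.join(os.homedir(), ".ghost", "registry"),
): string {
  return path.join(root, domain, `${pattern}.json`);
}

// Returns the cached tools, or null on a cache miss.
function loadCachedTools(file: string): CachedTool[] | null {
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, "utf8")) as CachedTool[];
}

function saveTools(file: string, tools: CachedTool[]): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(tools, null, 2));
}

// Demo against a throwaway directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "ghost-registry-"));
const demoFile = registryPath("news.ycombinator.com", "_news", demoRoot);

console.log(loadCachedTools(demoFile)); // null: first visit, nothing cached yet
saveTools(demoFile, [{ name: "hackernews_extract_stories", type: "extract", selector: "tr.athing" }]);
console.log(loadCachedTools(demoFile)?.length); // 1: later visits read from disk
```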

Example generated tools

| Site | Generated Tool | Type | What It Does |
|------|----------------|------|--------------|
| Hacker News | hackernews_extract_stories | extract | Extract story titles, scores, users, and comment counts |
| Hacker News | hackernews_click_story | click | Click a story link to navigate to it |
| GitHub | github_extract_repository_stats | extract | Extract stars, forks, description, language |
| GitHub | github_click_tab | click | Click a repo tab (Code, Issues, PRs, etc.) |
| Wikipedia | wikipedia_extract_article | extract | Extract article content, sections, and references |
| Wikipedia | wikipedia_click_link | click | Click an internal wiki link |
| Amazon | amazon_extract_products | extract | Extract product name, price, rating, reviews |
| Any site | {domain}_submit_search | form | Submit the search form with a query |

Tool naming convention

Dynamic tools follow the pattern {domain_prefix}_{action}_{target}. The domain prefix is derived from the hostname (e.g., hackernews for news.ycombinator.com). Actions include extract, click, submit, scroll, and navigate.
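As an illustration of the pattern, the sketch below composes a tool name from its three parts. How Ghost really derives the prefix is not documented here, so the alias table and the fallback rule (take the label before the top-level domain) are assumptions; the fallback would mis-handle hosts such as example.co.uk.

```typescript
type Action = "extract" | "click" | "submit" | "scroll" | "navigate";

// Hypothetical alias table; the hackernews alias is the one documented above.
const PREFIX_ALIASES: Record<string, string> = {
  "news.ycombinator.com": "hackernews",
};

function domainPrefix(hostname: string): string {
  const alias = PREFIX_ALIASES[hostname];
  if (alias) return alias;
  // Assumed fallback: use the label just before the top-level domain.
  const labels = hostname.replace(/^www\./, "").split(".");
  const label = labels.length >= 2 ? labels[labels.length - 2] : labels[0];
  return (label ?? "site").toLowerCase().replace(/[^a-z0-9]/g, "");
}

function buildToolName(hostname: string, action: Action, target: string): string {
  // Lowercase the target and collapse anything non-alphanumeric into "_".
  const slug = target
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
  return `${domainPrefix(hostname)}_${action}_${slug}`;
}

console.log(buildToolName("news.ycombinator.com", "extract", "stories")); // hackernews_extract_stories
console.log(buildToolName("github.com", "extract", "repository stats")); // github_extract_repository_stats
console.log(buildToolName("en.wikipedia.org", "click", "link")); // wikipedia_click_link
```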

Advanced Features

Macros

Chain multiple Ghost tools into reusable, multi-step workflows. Macros support variable interpolation, conditional execution, and looping.

ghost_macro_create
json
{
  "name": "hn_paginated_extract",
  "description": "Extract stories from multiple pages of Hacker News",
  "steps": [
    { "tool": "ghost_navigate", "args": { "url": "https://news.ycombinator.com" }, "output_name": "nav" },
    { "tool": "hackernews_extract_stories", "args": {}, "output_name": "page_data" },
    { "tool": "hackernews_click_more", "args": {}, "output_name": "next" }
  ],
  "is_loop": true,
  "max_iterations": 3
}

| Parameter | Type | Description |
|-----------|------|-------------|
| name (required) | string | Name for the macro tool (becomes callable like any other tool). |
| description (required) | string | Human-readable description of what the macro does. |
| steps (required) | string (JSON) | JSON array of steps. Each step has tool, args, output_name, and optional condition. |
| is_loop | boolean | If true, steps repeat until a step's condition fails or max_iterations is reached. Default: false. |
| max_iterations | number | Max loop iterations. Default: 10. |

Variable interpolation: Use $input.param_name for user inputs and $prev.result.field for referencing previous step outputs.
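One way to picture interpolation is as a recursive walk over a step's args that swaps reference strings for values. This sketch is an assumption about the behavior, not the macro engine's code: it only resolves strings that consist entirely of a `$input.` or `$prev.` reference and leaves unresolved references untouched. `Json`, `Scope`, `lookup`, and `interpolate` are names invented for the example.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Values a step can reference: the macro's inputs and the previous step's output.
interface Scope {
  input: Json;
  prev: Json;
}

// Walks a dotted path such as "result.field" through nested objects.
function lookup(root: Json, dotted: string): Json | undefined {
  let current: Json | undefined = root;
  for (const key of dotted.split(".")) {
    if (current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

function interpolate(value: Json, scope: Scope): Json {
  if (typeof value === "string") {
    const match = /^\$(input|prev)\.([\w.]+)$/.exec(value);
    if (!match) return value;
    const [, source = "", dotted = ""] = match;
    const resolved = lookup(source === "input" ? scope.input : scope.prev, dotted);
    return resolved === undefined ? value : resolved; // keep unresolved references as-is
  }
  if (Array.isArray(value)) return value.map((item) => interpolate(item, scope));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) out[key] = interpolate(inner, scope);
    return out;
  }
  return value;
}

const stepArgs: Json = { url: "$input.start_url", page: "$prev.result.next_page", limit: 30 };
const scope: Scope = {
  input: { start_url: "https://news.ycombinator.com" },
  prev: { result: { next_page: "?p=2" } },
};
console.log(interpolate(stepArgs, scope));
// { url: 'https://news.ycombinator.com', page: '?p=2', limit: 30 }
```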

Page Monitoring

Track pages for changes over time. Ghost navigates to the monitored URL, extracts data, diffs against the previous result, and records the history.

Monitor workflow
text
// Add a monitor
ghost_monitor_add(url: "https://news.ycombinator.com", schedule: "1h", notify: "on_change")

// Run a check (extracts data, diffs against baseline)
ghost_monitor_check()

// List all monitors
ghost_monitor_list()

// Remove a monitor
ghost_monitor_remove(id: "mon_abc123")

Schedules are manual, 5m, 1h, or 1d. All check history is stored locally in ~/.ghost/monitors/.
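To make the schedule values and the diff step concrete, here is a small sketch. The snapshot shape (items keyed by a stable `id`) and the function names are assumptions for illustration; Ghost's stored format in ~/.ghost/monitors/ may look different.

```typescript
type Schedule = "manual" | "5m" | "1h" | "1d";

// Interval in milliseconds, or null for monitors that only run on demand.
function scheduleToMs(schedule: Schedule): number | null {
  switch (schedule) {
    case "manual":
      return null;
    case "5m":
      return 5 * 60 * 1000;
    case "1h":
      return 60 * 60 * 1000;
    case "1d":
      return 24 * 60 * 60 * 1000;
  }
}

// Hypothetical extracted record: a stable id plus arbitrary scalar fields.
type SnapshotItem = { id: string; [field: string]: string | number };

interface SnapshotDiff {
  added: SnapshotItem[];
  removed: SnapshotItem[];
  changed: SnapshotItem[];
}

function diffSnapshots(previous: SnapshotItem[], current: SnapshotItem[]): SnapshotDiff {
  const before = new Map(previous.map((item) => [item.id, item] as const));
  const after = new Map(current.map((item) => [item.id, item] as const));
  return {
    added: current.filter((item) => !before.has(item.id)),
    removed: previous.filter((item) => !after.has(item.id)),
    // Compares serialized records, so it assumes fields keep the same order.
    changed: current.filter((item) => {
      const old = before.get(item.id);
      return old !== undefined && JSON.stringify(old) !== JSON.stringify(item);
    }),
  };
}

const baseline: SnapshotItem[] = [{ id: "1", score: 10 }, { id: "2", score: 5 }];
const latest: SnapshotItem[] = [{ id: "1", score: 12 }, { id: "3", score: 1 }];
console.log(scheduleToMs("1h")); // 3600000
console.log(diffSnapshots(baseline, latest));
```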

Auth Persistence

Save login sessions so Ghost can access authenticated pages across restarts. Cookies and localStorage are persisted locally.

Auth workflow
text
// After logging into a site in the Ghost browser:
ghost_auth_save(domain: "github.com")

// List saved sessions
ghost_auth_list()
// Returns: { domains: ["github.com", "notion.so"], count: 2 }

Multi-Tab Management

Open, switch between, and close browser tabs programmatically. Each tab auto-analyzes its page and generates tools independently.

Tab management
text
// Open a new tab
ghost_tab_open(url: "https://github.com/trending")

// List all tabs
ghost_tab_list()
// Returns: { active_tab: "tab_1", tabs: [...], count: 3 }

// Close a tab
ghost_tab_close(tab_id: "tab_2")

Performance and Quality Tools

Ghost includes built-in observability for monitoring tool health, execution speed, and cache efficiency.

ghost_speed

Performance dashboard with per-domain breakdown, cache hit rates, and speed comparison vs. screenshot agents. Use report: true for shareable markdown output, save: true to persist snapshots, and history: true for trend tracking.

ghost_quality

Quality scores, per-tool health breakdown (healthy / degraded / broken / unused), and execution metrics for all registry entries. Degraded tools trigger auto-regeneration.

ghost_analytics

Detailed tool usage analytics with filtering by domain and health tier. Sortable by calls, success rate, latency, or name.

File Downloads

Download files from any URL to the local filesystem.

ghost_download(url: "https://example.com/report.pdf", filename: "q4-report.pdf")
// Saved to ~/.ghost/downloads/q4-report.pdf
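The destination path follows from the default directory and the optional GHOST_DOWNLOAD_DIR override described under Configuration. A sketch of that resolution, with a hypothetical `resolveDownloadPath` helper; stripping directory components from the filename is an assumed safeguard, not documented behavior.

```typescript
import * as os from "node:os";
import * as path from "node:path";

function resolveDownloadPath(
  filename: string,
  env: Record<string, string | undefined> = process.env,
): string {
  // GHOST_DOWNLOAD_DIR wins when set; otherwise use ~/.ghost/downloads.
  const dir = env["GHOST_DOWNLOAD_DIR"] || path.join(os.homedir(), ".ghost", "downloads");
  // Keep only the final path segment so the file stays inside the directory.
  return path.join(dir, path.basename(filename));
}

console.log(resolveDownloadPath("q4-report.pdf", {}));
console.log(resolveDownloadPath("q4-report.pdf", { GHOST_DOWNLOAD_DIR: "/tmp/reports" }));
```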

Architecture

Ghost is an MCP (Model Context Protocol) server that bridges AI agents and the web. It uses a dual-executor architecture with intelligent caching and self-healing capabilities.

System Overview

Data Flow
text
Claude Code / Codex / Any MCP Client
    | MCP/stdio
    v
Ghost MCP Server (packages/mcp-server/src/index.ts)
    |
    +--- Executor Router (picks fastest path)
    |    |
    |    +- Tier 1: API Replay    → ~226ms (direct HTTP, no browser)
    |    +- Tier 2: Browser Fetch → ~350ms (authenticated API via cookies)
    |    +- Tier 3: DOM Extract   → ~436ms (full browser with selectors)
    |
    +--- Registry Cache (~/.ghost/registry/)
    |    Cached tools as JSON — zero-cost on repeat visits
    |
    +--- Chrome Extension (WebSocket :3456)
    |    Visible browsing, login sessions, DOM analysis
    |
    +--- Playwright (headless fallback)
         Background tasks, parallel extraction

3-Tier Execution

Ghost automatically selects the fastest execution strategy for every request. No configuration needed.

Tier 1: API Replay (~226ms)

Skips the browser entirely and makes direct HTTP calls to discovered API endpoints. Used for known domains with public APIs (HN, Reddit, GitHub).

Tier 2: Browser Fetch (~350ms)

Makes authenticated API calls using saved session cookies. Works behind logins without opening a full browser.

Tier 3: DOM Extraction (~436ms)

Runs a full browser with typed CSS selectors. Used when no API endpoint is available. At ~436ms against the 45-60s typical of Computer Use, this is still roughly 100x faster.
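Tier selection can be pictured as walking an ordered list of executors, fastest first. The sketch below is a simplified model under stated assumptions: the `Executor` interface is hypothetical, and falling through to the next tier when one throws is a guess at the behavior rather than something this page specifies.

```typescript
// Hypothetical executor contract; Ghost's real interfaces may differ.
interface Executor {
  name: string;
  canHandle(domain: string): boolean;
  run(toolName: string): Promise<unknown>;
}

// Tries executors in the given order and returns the first successful result.
async function route(
  executors: Executor[],
  domain: string,
  toolName: string,
): Promise<{ tier: string; result: unknown }> {
  const errors: string[] = [];
  for (const executor of executors) {
    if (!executor.canHandle(domain)) continue;
    try {
      return { tier: executor.name, result: await executor.run(toolName) };
    } catch (error) {
      errors.push(`${executor.name}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  throw new Error(`All tiers failed for ${toolName}: ${errors.join("; ")}`);
}

// Stub executors standing in for the three tiers.
const demoExecutors: Executor[] = [
  {
    name: "api-replay",
    canHandle: (domain) => domain === "news.ycombinator.com",
    run: async () => ({ source: "http" }),
  },
  {
    name: "browser-fetch",
    canHandle: () => true,
    run: async () => {
      throw new Error("no saved session");
    },
  },
  {
    name: "dom-extract",
    canHandle: () => true,
    run: async () => ({ source: "dom" }),
  },
];

route(demoExecutors, "example.com", "example_extract_items").then((outcome) => {
  console.log(outcome.tier); // dom-extract: tier 1 was skipped and tier 2 failed
});
```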

Self-Healing Selectors

When a website changes its DOM structure, Ghost detects degraded selectors via quality scoring. Tools with low success rates are automatically flagged and regenerated.

How it works:
  • Each tool has a quality score based on success rate, latency, and execution count
  • Tools that score below 0.5 after more than 5 executions trigger auto-regeneration
  • When rollback fails, Opus 4.6 with extended thinking reasons about the DOM change and generates repaired CSS selectors
  • Maximum 3 automatic regenerations per tool to prevent loops
  • Quality scores and health tiers are visible via ghost_quality
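The thresholds in the list above combine into a single predicate. This sketch takes the quality score as a given number, since the scoring formula itself is not spelled out here; `ToolStats` and `shouldRegenerate` are hypothetical names.

```typescript
// Hypothetical per-tool counters; the score is assumed to lie in [0, 1].
interface ToolStats {
  qualityScore: number;
  executions: number;
  regenerations: number;
}

const SCORE_THRESHOLD = 0.5; // regenerate below this score
const MIN_EXECUTIONS = 5; // only after more than this many runs
const MAX_REGENERATIONS = 3; // cap to prevent regeneration loops

function shouldRegenerate(stats: ToolStats): boolean {
  return (
    stats.qualityScore < SCORE_THRESHOLD &&
    stats.executions > MIN_EXECUTIONS &&
    stats.regenerations < MAX_REGENERATIONS
  );
}

console.log(shouldRegenerate({ qualityScore: 0.3, executions: 12, regenerations: 0 })); // true
console.log(shouldRegenerate({ qualityScore: 0.3, executions: 4, regenerations: 0 })); // false: too few runs
console.log(shouldRegenerate({ qualityScore: 0.3, executions: 12, regenerations: 3 })); // false: cap reached
```

Requiring a minimum number of executions keeps a tool from being regenerated because of one or two unlucky early runs.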

Registry Structure

~/.ghost/ directory
text
~/.ghost/
├── registry/              # Cached tools (JSON per domain/pattern)
│   ├── news.ycombinator.com/
│   │   └── _news.json
│   ├── github.com/
│   │   └── _owner_repo.json
│   └── ...
├── auth/                  # Saved session cookies and localStorage
│   ├── github.com.json
│   └── ...
├── monitors/              # Page change tracking data
│   └── ...
├── downloads/             # Downloaded files
├── speed-history.json     # Performance snapshots
└── debug.log              # Server debug log

Project Structure

Source code layout
text
ghost/
├── packages/
│   ├── mcp-server/src/           # MCP server — the brain
│   │   ├── index.ts              # Entry point, all tool registrations
│   │   ├── executor.ts           # Executor router (API → Bridge → Playwright)
│   │   ├── playwright-executor.ts  # Headless browser, DOM analysis, tool execution
│   │   ├── api-replay-executor.ts  # Direct HTTP for API-backed tools
│   │   ├── tool-generator.ts     # LLM-powered tool refinement
│   │   ├── nl-decomposer.ts      # Natural language → tool steps
│   │   ├── registry.ts           # Local JSON registry management
│   │   ├── quality.ts            # Tool health scoring
│   │   ├── macro-executor.ts     # Multi-step workflow engine
│   │   ├── monitor-store.ts      # Page change tracking
│   │   └── domain-api-map.ts     # Known domain → public API mappings
│   ├── extension/src/            # Chrome extension (Manifest V3)
│   │   ├── background/           # Service worker, tool generation via Claude
│   │   ├── content/              # DOM analysis, tool execution in page
│   │   └── popup/                # Extension dashboard UI
│   └── shared/src/               # Shared types (GhostTool, WsProtocol, Registry)
└── BENCHMARKS.md                 # Performance benchmark results

Configuration

Ghost requires minimal configuration. One environment variable is needed for tool generation; everything else is optional.

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| ANTHROPIC_API_KEY | Yes | Your Anthropic API key. Used for LLM-powered tool generation and natural language decomposition. |
| GHOST_DOWNLOAD_DIR | No | Custom download directory. Default: ~/.ghost/downloads/ |

Set your API key in one of two ways:

Option 1: .env file
bash
echo "ANTHROPIC_API_KEY=sk-ant-..." > ~/ghost/.env
Option 2: Shell export
bash
export ANTHROPIC_API_KEY=sk-ant-...

MCP Client Registration

Ghost works with any MCP-compatible client. Here is how to register it with popular clients:

Claude Code
bash
claude mcp add ghost -- npx tsx ~/ghost/packages/mcp-server/src/index.ts
Generic MCP config (mcp.json)
json
{
  "mcpServers": {
    "ghost": {
      "command": "npx",
      "args": ["tsx", "/path/to/ghost/packages/mcp-server/src/index.ts"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Chrome Extension Setup

The Chrome extension is optional but enables visible browsing with login sessions. Ghost uses Playwright as a fallback when the extension is not connected.

1. Build: cd packages/extension && node build.mjs

2. Open chrome://extensions in Chrome

3. Enable Developer Mode (top right toggle)

4. Click "Load unpacked" → select packages/extension/dist/

5. Set your Anthropic API key in the extension popup

Registry Location

All Ghost data is stored locally in ~/.ghost/. This includes the tool registry, auth sessions, monitoring data, downloaded files, and debug logs. No data ever leaves your machine.

FAQ

Does Ghost require a cloud service or account?
No. Ghost runs 100% locally on your machine. The only external calls are to the Anthropic API for tool generation (using your own API key) and, when you call ghost_search, to the Exa API. All tools, data, and sessions stay in ~/.ghost/ on your device.
Do I need the Chrome extension?
No. The Chrome extension is optional. Ghost works fully via Playwright (headless Chromium) without it. The extension is useful when you need visible browsing or want to use existing login sessions from your real Chrome profile.
What happens when a website changes its layout?
Ghost detects degraded selectors via quality scoring. When a tool's success rate drops below the threshold, Ghost automatically regenerates the tools for that page. You can also manually trigger regeneration with ghost_analyze().
How does Ghost compare to Computer Use (screenshot-based agents)?
Ghost is roughly 100x faster (~436ms vs 45-60s), uses over 99% fewer tokens (~200 vs 50,000+), costs $0 per action once tools are cached (vs ~$0.05), and requires just 1 tool call (vs 10-20). Ghost also caches tools for instant reuse, while screenshot agents start from scratch every time. When selectors break, Opus 4.6 with extended thinking auto-heals them.
Can Ghost handle authenticated/login-required pages?
Yes. Log into the site using the Ghost browser (via extension or Playwright), then run ghost_auth_save(domain: "example.com"). Ghost persists cookies and localStorage, so future sessions start already logged in.
What MCP clients are supported?
Ghost works with any MCP-compatible client: Claude Code, Codex, Roo Code, OpenCode, Cursor, Windsurf, Cline, and any other client that speaks the Model Context Protocol via stdio.
How do I see what tools Ghost has generated?
Run ghost_status() for a summary, or ghost_quality() for detailed per-tool health scores. The registry is also human-readable JSON in ~/.ghost/registry/.
Can I use Ghost for scraping?
Ghost extracts structured data from websites, but please use it responsibly. Respect robots.txt, rate limits, and terms of service. Ghost is designed for single-page extraction and monitoring, not large-scale crawling.
I get "ANTHROPIC_API_KEY not set" — what do I do?
Create a file at ~/ghost/.env with: ANTHROPIC_API_KEY=sk-ant-... (your key). Or export it in your shell. Then restart the MCP server. Run ghost_setup() to verify all checks pass.
ghost_go says "no page loaded" — what's wrong?
This usually means the Playwright browser hasn't started yet. Run ghost_navigate(url) first to open a page, or use ghost_go with a full instruction like "go to example.com and extract data". Ghost will navigate automatically.

Ready to get started?

Ghost is open source, MIT licensed, and free to use. Install it in 30 seconds and give your AI agent structured web tools today.