AgentBrowser - AI-Powered Browser Automation Platform

Built for AI Agents

Not a testing tool. A browser that AI agents can see, understand, and control through a clean REST API.

🌐

Multi-Browser Support

Chromium, Firefox, and WebKit via Playwright. Switch between engines per session with full configuration for viewport, proxy, user agent, locale, and timezone.

👁

Vision AI System

Screenshots, simplified DOM trees, accessibility trees, and interactive element detection. Everything an LLM needs to understand a page and decide what to do next.

🔗

REST API

8 clean JSON endpoints for session management, 25+ browser actions, vision snapshots, cookies, and action logs. Compatible with any LLM, CLI, or SDK.

🔒

Session Persistence

Cookies, localStorage, and browser state are preserved across requests and sessions. Agents can pick up exactly where they left off.

📡

WebSocket Real-Time

Live events for actions, screenshots, and session changes via Socket.IO. Build reactive dashboards and responsive AI agent loops.

💻

Web Dashboard

Professional dark-themed UI with session sidebar, live screenshot preview, action executor, Vision AI panel, and complete action log viewer.

See It In Action

A professional dashboard for monitoring and controlling browser sessions in real-time.

Live Browsing — Real browser sessions with live screenshots and navigation controls

Vision AI — Screenshot, interactive elements, accessibility tree, and page metadata for LLMs

25+ Actions — Click, type, scroll, wait, evaluate JS, manage cookies, and more

Web Dashboard — Professional dark-themed UI with session management sidebar

Action Logs — Every browser action recorded with timing and results

API Reference — Built-in endpoint documentation and curl examples

Quick Start Guide

Get AgentBrowser running in under 2 minutes. Zero configuration needed for local development.

Clone & Install

Clone the repository, install dependencies, and install the Playwright Chromium browser. Firefox and WebKit can be installed the same way if you need them.

# Clone the repository
git clone https://github.com/smouj/agent-browser.git
cd agent-browser

# Install dependencies & setup
bun install
bun run db:push
bunx playwright install chromium

Start the Server

Run the development server. The dashboard is served at http://localhost:3000.

# Development mode (hot reload)
bun run dev

# Production mode
bun run build && bun run start

Create a Browser Session

Create a session via the REST API or the web dashboard. Each session gets its own isolated browser context with persistent cookies and storage.

curl -X POST http://localhost:3000/api/browser/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-agent-session",
    "browserType": "chromium",
    "headless": true
  }'
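The same request body can be assembled from Python before it is sent. The following is a minimal sketch that uses only the three fields shown in the curl example; `session_payload` and `SUPPORTED_ENGINES` are hypothetical names introduced here for illustration, and the engine list is taken from the Multi-Browser Support description.

```python
import json

# Engines named in this document; the quick start installs only Chromium.
SUPPORTED_ENGINES = ("chromium", "firefox", "webkit")


def session_payload(name, browser_type="chromium", headless=True):
    """Build the JSON body for POST /api/browser/sessions.

    Hypothetical helper: only the fields shown in the curl example
    (name, browserType, headless) are used.
    """
    if browser_type not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported browserType: {browser_type!r}")
    return {"name": name, "browserType": browser_type, "headless": headless}


if __name__ == "__main__":
    print(json.dumps(session_payload("my-agent-session"), indent=2))
```

Rejecting an unknown engine on the client side gives an agent a clear error before any browser is launched.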

Navigate & Interact

Navigate to any URL, click elements, type text, scroll, take screenshots, and execute JavaScript — all via simple REST calls.

# Set this to the "id" returned when the session was created
SESSION_ID="your-session-id"

# Navigate to a page
curl -X POST "http://localhost:3000/api/browser/sessions/$SESSION_ID/action" \
  -H "Content-Type: application/json" \
  -d '{"action":"navigate","target":"https://news.ycombinator.com"}'

# Get Vision AI snapshot
curl -X POST "http://localhost:3000/api/browser/sessions/$SESSION_ID/vision" \
  -H "Content-Type: application/json" \
  -d '{}'

Connect Your AI Agent

Define AgentBrowser as a tool in your LLM's function-calling schema. The agent can create sessions, navigate, observe pages via Vision AI, and execute actions autonomously. See the Integration Guide below for OpenClaw, Hermes, and custom agent examples.

AI Agent Integration Guide

Connect AgentBrowser to any AI agent framework. Here are step-by-step guides for the most popular platforms.

OpenClaw

OpenClaw is an open-source AI agent framework that uses tool-calling to interact with external services. AgentBrowser integrates as a set of browser control tools.

Step-by-step setup:

Start AgentBrowser: bun run dev (default: http://localhost:3000)
Create a tool definition file in your OpenClaw project (tools/browser.yaml):

# tools/browser.yaml
name: browser_navigate
description: "Navigate the browser to a URL"
endpoint: "http://localhost:3000/api/browser/sessions/{session_id}/action"
method: POST
parameters:
  session_id:
    type: string
    description: "Active browser session ID"
  action:
    type: string
    default: "navigate"
  target:
    type: string
    description: "URL to navigate to"

Register tools for all actions: browser_click, browser_type, browser_vision, browser_screenshot, browser_scroll, etc.
Create a session at agent startup and pass the session_id to all tool calls.
Use the Vision AI tool (/vision endpoint) to let the agent "see" the page before deciding what to do.
Close the session when the agent's task is complete via DELETE /api/browser/sessions/{id}.

💡 Tip: Define a browser_create_session tool so the agent can manage its own browser lifecycle autonomously.

Hermes

Hermes is a modular AI agent runtime that supports MCP (Model Context Protocol) servers. AgentBrowser can be exposed as an MCP server for native integration.

Step-by-step setup:

Install and start AgentBrowser: bun run dev
Add the MCP server configuration to your Hermes config file (hermes.config.yaml):

# hermes.config.yaml
mcp_servers:
  agentbrowser:
    type: "rest"
    base_url: "http://localhost:3000/api/browser"
    tools:
      - name: "create_session"
        path: "/sessions"
        method: "POST"
        description: "Create a new browser session"
      - name: "execute_action"
        path: "/sessions/{session_id}/action"
        method: "POST"
        description: "Execute a browser action"
      - name: "get_vision"
        path: "/sessions/{session_id}/vision"
        method: "POST"
        description: "Get AI vision snapshot of page"
      - name: "close_session"
        path: "/sessions/{session_id}"
        method: "DELETE"
        description: "Close a browser session"

Restart Hermes to load the new MCP server
Ask Hermes to browse the web: "Go to Hacker News and find the top 5 stories about AI"
Hermes will: Create a session → Navigate → Get vision snapshot → Read interactive elements → Extract data → Close session

💡 Tip: For persistent sessions across Hermes conversations, don't close the session — pass the session_id to the next conversation.
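One way to carry a session_id from one conversation to the next is to keep it in a small file. This is only a sketch under that assumption: the file format and the names `save_session_id` and `load_session_id` are hypothetical and are not part of AgentBrowser or Hermes.

```python
import json
from pathlib import Path


def save_session_id(path, session_id):
    """Write the session ID to a small JSON file."""
    Path(path).write_text(json.dumps({"session_id": session_id}))


def load_session_id(path):
    """Return the stored session ID, or None if nothing usable is stored."""
    try:
        data = json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    return data.get("session_id")
```

Before reusing a stored ID, it is worth confirming with GET /api/browser/sessions/:id that the session still exists, and creating a new one if it does not.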

Custom Python Agent

Build your own AI agent with any LLM (OpenAI, Anthropic, Ollama, etc.) using the REST API. Here's a complete working example.

import requests

BASE = "http://localhost:3000/api/browser/sessions"

# 1. Create session
s = requests.post(BASE, json={
    "name": "python-agent",
    "browserType": "chromium",
    "headless": True
}).json()
sid = s["id"]

# 2. Navigate
requests.post(f"{BASE}/{sid}/action", json={
    "action": "navigate",
    "target": "https://news.ycombinator.com"
})

# 3. Get vision (for LLM)
vision = requests.post(
    f"{BASE}/{sid}/vision", json={}
).json()

# 4. Pass vision.interactiveElements
#    to your LLM for decision-making
elements = vision["interactiveElements"]
for el in elements:
    print(f"{el['type']}: {el['text']}")

# 5. Clean up
requests.delete(f"{BASE}/{sid}")

💡 Tip: Combine with the openai Python SDK to create a full autonomous agent loop: observe → think → act → repeat.
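Before the interactive elements from step 4 reach an LLM, they usually need to be turned into compact prompt text. The sketch below assumes each element is a dict with "type" and "text" keys, as in the example above; `format_elements` and its `limit` parameter are hypothetical.

```python
def format_elements(elements, limit=50):
    """Render interactive elements as numbered lines for an LLM prompt.

    Assumes each element is a dict with "type" and "text" keys;
    missing or empty values fall back to placeholders.
    """
    lines = []
    for index, el in enumerate(elements[:limit]):
        kind = el.get("type") or "unknown"
        # Collapse runs of whitespace so each element stays on one line.
        text = " ".join(str(el.get("text") or "").split())
        lines.append(f"[{index}] {kind}: {text}")
    return "\n".join(lines)
```

Numbering the lines lets the model answer with an index, which the agent can map back to the original element. Capping the list keeps the prompt size bounded on pages with many links.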

OpenAI Function Calling

Define AgentBrowser as an OpenAI tool-calling function. Compatible with GPT-4o, GPT-4, GPT-3.5-turbo, and any model that supports function calling.

# Define tools for your OpenAI agent
tools = [{
    "type": "function",
    "function": {
        "name": "browser_navigate",
        "description": "Navigate browser to URL",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string"}
            },
            "required": ["url"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "browser_vision",
        "description": "See the current page",
        "parameters": {
            "type": "object",
            "properties": {
                "full_page": {
                    "type": "boolean",
                    "default": False
                }
            }
        }
    }
}]

💡 Tip: Add browser_click, browser_type, browser_screenshot, and browser_scroll tools for full page interaction.
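When the model returns a tool call, the agent has to translate it into an AgentBrowser request. The sketch below shows one possible mapping for the two tools defined above. It assumes the function name and the JSON-encoded arguments string have already been read from the tool call; `tool_call_to_request` is a hypothetical helper, not part of either API.

```python
import json


def tool_call_to_request(session_id, name, arguments_json):
    """Translate one tool call into (method, path, body).

    name and arguments_json correspond to the function name and the
    JSON-encoded arguments string of a tool call.
    """
    args = json.loads(arguments_json or "{}")
    base = f"/api/browser/sessions/{session_id}"
    if name == "browser_navigate":
        body = {"action": "navigate", "target": args["url"]}
        return "POST", f"{base}/action", body
    if name == "browser_vision":
        # This document shows only an empty body for /vision, so the
        # full_page argument is not forwarded here.
        return "POST", f"{base}/vision", {}
    raise ValueError(f"unknown tool: {name}")
```

Raising on unknown tool names makes it obvious when the model calls a tool that the agent has not wired up yet.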

Typical Agent Workflow

This is the observe-think-act loop that AI agents follow when using AgentBrowser.

① CREATE SESSION

POST /api/browser/sessions → Receive session_id

↓

② NAVIGATE

POST /api/browser/sessions/{id}/action with {"action":"navigate","target":"https://..."}

↓

③ OBSERVE (Vision AI)

POST /api/browser/sessions/{id}/vision → Get screenshot, DOM, interactive elements

↓

④ THINK (LLM Decision)

LLM analyzes vision data → Decides next action (click, type, scroll, etc.)

↓

⑤ ACT

POST /api/browser/sessions/{id}/action with chosen action and selector

↻ Repeat ③-⑤ until task complete

↓

⑥ CLEANUP

DELETE /api/browser/sessions/{id} → Close browser, free resources
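The six steps above can be sketched as a single function. This is an illustration under stated assumptions rather than a reference implementation: `send` is a hypothetical callable that performs one HTTP request and returns the decoded JSON, `decide` stands in for the LLM step and returns either an action body or None when the task is done, and the session response is assumed to carry an "id" field as in the Python example earlier.

```python
def run_agent(send, decide, start_url, max_steps=10):
    """Run the observe-think-act loop and return the number of actions taken.

    send(method, path, body) -> dict   performs one HTTP call (hypothetical)
    decide(vision) -> dict or None     the LLM step; None means "done"
    """
    root = "/api/browser/sessions"
    # Step 1: create the session.
    session = send("POST", root, {
        "name": "loop-agent",
        "browserType": "chromium",
        "headless": True,
    })
    sid = session["id"]
    steps = 0
    try:
        # Step 2: navigate to the starting page.
        send("POST", f"{root}/{sid}/action",
             {"action": "navigate", "target": start_url})
        while steps < max_steps:
            vision = send("POST", f"{root}/{sid}/vision", {})  # Step 3: observe
            action = decide(vision)                            # Step 4: think
            if action is None:
                break
            send("POST", f"{root}/{sid}/action", action)       # Step 5: act
            steps += 1
    finally:
        # Step 6: always close the session, even if a step raised.
        send("DELETE", f"{root}/{sid}", None)
    return steps
```

The `max_steps` cap keeps a confused model from looping forever, and the `finally` block ensures the browser is released when a request fails midway.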

REST API Reference

8 endpoints covering session management, browser actions, vision, cookies, and logging.

Endpoints

Method	Endpoint	Description
POST	`/api/browser/sessions`	Create a new browser session
GET	`/api/browser/sessions`	List all sessions
GET	`/api/browser/sessions/:id`	Get session details
DELETE	`/api/browser/sessions/:id`	Close and delete a session
POST	`/api/browser/sessions/:id/action`	Execute a browser action (25+ available)
POST	`/api/browser/sessions/:id/vision`	Get AI vision snapshot
GET / POST	`/api/browser/sessions/:id/cookies`	Get or set cookies
GET	`/api/browser/sessions/:id/logs`	Get paginated action logs
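The endpoints in this table can be called from Python with the standard library alone. The sketch below is a minimal example, not an official client: `build_request` and `call` are hypothetical helpers, the base URL is the default development address from the quick start, and responses are assumed to be JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # default dev address from the quick start


def build_request(method, path, body=None):
    """Create a urllib Request for an endpoint without sending it."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, body=None, timeout=30):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(method, path, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call("GET", "/api/browser/sessions")` would list all sessions when the server is running. Separating request construction from sending makes the URL and body logic easy to check without a live server.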

Supported Browser Actions

navigate goBack goForward reload click dblclick hover rightClick type press select scroll wait waitForSelector waitForNavigation screenshot evaluate getCookies setCookies clearCookies getLocalStorage setLocalStorage clearLocalStorage getUrl getTitle getContent
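An agent can check an action name against this list before sending it, which catches typos and hallucinated actions early. The following sketch copies the names from the list above. `build_action` is a hypothetical helper, and apart from "action" and "target", which appear in the examples in this document, any extra fields are passed through without interpretation.

```python
# Action names copied from the list above.
SUPPORTED_ACTIONS = frozenset("""
navigate goBack goForward reload click dblclick hover rightClick type press
select scroll wait waitForSelector waitForNavigation screenshot evaluate
getCookies setCookies clearCookies getLocalStorage setLocalStorage
clearLocalStorage getUrl getTitle getContent
""".split())


def build_action(action, **fields):
    """Return an action request body, rejecting unknown action names.

    Names are case-sensitive. Extra keyword arguments are copied into
    the body unchanged.
    """
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return {"action": action, **fields}
```

For example, `build_action("navigate", target="https://news.ycombinator.com")` yields the same body as the navigate example in the quick start.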

Architecture

Clean layered design separating the API layer from the browser engine, with event-driven communication.

Clients Layer

LLM Agents · CLIs · Web Dashboard · Custom Integrations

↓ REST API ↓ WebSocket

Next.js API Routes

/api/browser/sessions/* · Input validation · Error handling

↓

Browser Engine Layer

Session Manager · Action Executor (25+ actions) · Vision System (Screenshots, DOM, A11y Tree)

↓

Playwright

Chromium · Firefox · WebKit

↓

Persistence

SQLite (via Prisma) · Sessions · Action Logs

Tech Stack

Modern, production-ready technologies chosen for reliability, performance, and developer experience.

Next.js 16

Full-stack framework

TypeScript

End-to-end type safety

Playwright

Browser automation engine

Prisma

Type-safe ORM (SQLite)

Tailwind CSS 4

Utility-first styling

shadcn/ui

UI component library

Socket.IO

Real-time WebSocket

Zustand

State management

How AgentBrowser Compares

AgentBrowser is the most complete open-source solution for AI browser automation. Here's how it stacks up.

Feature Comparison

Feature	AgentBrowser	Browser Use	Stagehand	Playwright MCP
Self-hosted	Yes	Yes	Yes (local, or Browserbase cloud)	Yes
REST API	8 endpoints	No	SDK only	MCP only
Multi-browser	Chromium, Firefox, WebKit	Chromium only	Chromium only	All 3
Vision AI	Screenshot, DOM, a11y, elements	Screenshot only	DOM only	None
Session persistence	Cookies, localStorage, DB	Memory only	No	No
Web Dashboard	Yes (dark theme)	No	No	No
WebSocket events	Yes	No	No	No
Browser actions	25+	Limited	Limited	Basic
Agent agnostic	Any LLM / CLI	Python only	JS / Python	Any MCP client
License	MIT	MIT	MIT	MIT
Database	SQLite + Prisma	None	Cloud	None

Why AgentBrowser?

AgentBrowser is the only open-source solution that combines a full REST API, a Vision AI system, session persistence, real-time WebSocket events, and a web dashboard, which makes it the most complete and production-ready platform for AI browser automation. Whether you're using OpenClaw, Hermes, OpenAI, Anthropic, or a custom agent framework, AgentBrowser gives your AI agent real eyes and hands on the web.

Other Notable Projects

LaVague (lavague.ai)
RAG-based approach, research-focused. No REST API, no session management.

Skyvern (skyvern.com)
Open-source project (AGPL-3.0) with a commercial cloud offering. Workflow-oriented approach.

WebVoyager (academic)
Research paper, not production-ready. Limited action set.

Give any AI agent a real browser