AgentBrowser is an open-source browser automation platform designed to be the hands and eyes of AI agents. Control real browsers via REST API, with Vision AI that lets LLMs see and understand web pages.
Chromium, Firefox, and WebKit via Playwright. Switch between engines per session with full configuration for viewport, proxy, user agent, locale, and timezone.
Screenshots, simplified DOM trees, accessibility trees, and interactive element detection. Everything an LLM needs to understand a page and decide what to do next.
8 clean JSON endpoints for session management, 25+ browser actions, vision snapshots, cookies, and action logs. Compatible with any LLM, CLI, or SDK.
Cookies, localStorage, and browser state are preserved across requests and sessions. Agents can pick up exactly where they left off.
Live events for actions, screenshots, and session changes via Socket.IO. Build reactive dashboards and responsive AI agent loops.
Professional dark-themed UI with session sidebar, live screenshot preview, action executor, Vision AI panel, and complete action log viewer.
Clone the repository, install dependencies, set up the database, and install the Playwright Chromium browser.
# Clone the repository
git clone https://github.com/smouj/agent-browser.git
cd agent-browser
# Install dependencies & setup
bun install
bun run db:push
bunx playwright install chromium
Run the development server. The dashboard will be available at port 3000.
# Development mode (hot reload)
bun run dev
# Production mode
bun run build && bun run start
Create a session via the REST API or the web dashboard. Each session gets its own isolated browser context with persistent cookies and storage.
curl -X POST http://localhost:3000/api/browser/sessions \
-H "Content-Type: application/json" \
-d '{
"name": "my-agent-session",
"browserType": "chromium",
"headless": true
}'
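For agents written in Python, the same request can be assembled with the standard library. This is a minimal sketch: `build_create_session_request` is a hypothetical helper, and only the `name`, `browserType`, and `headless` fields are taken from the curl example above. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE = "http://localhost:3000/api/browser/sessions"


def build_create_session_request(name, browser_type="chromium", headless=True):
    """Build (but do not send) the POST request that creates a session."""
    payload = {"name": name, "browserType": browser_type, "headless": headless}
    return urllib.request.Request(
        BASE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_session_request("my-agent-session")
print(req.get_method(), req.full_url)
# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       session = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before an agent touches a real browser.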
Navigate to any URL, click elements, type text, scroll, take screenshots, and execute JavaScript — all via simple REST calls.
# Navigate to a page (replace {id} with the session ID returned when the session was created)
curl -X POST http://localhost:3000/api/browser/sessions/{id}/action \
-H "Content-Type: application/json" \
-d '{"action":"navigate","target":"https://news.ycombinator.com"}'
# Get Vision AI snapshot
curl -X POST http://localhost:3000/api/browser/sessions/{id}/vision \
-H "Content-Type: application/json" \
-d '{}'
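A vision snapshot is usually too large to paste into a prompt unchanged, so one option is to reduce it to a short numbered list first. The sketch below assumes the response carries an `interactiveElements` list whose items have `type` and `text` keys, as in the custom-agent example in this document. `summarize_elements` and the sample data are hypothetical.

```python
def summarize_elements(vision, limit=20):
    """Turn a vision snapshot into a short numbered list for an LLM prompt."""
    lines = []
    for index, el in enumerate(vision.get("interactiveElements", [])[:limit]):
        text = " ".join(str(el.get("text", "")).split())  # collapse whitespace
        lines.append(f"[{index}] {el.get('type', 'unknown')}: {text}")
    return "\n".join(lines)


# Hand-written sample, not real output from the /vision endpoint.
sample = {
    "interactiveElements": [
        {"type": "link", "text": "Hacker News"},
        {"type": "link", "text": "  new \n comments "},
    ]
}
print(summarize_elements(sample))
```

The index in brackets gives the model a compact handle to refer back to when it picks an element to act on.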
Define AgentBrowser as a tool in your LLM's function-calling schema. The agent can create sessions, navigate, observe pages via Vision AI, and execute actions autonomously. See the Integration Guide below for OpenClaw, Hermes, and custom agent examples.
OpenClaw is an open-source AI agent framework that uses tool-calling to interact with external services. AgentBrowser integrates as a set of browser control tools.
Step-by-step setup:
1. Start AgentBrowser: bun run dev (default: http://localhost:3000)
2. Define a tool in a YAML file (tools/browser.yaml):
# tools/browser.yaml
name: browser_navigate
description: "Navigate the browser to a URL"
endpoint: "http://localhost:3000/api/browser/sessions/{session_id}/action"
method: POST
parameters:
  session_id:
    type: string
    description: "Active browser session ID"
  action:
    type: string
    default: "navigate"
  target:
    type: string
    description: "URL to navigate to"
3. Add similar tools: browser_click, browser_type, browser_vision, browser_screenshot, browser_scroll, etc.
4. Pass session_id to all tool calls
5. Use the vision tool (/vision endpoint) to let the agent "see" the page before deciding what to do
6. Close sessions when finished with DELETE /sessions/{id}
Tip: add a browser_create_session tool so the agent can manage its own browser lifecycle autonomously.
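The additional tools can follow the same shape as browser_navigate, so they are easy to generate rather than write by hand. The following is a sketch only: `make_action_tool` is a hypothetical helper, and it assumes that actions such as click accept their selector in the same `target` field that navigate uses, which should be checked against the API before use.

```python
import json

ENDPOINT = "http://localhost:3000/api/browser/sessions/{session_id}/action"


def make_action_tool(name, action, description, target_description):
    """Build one tool definition shaped like the browser_navigate entry."""
    return {
        "name": name,
        "description": description,
        "endpoint": ENDPOINT,
        "method": "POST",
        "parameters": {
            "session_id": {"type": "string", "description": "Active browser session ID"},
            "action": {"type": "string", "default": action},
            "target": {"type": "string", "description": target_description},
        },
    }


tools = [
    make_action_tool("browser_navigate", "navigate",
                     "Navigate the browser to a URL", "URL to navigate to"),
    # Assumption: click takes its selector in the same "target" field.
    make_action_tool("browser_click", "click",
                     "Click an element", "Selector of the element to click"),
]
print(json.dumps(tools[1], indent=2))
```

Each dictionary can then be serialized to whatever file format the agent framework expects.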
Hermes is a modular AI agent runtime that supports MCP (Model Context Protocol) servers. AgentBrowser can be exposed as an MCP server for native integration.
Step-by-step setup:
1. Start AgentBrowser: bun run dev
2. Register AgentBrowser in the Hermes configuration (hermes.config.yaml):
# hermes.config.yaml
mcp_servers:
  agentbrowser:
    type: "rest"
    base_url: "http://localhost:3000/api/browser"
    tools:
      - name: "create_session"
        path: "/sessions"
        method: "POST"
        description: "Create a new browser session"
      - name: "execute_action"
        path: "/sessions/{session_id}/action"
        method: "POST"
        description: "Execute a browser action"
      - name: "get_vision"
        path: "/sessions/{session_id}/vision"
        method: "POST"
        description: "Get AI vision snapshot of page"
      - name: "close_session"
        path: "/sessions/{session_id}"
        method: "DELETE"
        description: "Close a browser session"
Tip: to keep working in the same browser session, pass its session_id to the next conversation.
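To make the mapping concrete, the sketch below shows how a runtime might turn one of the configured tool names into an HTTP method and URL. It illustrates the path templating only, not how Hermes itself resolves tools. `resolve_tool` and the `abc123` session ID are hypothetical.

```python
BASE_URL = "http://localhost:3000/api/browser"

# Mirrors the tools listed in hermes.config.yaml.
TOOLS = {
    "create_session": ("POST", "/sessions"),
    "execute_action": ("POST", "/sessions/{session_id}/action"),
    "get_vision": ("POST", "/sessions/{session_id}/vision"),
    "close_session": ("DELETE", "/sessions/{session_id}"),
}


def resolve_tool(name, session_id=None):
    """Return (method, url) for a tool, filling in {session_id} when needed."""
    method, path = TOOLS[name]
    if "{session_id}" in path:
        if not session_id:
            raise ValueError(f"{name} requires a session_id")
        path = path.replace("{session_id}", session_id)
    return method, BASE_URL + path


print(resolve_tool("create_session"))
print(resolve_tool("get_vision", "abc123"))
```

Failing early when a session ID is missing avoids sending a request to a URL that still contains the literal placeholder.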
Build your own AI agent with any LLM (OpenAI, Anthropic, Ollama, etc.) using the REST API. Here's a complete working example.
import requests

BASE = "http://localhost:3000/api/browser/sessions"

# 1. Create session
s = requests.post(BASE, json={
    "name": "python-agent",
    "browserType": "chromium",
    "headless": True
}).json()
sid = s["id"]

# 2. Navigate
requests.post(f"{BASE}/{sid}/action", json={
    "action": "navigate",
    "target": "https://news.ycombinator.com"
})

# 3. Get vision (for LLM)
vision = requests.post(
    f"{BASE}/{sid}/vision", json={}
).json()

# 4. Pass vision["interactiveElements"]
#    to your LLM for decision-making
elements = vision["interactiveElements"]
for el in elements:
    print(f"{el['type']}: {el['text']}")

# 5. Clean up
requests.delete(f"{BASE}/{sid}")
Combine this with the openai Python SDK to create a full autonomous agent loop: observe → think → act → repeat.
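The loop itself needs very little code once observing, deciding, and acting are passed in as functions. The sketch below is one possible shape, not the project's implementation: `run_agent` is hypothetical, the stand-in functions replace the real `/vision` call, LLM call, and `/action` call, and the click payload is illustrative only.

```python
def run_agent(observe, decide, act, max_steps=10):
    """Observe, think, act, and repeat until decide() returns None."""
    history = []
    for _ in range(max_steps):
        snapshot = observe()                # e.g. POST .../vision
        action = decide(snapshot, history)  # e.g. an LLM call
        if action is None:                  # the model signals it is done
            break
        act(action)                         # e.g. POST .../action
        history.append(action)
    return history


# Stand-ins so the loop can be exercised without a browser or an LLM.
pages = iter([{"title": "Home"}, {"title": "Results"}])
performed = []


def fake_decide(snapshot, history):
    if snapshot["title"] == "Home":
        return {"action": "click", "target": "#search"}
    return None


steps = run_agent(lambda: next(pages), fake_decide, performed.append)
print(steps)
```

The `max_steps` cap is a deliberate guard: an agent that never decides it is finished would otherwise keep a browser session busy indefinitely.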
Define AgentBrowser as an OpenAI tool-calling function. Compatible with GPT-4o, GPT-4, GPT-3.5-turbo, and any model that supports function calling.
# Define tools for your OpenAI agent
tools = [{
    "type": "function",
    "function": {
        "name": "browser_navigate",
        "description": "Navigate browser to URL",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string"}
            },
            "required": ["url"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "browser_vision",
        "description": "See the current page",
        "parameters": {
            "type": "object",
            "properties": {
                "full_page": {
                    "type": "boolean",
                    "default": False  # Python literal; serialized as JSON false
                }
            }
        }
    }
}]
Add browser_click, browser_type, browser_screenshot, and browser_scroll tools for full page interaction.
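When the model returns a tool call, its name and JSON-encoded arguments still have to be translated into an AgentBrowser request. A minimal sketch of that translation follows. `tool_call_to_request` is hypothetical, and the `fullPage` body field is an assumed name, since the examples in this document only post an empty object to `/vision`.

```python
import json


def tool_call_to_request(name, arguments_json, session_id):
    """Translate a tool call into (path, body) for the AgentBrowser REST API."""
    args = json.loads(arguments_json or "{}")
    base = f"/api/browser/sessions/{session_id}"
    if name == "browser_navigate":
        # The tool schema calls it "url"; the REST API expects "target".
        return f"{base}/action", {"action": "navigate", "target": args["url"]}
    if name == "browser_vision":
        # Assumption: "fullPage" is a hypothetical field, not a documented one.
        return f"{base}/vision", {"fullPage": bool(args.get("full_page", False))}
    raise ValueError(f"Unknown tool: {name}")


path, body = tool_call_to_request(
    "browser_navigate", '{"url": "https://news.ycombinator.com"}', "abc123"
)
print(path, body)
```

With the openai SDK, `name` and `arguments_json` would come from `tool_call.function.name` and `tool_call.function.arguments`. The session ID is supplied by the calling code, so the model never has to track it.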
1. Create a session: POST /api/browser/sessions → Receive session_id
2. Navigate: POST /api/browser/sessions/{id}/action with {"action":"navigate","target":"https://..."}
3. Observe: POST /api/browser/sessions/{id}/vision → Get screenshot, DOM, interactive elements
4. Act: POST /api/browser/sessions/{id}/action with chosen action and selector
5. Clean up: DELETE /api/browser/sessions/{id} → Close browser, free resources
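The final DELETE is easy to skip when an agent raises an exception midway, which leaves a browser running. One way to make cleanup unconditional is a context manager, sketched below. `browser_session` and the `send(method, path, body)` transport are hypothetical, and the fake transport exists only so the call order can be checked without a server.

```python
from contextlib import contextmanager

SESSIONS = "/api/browser/sessions"


@contextmanager
def browser_session(send, **options):
    """Create a session, yield its id, and always delete it afterwards.

    `send(method, path, body)` is a caller-supplied transport that returns
    the decoded JSON response.
    """
    session = send("POST", SESSIONS, options)
    sid = session["id"]
    try:
        yield sid
    finally:
        send("DELETE", f"{SESSIONS}/{sid}", None)


# Fake transport that records calls instead of making HTTP requests.
calls = []


def fake_send(method, path, body):
    calls.append((method, path))
    return {"id": "abc123"}


with browser_session(fake_send, name="demo", browserType="chromium") as sid:
    fake_send("POST", f"{SESSIONS}/{sid}/action",
              {"action": "navigate", "target": "https://example.com"})

print(calls)
```

In real use, `send` would wrap an HTTP client such as `requests`, and the `finally` clause ensures the session is closed even if a step inside the `with` block fails.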
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/browser/sessions | Create a new browser session |
| GET | /api/browser/sessions | List all sessions |
| GET | /api/browser/sessions/:id | Get session details |
| DELETE | /api/browser/sessions/:id | Close and delete a session |
| POST | /api/browser/sessions/:id/action | Execute a browser action (25+ available) |
| POST | /api/browser/sessions/:id/vision | Get AI vision snapshot |
| GET / POST | /api/browser/sessions/:id/cookies | Get or set cookies |
| GET | /api/browser/sessions/:id/logs | Get paginated action logs |
navigate
goBack
goForward
reload
click
dblclick
hover
rightClick
type
press
select
scroll
wait
waitForSelector
waitForNavigation
screenshot
evaluate
getCookies
setCookies
clearCookies
getLocalStorage
setLocalStorage
clearLocalStorage
getUrl
getTitle
getContent
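The action names listed above can double as a client-side allowlist, so that a misspelled or hallucinated action from an LLM is rejected before any request is sent. The sketch below copies the names from that list. `action_payload` is a hypothetical helper, and the server remains the authority on what is actually supported.

```python
ACTIONS = frozenset({
    "navigate", "goBack", "goForward", "reload",
    "click", "dblclick", "hover", "rightClick",
    "type", "press", "select", "scroll",
    "wait", "waitForSelector", "waitForNavigation",
    "screenshot", "evaluate",
    "getCookies", "setCookies", "clearCookies",
    "getLocalStorage", "setLocalStorage", "clearLocalStorage",
    "getUrl", "getTitle", "getContent",
})


def action_payload(action, **fields):
    """Build an action body, rejecting names that are not in the list."""
    if action not in ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    return {"action": action, **fields}


print(action_payload("navigate", target="https://news.ycombinator.com"))
```

Passing a name such as `doubleClick` raises `ValueError` immediately, which is cheaper to handle in an agent loop than a failed HTTP round trip.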
/api/browser/sessions/* · Input validation · Error handling
| Feature | AgentBrowser | Browser Use | Stagehand | Playwright MCP |
|---|---|---|---|---|
| Self-hosted | Yes | Yes | No (cloud) | Yes |
| REST API | 8 endpoints | No | SDK only | MCP only |
| Multi-browser | Chromium, Firefox, WebKit | Chromium only | Chromium only | All 3 |
| Vision AI | Screenshot, DOM, a11y, elements | Screenshot only | DOM only | None |
| Session persistence | Cookies, localStorage, DB | Memory only | No | No |
| Web Dashboard | Yes (dark theme) | No | No | No |
| WebSocket events | Yes | No | No | No |
| Browser actions | 25+ | Limited | Limited | Basic |
| Agent agnostic | Any LLM / CLI | Python only | JS / Python | Any MCP client |
| License | MIT | MIT | Commercial | MIT |
| Database | SQLite + Prisma | None | Cloud | None |