One of OpenClaw's most powerful features is browser control. Your AI agents can interact with websites like a human would - clicking buttons, filling forms, navigating pages, and extracting data. This isn't just theoretical. I use it daily for tasks that would otherwise require manual browser work.
Let me show you how it works and what you can build with it.
Why Browser Automation Matters
Browser automation solves a specific problem: many tasks require interacting with web interfaces that don't have APIs. Checking prices on a website, filling out forms, monitoring changes, running tests - these all need a browser.
Before OpenClaw, you had two options:
- Selenium or Playwright scripts (require coding every step)
- Manual work (slow and repetitive)
OpenClaw adds a third option: AI agents that understand what you want and figure out the browser interactions themselves. You describe the goal, the agent navigates the page.
Real example from my workflow: I have an agent that monitors competitor blog posts. It browses to their site, finds new articles, extracts key points, and logs them to my Notion database. No API needed, no manual checking.
How OpenClaw Browser Control Works
OpenClaw uses Playwright under the hood, but you don't write Playwright code. Instead, you use the browser tool in natural language or through structured commands.
The agent can:
- Open URLs and navigate pages
- Take screenshots and snapshots (for AI vision analysis)
- Click elements, fill forms, press keys
- Extract content and verify results
- Handle multiple tabs and browser profiles
- Run JavaScript in the page context
It supports two browser profiles:
- openclaw - isolated browser managed by OpenClaw
- chrome - connect to your actual Chrome browser via extension (OpenClaw Browser Relay)
The second option is powerful: your agent can work in Chrome tabs you already have open, using your logged-in sessions and cookies.
Basic Example: Web Scraping a Blog
Let me walk through a practical example. I'll show the agent commands and what they do.
Goal: Scrape the latest blog post title and summary from a website.
First, open the page:
```javascript
// Agent uses browser tool
browser({
  action: "open",
  url: "https://example.com/blog",
  profile: "openclaw"
})
```

Then take a snapshot to understand the page structure:
```javascript
browser({
  action: "snapshot",
  targetId: "<from previous step>"
})
```

The snapshot returns a text representation of the page with reference IDs for each element. The AI reads this to understand the layout.
Click on the first blog post link:
```javascript
browser({
  action: "act",
  targetId: "<tab id>",
  request: {
    kind: "click",
    ref: "e42" // reference from snapshot
  }
})
```

Extract the title and summary from the new page:
```javascript
browser({
  action: "snapshot",
  targetId: "<tab id>"
})
```

The agent parses the snapshot and extracts the relevant text. No CSS selectors, no brittle XPath - the AI figures out which text matters based on context.
Advanced Pattern: Form Automation
Forms are everywhere. Contact forms, login pages, search interfaces. Here's how to handle them with OpenClaw.
Example: Submit a search query and extract results.
Navigate to the site and take a snapshot:
```javascript
browser({ action: "open", url: "https://search-site.com" })
browser({ action: "snapshot" })
```

Fill the search field and submit:
```javascript
browser({
  action: "act",
  request: {
    kind: "fill",
    ref: "e15", // search input field
    text: "OpenClaw tutorials"
  }
})

browser({
  action: "act",
  request: {
    kind: "click",
    ref: "e16", // search button
    submit: true
  }
})
```

Wait for results to load and extract them:
```javascript
browser({
  action: "act",
  request: {
    kind: "wait",
    text: "Results for" // wait for specific text to appear
  }
})

browser({ action: "snapshot" })
```

The agent reads the results from the snapshot and processes them as needed.
Important detail: The submit: true flag tells the agent to wait for page navigation after clicking. This prevents race conditions where you try to read results before they load.
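Under the hood, a text-based wait is just polling: read the page, check for the text, sleep, repeat until a deadline. A sketch of that loop, where `getPageText` is a stand-in for whatever reads the current page contents (in OpenClaw's case, a snapshot):

```javascript
// Poll a page-text source until the expected text appears or we time out.
// `getPageText` is an async stand-in for reading the current page contents.
async function waitForText(getPageText, expected, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await getPageText()).includes(expected)) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for text: "${expected}"`);
}
```

Failing loudly on timeout matters: a silent fall-through would let the agent read a half-loaded page, which is exactly the race condition `submit: true` exists to avoid.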
Using Chrome Extension Relay
The Chrome extension relay is brilliant for tasks that need authentication or specific browser state.
Install the OpenClaw Browser Relay extension in Chrome. When enabled on a tab, that tab becomes available to your agents.
Click the extension icon on a tab to attach it. The badge shows "ON" when active.
Then in your agent:
```javascript
browser({
  action: "open",
  profile: "chrome", // use Chrome instead of isolated browser
  url: "https://app.example.com/dashboard"
})
```

The agent works in your actual Chrome tab, using your logged-in session. This is perfect for:
- Internal tools that require login
- Sites with complex auth flows
- Working with data in your personal accounts
I use this for automating tasks in web apps I'm already logged into - no need to handle authentication in the agent.
Real-World Use Cases
Here's what I've built and what I've seen others do:
Content monitoring: Agents that check specific websites for updates. When they find new content, they extract key points and send notifications. No API needed.
Price tracking: Monitor e-commerce sites for price changes. The agent visits product pages, extracts current prices, compares to historical data, alerts on drops.
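Once the agent has extracted a number, the comparison step is plain logic. A sketch of the decision function - the data shape and the 5% default threshold are my own assumptions, not a fixed OpenClaw convention:

```javascript
// Decide whether a newly scraped price warrants an alert.
// `history` is an array of previously seen prices, newest last (assumed shape).
function priceAlert(history, current, dropThreshold = 0.05) {
  if (history.length === 0) return { alert: false, reason: "no history" };
  const last = history[history.length - 1];
  const change = (current - last) / last;
  return change <= -dropThreshold
    ? { alert: true, reason: `dropped ${(Math.abs(change) * 100).toFixed(1)}% from ${last}` }
    : { alert: false, reason: "within threshold" };
}
```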
Form submission automation: Bulk submissions to contact forms or application portals. The agent reads a list of entries, navigates to the form, fills it accurately, submits, verifies success.
UI testing: Agents that click through workflows to verify functionality. They can report which steps failed and include screenshots of errors.
Data extraction: Pull structured data from sites that don't offer downloads. The agent navigates paginated results, extracts each entry, compiles into a dataset.
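The pagination loop itself is simple once each page's extraction is delegated. A sketch with a fake fetcher standing in for the open-snapshot-extract cycle; the `{ entries, nextPage }` return shape and the `maxPages` safety cap are assumptions for illustration:

```javascript
// Walk paginated results and compile entries into one dataset.
// `fetchPage` stands in for the browser open + snapshot + extract cycle;
// it returns { entries, nextPage } where nextPage is null on the last page.
async function collectAllPages(fetchPage, { maxPages = 50 } = {}) {
  const dataset = [];
  let page = 1;
  while (page !== null && page <= maxPages) {
    const { entries, nextPage } = await fetchPage(page);
    dataset.push(...entries);
    page = nextPage;
  }
  return dataset;
}
```

The `maxPages` cap is worth keeping even when you trust the site: a broken "next" link that points back to page 1 would otherwise loop forever.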
Meeting scheduler: An agent that opens your calendar app, checks availability, and books slots based on incoming requests. Works with any web-based calendar, not just those with APIs.
The pattern is consistent: anything you can do manually in a browser, you can teach an agent to do. And once taught, it runs reliably without supervision.
Best Practices
From experience building these workflows, here's what matters:
Use snapshots liberally. Don't assume page structure. Take a snapshot, let the AI read it, then act. This handles dynamic content and layout changes.
Wait for state changes. After clicking buttons or submitting forms, use wait actions to ensure the page has updated before proceeding. Text-based waits (text: "Loading complete") are more reliable than fixed timeouts.
Verify actions. After important steps (form submission, data entry), take another snapshot and check that the expected result appears. Catch errors early.
Handle failures gracefully. Web pages change. Build your agents to detect when expected elements are missing and surface errors clearly. Better to fail with a useful message than continue with stale assumptions.
Respect rate limits. If you're scraping or making many requests, add delays between actions. Use delayMs in your requests to avoid overwhelming sites or triggering bot detection.
Keep sessions separate. Use the isolated openclaw profile for automation scripts. Reserve chrome profile for tasks that genuinely need your logged-in state.
Common Pitfalls
Brittle selectors: Don't rely on CSS classes or IDs in your instructions. They change frequently. Instead, describe elements in natural language ("the blue Submit button", "the email input field") and let the AI find them in the snapshot.
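Resolving a plain-language description against a snapshot can be as simple as scoring word overlap between the description and each element's role and label. A minimal sketch, assuming elements shaped like `{ role, label, ref }` (a format I'm inventing here, not OpenClaw's internal one):

```javascript
// Resolve a plain-language description ("the blue Submit button") against
// snapshot elements by scoring word overlap, instead of hard-coded selectors.
// Each element is assumed to look like { role, label, ref }.
function resolveElement(elements, description) {
  const words = description.toLowerCase().split(/\W+/).filter(Boolean);
  let best = null;
  let bestScore = 0;
  for (const el of elements) {
    const haystack = `${el.role} ${el.label}`.toLowerCase();
    const score = words.filter((w) => haystack.includes(w)).length;
    if (score > bestScore) {
      best = el;
      bestScore = score;
    }
  }
  return best;
}
```

Real agents use an LLM for this step, which handles synonyms and context far better, but the contract is the same: description in, element ref out.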
Not waiting enough: If your agent clicks too fast, it might try to interact with elements before they're ready. Add waits after navigation and dynamic content updates.
Ignoring errors: Check that actions succeeded. After a click, verify the page changed as expected. After a form submission, verify the success message appeared.
Over-automation: Not everything needs browser automation. If an API exists, use that instead. Browser control is for when there's no alternative.
Getting Started
To use browser automation in your OpenClaw setup:
- Ensure Playwright is installed (OpenClaw handles this during setup)
- Start the browser control server: `openclaw browser start`
- In your agent code or chat, use the `browser` tool with appropriate actions
- For Chrome extension relay, install OpenClaw Browser Relay from the Chrome Web Store
Start simple. Pick one repetitive browser task you do manually. Write an agent to do it. Test thoroughly. Then expand.
What's Next
Browser automation is one piece of OpenClaw's toolkit. Combine it with other capabilities:
- Use `web_fetch` for simple content extraction without browser overhead
- Chain browser actions with file operations to save extracted data
- Integrate with APIs for tasks that have programmatic access
- Use memory skills to track state across multiple browser sessions
The power comes from combining tools. An agent that monitors a website, extracts data, processes it with AI analysis, and posts results to Notion - that's possible, and it's all in natural language instructions.
Closing Thoughts
Browser automation with OpenClaw isn't about replacing developers with AI. It's about automating the tedious browser work that wastes developer time.
I still write code. I still use APIs when they exist. But for the tasks that need a browser - price checks, form fills, UI tests, content monitoring - I let agents handle it. They don't complain about repetitive work.
If you have browser tasks you do manually, try automating one with OpenClaw. Start with something small. A single form fill, a simple scrape. See how it works. Then expand.
The browser is just another tool in the agent's toolkit. But it's a powerful one.
