Klaus
· by Robbie

OpenClaw Browser Automation: What Works, What Doesn’t, and When to Use It

I wrote a shorter post a while back about when browser automation will work. Since then, the patterns are clearer. Some tasks work reliably. Others waste hours before you realize the approach was wrong from the start.

The most common question I get is some version of “can my agent do X on this website?” The answer is almost always “it depends on the site, the task, and which browser mode you use.” This goes deeper into all three: how OpenClaw actually controls browsers, which mode to use for what, where things break, and when to skip the browser entirely.

How OpenClaw Controls Your Browser

OpenClaw uses the Chrome DevTools Protocol (CDP), the same protocol that powers Chrome’s built-in DevTools, to control Chromium-based browsers. There are three distinct modes, each designed for a different situation.
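CDP is a JSON-RPC-style protocol sent over a WebSocket. As a rough illustration, this is what two commands look like on the wire; Page.enable and Page.navigate are real CDP methods, but the helper function here is just a sketch, not OpenClaw’s code:

```python
import json
from typing import Optional

# CDP messages are JSON objects with an id, a method, and params.
# Each command gets a unique id so responses can be matched to requests.
def cdp_message(msg_id: int, method: str, params: Optional[dict] = None) -> str:
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})

# Enable the Page domain, then navigate: the same two commands any
# CDP client sends before driving a tab.
enable = cdp_message(1, "Page.enable")
navigate = cdp_message(2, "Page.navigate", {"url": "https://example.com"})
```

Every mode below ultimately speaks this protocol; they differ only in which browser process is on the other end of the socket.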

Managed Browser

OpenClaw launches a separate browser profile, isolated from your personal browser, with dedicated CDP ports in the 18800-18899 range (OpenClaw Docs). Fresh session every time. Nothing from your personal browsing leaks in.

This is the default for automation tasks that don’t need your login sessions. Scraping public sites, taking screenshots, submitting forms on pages that don’t require authentication. It requires Playwright for the full feature set: navigation, AI snapshots, and PDF export all depend on it. Without Playwright installed, those features return a 501 error (OpenClaw Docs).

Extension Relay (Browser Relay)

This is the most interesting mode. The Browser Relay extension lets your agent control your real Chrome tabs with your logged-in sessions intact. It uses Chrome’s debugger API to attach to tabs, preserving session cookies and authentication state.

Install it with openclaw browser extension install, then load the unpacked extension in chrome://extensions with Developer mode enabled. Pin it to your toolbar. Click the icon on any tab to activate it.

This mode is the reason I can tell my agent “go update the billing settings in Google Cloud Console” and it actually works. The agent sees the same authenticated page I see. On Klaus, Browser Relay is pre-configured, so there’s no wiring to do.

The trade-offs: Relay uses snapshot references instead of CSS selectors, doesn’t support slowly=true typing, and can’t do batch actions or PDF export (OpenClaw Docs). It also only works with Chromium-based browsers: Chrome, Brave, Edge, or Chromium.

Remote CDP

Connect to a Chromium instance running on another machine via WebSocket or HTTP endpoint (OpenClaw Docs). This works with cloud browser services like Browserbase and Browserless.

Use this when your OpenClaw instance runs on a server without a display, or when you need to scale beyond one browser. The OpenClaw instance sends CDP commands over the network to a browser running elsewhere. This means you can run browser automation from a headless Linux VM, which is how most Klaus instances are set up.

For production, use encrypted endpoints (HTTPS/WSS) with short-lived tokens. OpenClaw supports both HTTP discovery (/json/version) and direct WebSocket connections (OpenClaw Docs). The timeout defaults are 1,500ms for remote CDP and 3,000ms for the handshake, which you can adjust if your cloud browser service is slower to respond.
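The HTTP discovery flow hits /json/version on the remote endpoint to find the WebSocket URL. A minimal sketch of that step, with the timeout values from the docs as constants (the function name and structure are illustrative, not OpenClaw’s API):

```python
from urllib.parse import urlparse

# Defaults from the OpenClaw docs: 1,500 ms for remote CDP calls,
# 3,000 ms for the initial handshake. Raise these for slow cloud services.
CDP_TIMEOUT_MS = 1500
HANDSHAKE_TIMEOUT_MS = 3000

def discovery_url(endpoint: str) -> str:
    """Build the /json/version discovery URL for a remote CDP endpoint.

    Accepts either an http(s) or ws(s) endpoint and maps it to the
    matching HTTP scheme, preferring encrypted transport.
    """
    parsed = urlparse(endpoint)
    scheme = "https" if parsed.scheme in ("https", "wss") else "http"
    return f"{scheme}://{parsed.netloc}/json/version"
```

A GET against the returned URL yields JSON containing the browser’s WebSocket debugger URL, which is what the CDP client actually connects to.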

Which Mode to Use

| Mode | Needs your login? | Runs headless? | Full Playwright features? | Best for |
| --- | --- | --- | --- | --- |
| Managed | No | Yes | Yes | Isolated automation, scraping, screenshots |
| Extension Relay | Yes | No | Partial | Logged-in dashboards, authenticated forms |
| Remote CDP | Depends on service | Yes | Yes | Cloud/distributed, headless servers |

What Browser Automation Actually Handles Well

The agent runs a snapshot to identify every interactive element on the page, assigns each one a reference number, and then acts on them based on natural language instructions (OpenClaw Docs). You describe what you want in plain language (“click the Settings gear icon, then select Billing, then update the payment method”) and the agent handles the rest.
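The ref-numbering scheme can be sketched as a simple enumeration over interactive elements. The element list and field names below are invented for illustration; OpenClaw’s actual snapshot format differs:

```python
def assign_refs(elements):
    """Number interactive elements so the agent can say 'click ref 2'.

    Non-interactive elements (headings, plain text) are visible in the
    snapshot but don't get action references.
    """
    interactive = {"button", "link", "input", "select", "textarea"}
    refs = {}
    ref = 1
    for el in elements:
        if el["role"] in interactive:
            refs[ref] = el
            ref += 1
    return refs

# Hypothetical page content for illustration.
page = [
    {"role": "heading", "text": "Settings"},
    {"role": "button", "text": "Billing"},
    {"role": "input", "text": "Card number"},
]
snapshot = assign_refs(page)
```

The agent then acts on refs, not CSS selectors, which is why Relay mode’s snapshot-reference behavior (noted above) matters: the model never needs to construct a selector, only pick a number.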

Complex UI navigation is where browser automation shines. Dashboards with nested menus, settings pages buried five clicks deep, multi-step wizards where the next step depends on what you clicked in the previous one. Bailey and I spent a lot of time letting our agent Bittie modify Google Cloud settings and update company information in Rippling (existing post). These are interfaces designed for humans clicking through GUIs, and many of them have no API alternative. The browser is the only way in.

The common thread: these tasks involve navigating a known interface to reach a specific destination. The agent reads the page, identifies the right elements, clicks through. It’s the web equivalent of giving someone directions.

Simple form submission works reliably on contact forms, basic data entry, and survey pages where fields don’t fight back with complex validation. The agent identifies fields by their snapshot reference numbers, types values, and clicks submit. Registration forms, feedback surveys, and simple order forms all tend to work on the first attempt.

Screenshot and page capture is straightforward. Grab the visual state of a page for reporting, documentation, or progress tracking. Managed mode handles this cleanly because it runs in an isolated profile.

Canvas generation is a related capability: OpenClaw creates HTML files (visualizations, dashboards, mini-apps) and renders them in the browser. This is less “automation” in the traditional sense and more about the browser as an output surface. Your agent builds something, then you see it rendered.

Where Browser Automation Falls Over

Every tool comparison lists what browser agents can do. Here’s what fails in practice.

Bot Detection Gets You Before You Start

Sites with sophisticated bot detection analyze your browser fingerprint and IP address (existing post). CDP itself is a detection signal. Anti-bot vendors like DataDome detect the underlying automation protocol, not specific framework artifacts. This means it catches unknown automation frameworks too, including the ones trying to stay under the radar with anti-detect features.

LinkedIn, most social media platforms, and many e-commerce sites actively block automated browsers. No stealth plugin fully solves this; the detection arms race moves faster than evasion techniques. Modern detection layers several approaches: client-side behavior analysis (mouse movement patterns, scroll speed, typing cadence), red pill tests that fingerprint the environment (VMs, headless browsers, fake hardware), and stacking of multiple signals. Sites that don’t block outright sometimes shadowban automated sessions: no hard error, just silently degraded results.

Reactive Forms Break Multi-Step Workflows

Forms with real-time validation, auto-complete suggestions, and complex field dependencies are unreliable targets. I spent too many hours of my life getting our agent to fill out expense reports in Navan. The problem was fancy validation on the exact keystrokes used to enter dollar amounts (existing post).

Auto-complete fields are another common failure. The agent types a value, the dropdown appears, and the agent doesn’t know to wait for it before pressing Enter. Or it selects the wrong suggestion. Or the dropdown renders in an iframe that the agent can’t see at all. Date pickers, address auto-complete with Google Maps, and cascading dropdowns (where selecting a country changes the state list) all fall into this category. These are interactions designed around precise human timing and visual feedback loops that agents handle inconsistently.

Reliability Compounds Across Steps

The best browser agents achieve around 89% success on benchmarks like WebVoyager (Browser Use Technical Report). That sounds reasonable for a single action. But in a 20-step workflow, 89% per step means about a 10% chance of completing the whole thing — nine out of ten runs fail somewhere along the way. Performance also varies dramatically by site: 100% on Huggingface, 80% on Booking.com (Browser Use Technical Report). The sites where you most want automation (complex, dynamic, behind authentication) tend to be the ones with the lowest success rates.
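The compounding math is easy to check directly:

```python
# Per-step success rate from the WebVoyager benchmark figure.
per_step = 0.89

def workflow_success(per_step: float, steps: int) -> float:
    """Probability that every step of an n-step workflow succeeds."""
    return per_step ** steps

# A 20-step workflow at 89% per step completes under 10% of the time,
# while a 3-step workflow still completes about 70% of the time.
long_run = workflow_success(per_step, 20)   # ~0.097
short_run = workflow_success(per_step, 3)   # ~0.705
```

The per-step rate barely matters once workflows get long: even at 95% per step, a 20-step workflow completes only about 36% of the time.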

The underlying problem is that each step in a browser workflow succeeds only some fraction of the time. Multiply those probabilities together across an entire task, and the system doesn’t work at all in practice (Amazon Science). Even a single click is more complex than it appears. It actually triggers multiple events: hover, pointer move, mousedown, mouseup, click. Modern web frameworks expect this complete chain in the correct sequence. If any event is missing or misordered, the click doesn’t register. Shadow DOMs and iframes add more fragility by hiding elements from the agent’s view (Amazon Science).

This is not a temporary limitation waiting for better models. It’s structural. The web was built for humans with mice and keyboards, not for agents issuing CDP commands. Improvements will come, but the compounding math means long workflows will remain fragile for a while.

So design agent tasks with as few browser steps as possible. A 3-step workflow (open page, fill one field, click submit) can work well. A 15-step workflow through a complex form with conditional logic and validation is fighting the math. Break long tasks into shorter chunks with verification between each one.
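One way to apply this is to pair each short chunk with a verification step and stop at the first failed check, rather than blindly running all fifteen steps and discovering the failure at the end. A sketch of that structure; the chunk/verify pairing is a pattern I’m suggesting, not an OpenClaw feature:

```python
def run_chunked(chunks):
    """Run (name, do, check) triples, stopping at the first failed check.

    `do` performs a few browser steps; `check` confirms the page actually
    reached the expected state before the next chunk starts.
    """
    completed = []
    for name, do, check in chunks:
        do()
        if not check():
            return completed, name  # report exactly where it broke
        completed.append(name)
    return completed, None

# Toy example: the second chunk's verification fails.
state = {"logged_in": False, "form_open": False}
chunks = [
    ("login", lambda: state.update(logged_in=True), lambda: state["logged_in"]),
    ("open_form", lambda: None, lambda: state["form_open"]),
]
done, failed_at = run_chunked(chunks)
```

The payoff is diagnosability: instead of a 15-step run that fails somewhere, you get "login succeeded, open_form did not," which is something the agent can retry in isolation.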

When to Skip the Browser Entirely

The best browser automation advice is often: don’t use a browser.

APIs first. If the service has an API, use it. APIs are deterministic, fast, and don’t care about your browser fingerprint. Every time a customer asks me about automating something, my first question is whether the service has an API or a webhook. Most SaaS tools do. Google Workspace has APIs for everything. Slack, HubSpot, Notion, Airtable all have well-documented endpoints. Your agent can call these directly without touching a browser.

Scraping APIs for data extraction. If you need to pull data from a website, Firecrawl, Olostep, and Exa handle JavaScript rendering, anti-bot detection, and structured extraction without running a browser on your instance. They do the hard parts on their infrastructure and return clean data. On Klaus, these are pre-configured via Orthogonal, so there’s no API key setup.

web_fetch for static content. OpenClaw’s built-in web_fetch tool does an HTTP GET and returns markdown. No JavaScript execution, but it’s fast and free for static sites: documentation, government pages, blog posts, press releases (OpenClaw Docs).

The hierarchy is clear: API > scraping service > web_fetch > browser automation. Each step up adds complexity, cost, and failure modes. Only reach for the browser when none of the others work, which typically means: the task involves interacting with a GUI that has no API, and the site doesn’t aggressively block bots.
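The hierarchy reads naturally as a decision function. The capability flags here are illustrative simplifications of the questions you’d actually ask about a task:

```python
def choose_tool(has_api: bool, needs_js: bool, needs_gui: bool) -> str:
    """Pick the cheapest tool that can handle the task, per the hierarchy:
    API > scraping service > web_fetch > browser automation."""
    if has_api:
        return "api"                # deterministic, fast, no fingerprinting
    if needs_gui:
        return "browser"            # last resort: GUI-only interactions
    if needs_js:
        return "scraping_service"   # rendered pages, anti-bot handled remotely
    return "web_fetch"              # static content: plain HTTP GET
```

Note the ordering: an API wins even when the task also has a GUI, because the API path skips every failure mode the browser path carries.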

For a deeper look at the full decision tree for data extraction, the web scraping guide covers when to use each approach.

Common Setup Issues and How to Fix Them

These are the issues I see most often across Klaus instances.

“openclaw browser” is an unknown command. The most common cause is a restrictive plugins.allow list that doesn’t include browser (OpenClaw Docs). Add browser to your plugin allow list and restart the gateway.

CDP connection fails. OpenClaw defaults to a fail-closed SSRF policy, even when you don’t explicitly configure one (OpenClaw Docs). This is a security default, not a bug. If the browser launches but can’t navigate, check whether the SSRF policy is blocking the target URL. CDP reachability issues (the browser won’t start) are a different problem from SSRF navigation blocks (the browser starts but can’t go anywhere).
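A fail-closed SSRF policy typically refuses private, loopback, and link-local addresses unless explicitly allowed. A minimal sketch of that kind of check, to show the general idea rather than OpenClaw’s actual policy code:

```python
import ipaddress
from urllib.parse import urlparse

def ssrf_allows(url: str) -> bool:
    """Fail closed: allow only http(s) URLs whose host is not a private,
    loopback, or link-local IP literal."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        return True  # hostname, not an IP literal; DNS checks happen elsewhere
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

If the agent can load example.com but not an internal dashboard on 192.168.x.x, this is the class of check you are hitting, and the fix is an explicit allow-list entry rather than disabling the policy.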

Features return 501 errors. Navigation, AI snapshots, and PDF export require Playwright. Without it, these features return a clear 501 error (OpenClaw Docs). Install Playwright to unlock the full feature set.

Extension Relay won’t connect. Ensure the Gateway is running on the machine where OpenClaw is installed. If the Gateway is on a remote machine (like a Klaus instance), you need a node host running on your local machine to bridge the connection between your Chrome and the remote Gateway.

Browser works but pages load blank or incomplete. Some sites rely heavily on client-side rendering. The agent takes a snapshot before the page finishes loading and sees nothing useful. Adding a brief wait before the snapshot helps. In Managed mode, Playwright’s built-in waitForLoadState handles this automatically for most pages. In Relay mode, you may need to tell the agent explicitly to wait a moment before reading the page.
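The "wait before the snapshot" advice amounts to polling until the page stops changing. A sketch of that loop, where take_snapshot is a stand-in for whatever call the agent uses to read the page:

```python
import time

def stable_snapshot(take_snapshot, attempts=5, interval=0.5):
    """Poll until two consecutive snapshots match, i.e. the page has
    stopped re-rendering, then return the stable snapshot.

    Falls back to the last snapshot seen if the page never settles
    within the attempt budget.
    """
    previous = take_snapshot()
    for _ in range(attempts):
        time.sleep(interval)
        current = take_snapshot()
        if current == previous:
            return current
        previous = current
    return previous

# Toy page that finishes rendering after the first read.
reads = iter(["loading", "content", "content", "content", "content"])
result = stable_snapshot(lambda: next(reads), interval=0.01)
```

This is essentially what Playwright’s load-state waiting does for you in Managed mode; in Relay mode you approximate it by telling the agent to pause and re-read.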

Frequently Asked Questions

Can OpenClaw fill out forms on websites?

Yes, for simple forms. The agent uses snapshot references to identify fields and can type, click, and select values (OpenClaw Docs). Forms with complex validation, auto-complete, or reactive UI elements are unreliable. Static contact forms and basic data entry work well. Multi-step forms with conditional logic often don’t.

Does Browser Relay work with Firefox or Safari?

No. Browser Relay requires a Chromium-based browser: Chrome, Brave, Edge, or Chromium (Chrome Web Store). Firefox and Safari don’t support the Chrome debugger API that the extension uses. There’s no workaround for this.

Can I run browser automation on a headless server?

Yes, using Managed mode or Remote CDP. Managed mode launches a Chromium instance with Playwright, which supports headless operation out of the box (OpenClaw Docs). Extension Relay requires a visible browser with a user present, so it won’t work headless.

How does OpenClaw handle CAPTCHAs?

It doesn’t. If a site serves a CAPTCHA, the automation stops. This is one reason to prefer APIs or scraping services for sites that aggressively gate automated access. Scraping APIs like Firecrawl handle CAPTCHA challenges on their infrastructure, so your agent never sees them.

Key Takeaways

  • OpenClaw offers three browser modes: Managed (isolated automation), Extension Relay (logged-in sessions), and Remote CDP (cloud/distributed). Pick the mode that matches your task before starting.
  • Extension Relay is the standout feature for authenticated work. Your agent controls your real browser with your sessions intact. On Klaus, it’s pre-configured.
  • Browser automation works well for complex UI navigation, simple forms, and screenshots. It fails on bot-protected sites, reactive forms, and long multi-step workflows.
  • Reliability compounds: if each step fails 10% of the time, a 10-step workflow fails more often than it succeeds. Keep browser workflows short.
  • APIs, scraping services, and web_fetch are better than browser automation for most data extraction tasks. Use the browser as a last resort, not the default.
  • Common setup issues (plugin not enabled, SSRF blocking, missing Playwright) have straightforward fixes. Check the OpenClaw browser docs for configuration reference.
  • The hierarchy for web tasks: API > scraping service > web_fetch > browser automation. Only reach for the browser when nothing else works.

Want to try browser automation without the setup? Klaus ships with all three modes pre-configured. Sign up at klausai.com.
