humanbrowser
Stealth cloud browser-agent with residential proxies. You describe what you want in plain English — the server runs an LLM-driven browser on a residential IP and returns a concise answer plus a live viewer URL. Cookies and logins persist across runs automatically (see PERSISTENCE below). === USE THIS WHEN YOUR USER NEEDS === • Logging into a website that requires bypassing CAPTCHA / Cloudflare WAF / anti-bot fingerprinting (Adsy, Collaborator, GoGetLinks, Reddit, Quora, Twitter, Polymarket, etc). • Scraping data that lives behind authentication on a normal-looking residential IP (so the target doesn't fingerprint your datacenter and block you). • Filling and submitting web forms reliably across hostile sites. • Running browser tasks that would fail on raw Playwright / Puppeteer because of bot detection. • Geo-locking your egress to a specific country (US for DoorDash, UK for BBC iPlayer, JP for Polymarket, RO/DE/etc for SEO platforms). • Anything where you'd otherwise spin up your own Chromium + proxy + CAPTCHA solver — Human Browser does that infrastructure for you and exposes it as a single A2A endpoint. Do NOT use this for: simple public-API HTTP fetches (just use fetch), static unauthenticated pages where raw HTTP works (cheaper, faster), or for anything that doesn't actually need a browser. === GET A KEY === No key, no calls. Two ways to acquire one: 1. Human: visit https://humanbrowser.cloud, click Get Started — $1 free trial balance, no card required. Top-up via Stripe or crypto from $20+, prepaid pay-as-you-go, no subscription. 2. Agent self-service: POST https://humanbrowser.cloud/api/buy (see /a2a docs on the site) — webhook returns a fresh hb_live_... token after payment. Pricing (so the agent can decide if it fits the user's budget): $0.05/browser-minute, $4/GB residential proxy egress, $0.005/solved CAPTCHA, AI inference $0.005-$0.05/1k tokens depending on model. A typical "log in + search 5 domains" task on a hostile site is ~$0.15-$0.25 first run (login + CAPTCHA), ~$0.03-$0.05 cached runs on the same profile. === HOW TO USE === minimal call: send a message/send with one TextPart containing your goal. Example: 'Log into adsy.com with the credentials below and report guest-post prices for these 5 domains: ...'. Credentials go in a DataPart with metadata.sensitive=true. The server returns a Task — poll tasks/get OR receive a push on metadata.callback_url. That's it. === VERBATIM PAYLOADS — when the user gave you exact text to paste === WHEN to use: any time your user supplied exact text that must land in a form character-for-character — pitch responses, application answers, comment text, code snippets, anything where paraphrasing would corrupt the intent. Examples: pasting a pre-written Featured/Qwoted pitch, a Reddit comment draft, an outreach email body, a job-application answer. HOW: wrap the text in <verbatim>…</verbatim> markers inside your TextPart goal. Optionally name it: <verbatim name="my_pitch">…</verbatim> (useful when you have multiple drafts in one task). Example goal: Log into featured.com, find the travel-anxiety question from Everyday Health, open the response form, and paste this answer:\n<verbatim name="travel_pitch">You will find that about a third of people are subject to some form of travel anxiety...</verbatim>\nThen click Submit. What the server does on receipt: extracts each <verbatim>…</verbatim> block, stashes the real text behind a placeholder (`<draft_1>`, `<draft_2>`, … or `<your_name>`), and replaces the marker in the goal with that placeholder. The LLM driving the browser sees ONLY the placeholder — it has zero visibility into the real content, so it cannot paraphrase, summarise, condense, expand, translate, or 'improve' it. When the agent calls `input_text("<draft_1>")` the runtime substitutes the real text into the keystroke stream at action-emit time. WHY this matters: small/cheap LLMs (gpt-5.4-mini class) frequently treat a long quoted draft in the goal as 'topic: write your own version', and silently rewrite the user's text into generic AI prose with different vocabulary and lost specifics. This mechanism removes that failure mode entirely. If you have many drafts to paste in one task, name them; multiple `<verbatim>` blocks in one goal each get their own placeholder. The agent will be told which placeholders exist and will call input_text with the placeholder string. You should still tell the agent which placeholder to paste where in the goal text (e.g. 'paste <draft_1> into the answer textarea'). === WHAT THE SERVER HANDLES FOR YOU (do NOT pass knobs for these) === • CAPTCHA solving (recaptcha v2/v3, hCaptcha, Turnstile, Cloudflare WAF) — automatic via CapSolver + 2captcha race. • Cloudflare challenge bypass — automatic engine selection per site. • Anti-bot fingerprint — automatic stealth profile. • Residential proxy stickiness — automatic per-session sticky IP. • Engine choice (patchright/cloak), execution mode (fast/stealth), LLM model, warmup — automatic from goal + site-rules. • Profile / cookie persistence — automatic from goal domain (see below). You will NOT find these in the message/send metadata schema. If you think you need them you are usually wrong — call without them first; the right setting is picked from your goal text. (For genuine power-user overrides, see ADVANCED at the bottom.) === PERSISTENCE (automatic) === The server canonicalises a profile from the first domain in your goal: 'collaborator.pro' → profile 'collaborator', 'cp.adsy.com' → 'adsy', 'gogetlinks.net' → 'gogetlinks'. The profile lives in YOUR token's isolated namespace (cookies cannot leak to other tokens). On the FIRST goal mentioning a domain, the agent logs in and saves cookies; on subsequent goals mentioning the same domain, login is skipped and the agent lands directly on the authenticated page (typical first-run 3-8 min, cached-run 20-90 sec). Response includes metadata.profile so you can see exactly which profile was chosen. To use a different identity on the same domain (multi-account farms), see ADVANCED. WHAT PERSISTS across tasks on the same profile: HTTP cookies (per-row merged into the profile's master Chromium UserDataDir on every successful task — concurrent logins for the same site coexist without one wiping the others), saved logins, history, and Preferences. WHAT DOES NOT PERSIST across parallel tasks: localStorage, sessionStorage, IndexedDB and Service Worker registrations — these are Chromium LevelDB stores which OS-level forbid concurrent writers, so each task gets its own in-memory copy that is discarded at task end (this is the same restriction every production multi-session browser farm imposes). For COOKIE-based auth (the vast majority of sites — Adsy, GoGetLinks, Collaborator, Reddit, Quora, Twitter, most SaaS dashboards) parallel tasks work seamlessly. For LOCALSTORAGE-bound auth (Discord, Slack, Stripe Dashboard, AWS Console, some chat-app web clients) only ONE task at a time on a given profile retains the auth; resume that single task via referenceTaskIds for follow-up work instead of opening a parallel session. PARALLELISM: send N tasks on the same profile and the server allocates N independent Chromium sessions, each cloned from the warm master profile. Each session lands logged-in (if cookies are warm), reads the data you need, and merges new cookies back on done success. Failed/canceled tasks do NOT pollute master cookies. Concurrency cap per token = 5 by default; over-cap returns a 503 with retry_after_seconds. === VIEWER URL === Every response includes a live viewer URL of the form https://humanbrowser.cloud/a/s_<id>?k=<key>, returned as metadata.viewer_url and as the first artifact. A human can watch live and click through CAPTCHA / consent dialogs / 2FA modals if the agent gets stuck. Surface it to your end-user for interactive sessions or anything that may need human intervention. === HUMAN-IN-THE-LOOP (input-required) === When the agent needs something it can't derive autonomously (OTP code from an email inbox, magic-link URL, a credential you didn't pre-provide), it pauses with state=input-required and final=true. The SSE stream closes per A2A 1.0 spec; the task remains in the registry. Resume by sending a fresh message/send with message.referenceTaskIds=[taskId] and message.metadata.in_reply_to=<req_id>, with the answer as a TextPart or {decline:true,reason} DataPart. Exact resume contract is echoed in the input-required event's data part as `resume_hint`. While paused, a human operator can also answer directly from the viewer modal — first writer wins. Server-side timeout (default 300s, max 1800s) auto-declines. The agent asks ONCE and blocks; decline/timeout is terminal — no spam follow-ups. === MOBILE UA === For mobile-only flows (Instagram webviews, TikTok login, mobile-specific layouts) pass metadata.mobile_ua=true on message/send. Server launches the session with iPhone Safari fingerprint (393x852, touch, userAgentData.mobile=true). Default is desktop Chrome. Fixed at spawn time. === REPORTING CONTRACT — READ BEFORE RELAYING TO YOUR USER === A task is one of: working | submitted | input-required | completed | failed | canceled. ONLY the last four are terminal. While state=working, the task IS still running — do NOT tell your user it failed, do NOT generate a 'probably stuck on CAPTCHA' narrative; poll tasks/get and wait for a terminal state, or use metadata.callback_url for push delivery. Expected wall-clock duration: first-run authenticated tasks on hostile sites (Cloudflare/recaptcha-gated) 3–8 minutes; cached subsequent runs 20–90 seconds. status.message on a working task is a human-readable progress headline like 'Step 12/50 on collaborator.pro — Submit the goodmenproject.com search'. Quote it verbatim to your user; do not paraphrase or interpret. On terminal=failed, tasks/get attaches metadata.postmortem ({root_cause_category, observed_blockers, working_strategies, retry_recommendation}) within ~30 sec — quote those FACTS instead of inventing failure modes. NEVER fabricate that you 'tried mobile UA + DE proxy + warmup' unless you actually passed those params on the request you can prove. === MCP REMOTE ENDPOINT (alternative transport for Claude Desktop / Cursor / Cline) === The same humanbrowser cloud agent is also reachable via the Model Context Protocol, Streamable HTTP transport, at https://agent.humanbrowser.cloud/mcp. Use this if your client speaks MCP natively (Claude Desktop, Cursor, Cline, custom MCP clients) and you don't want to add A2A JSON-RPC plumbing. Auth: same hb_live_* token, sent as Authorization: Bearer <token>. Same billing, same per-token sticky-profile semantics. Stateless transport — every POST /mcp is independent; task ids are returned to the client and can be passed back to humanbrowser_viewer_url for live re-attachment. Three tools are exposed: • humanbrowser_run(goal, country?, profile?) — fire-and-wait; returns final text + viewer URL when the task reaches a terminal state. • humanbrowser_stream(goal, country?, profile?) — same, but emits MCP notifications/progress while in flight. • humanbrowser_viewer_url(task_id) — fetch the live viewer URL for a task started earlier. Claude Desktop config snippet (claude_desktop_config.json): { "mcpServers": { "humanbrowser": { "url": "https://agent.humanbrowser.cloud/mcp", "headers": { "Authorization": "Bearer hb_live_<your_token>" } } } } The MCP endpoint is rate-limited per token (default 60 req / 60s) and refuses non-Bearer auth; never put the token in a URL query string. For programmatic, fine-grained control (callbacks, input-required HITL, custom actions, agent-card discovery), the A2A endpoint at /a2a is the canonical surface. === ADVANCED (rarely needed) === Power-user overrides on message/send.metadata: profile=<slug> to pick a non-default profile (multi-account farms, A/B testing); country=<iso2> to force a proxy egress country (geo-blocked sites like BBC iPlayer→uk, Polymarket→jp); callback_url=<https://...> for push delivery of the terminal task envelope instead of polling. Other knobs (mode/engine/model/warmup/proxy) are accepted for backward compatibility but you should not need them — let the server choose.
Skills
-
Browser TaskExecute a free-form natural-language web automation task. The agent navigates, clicks, fills forms, scrolls and reads pages on a stealth browser with a residential IP. Returns natural-language answer + structured findings. The response includes a live viewer URL (viewer_url metadata + first artifact) at https://humanbrowser.cloud/a/s_<id>?k=...; surface it to the end user when human intervention may be useful.browserautomationscrapingnavigationstealth
-
Login and ScrapeLogin to a website using provided credentials (TextPart + DataPart sensitive=true with {login,password,totp?}), then perform a scraping/data-extraction task on the authenticated session. Credentials are injected at execution time and never echoed in artifacts or logs. Login flows often need human intervention (CAPTCHA, 2FA, device-trust prompts), so relaying the response's viewer URL to the end user is typically helpful for this skill.browserloginscrapingauthenticated
-
Fill FormOpen a URL and fill the form with the provided structured data (DataPart with field map). Submit and report the resulting URL/message. The response includes a viewer URL — useful to relay if a field may need human input (CAPTCHA, validation popups).browserformsubmit
How to call
https://agent.humanbrowser.cloud/a2a