TL;DR
Scrapling provides powerful anti-bot bypass capabilities through its StealthyFetcher and DynamicFetcher modes. Use StealthyFetcher for Cloudflare-protected sites (automatic Turnstile solving, canvas fingerprint randomization, WebRTC blocking) or DynamicFetcher for JavaScript-heavy anti-bot implementations. Integrate with OpenClaw to control all scraping operations through natural language commands.
Introduction
You visit a website to collect data. The page loads. Then you see it: "Access Denied" or a CAPTCHA challenge. The site detected your scraper and blocked access. This scenario plays out constantly for developers, data scientists, and researchers who need web data for legitimate projects.
This happens because websites increasingly deploy sophisticated anti-bot systems. Cloudflare, PerimeterX, Akamai, and similar services analyze browser fingerprints, behavior patterns, and request characteristics to identify automated access. Traditional scrapers fail immediately against these defenses.

The bot detection industry has grown into a multi-billion dollar market. Companies invest heavily in protecting their digital assets from automated access. Cloudflare alone reports blocking billions of bot requests daily. This creates significant challenges for legitimate data collection, whether for market research, competitive analysis, price monitoring, or academic research.
Scrapling solves this problem. The library includes multiple anti-detection modes designed specifically to bypass these protections. Combined with OpenClaw's natural language interface, you can instruct your AI assistant to bypass anti-bot checks without writing complex code.
Understanding Anti-Bot Detection
Before bypassing detection, you need to understand how it works. Anti-bot systems analyze several factors:
Browser Fingerprinting: Sites collect information about your browser, including screen resolution, installed fonts, WebGL renderer, canvas output, and hundreds of other signals. Automated tools often reveal themselves through consistent fingerprints that differ from real browsers.
Behavioral Analysis: Human users move mice unpredictably, scroll at varying speeds, and type with natural timing. Bots often exhibit mechanical patterns: instant page loads, uniform scroll speeds, perfect timing between actions.
Request Analysis: Each HTTP request includes headers, TLS fingerprints, and connection patterns. Standard HTTP libraries like requests make requests that look clearly automated compared to real browser traffic.
JavaScript Challenges: Modern sites execute JavaScript to collect browser information. Cloudflare's Turnstile, for example, runs invisible tests that verify browser integrity before showing content.
IP Reputation: IP addresses get flagged based on hosting provider, history of suspicious activity, and geographic location. Data center IPs trigger immediate suspicion.
Scrapling addresses each of these detection vectors through its specialized fetchers.
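The request-analysis point is visible with nothing but the standard library: Python's default HTTP machinery announces itself in its very first header, before any fingerprinting even runs.

```python
import urllib.request

# The stdlib opener stamps every request with a telltale User-Agent
# ("Python-urllib/3.x") unless you override it; request analysis
# flags this signature immediately.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)['User-agent']
print(default_ua)
```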
Scrapling's Anti-Bot Capabilities
Scrapling provides two main fetchers for bypassing anti-bot systems:

StealthyFetcher handles most Cloudflare and similar protections. It uses headless Chrome with built-in evasion techniques that patch common detection vectors automatically.
DynamicFetcher provides full browser automation through Playwright. Use it when StealthyFetcher fails or when the site uses advanced JavaScript-based detection.
Here's how to choose:
| Scenario | Recommended Fetcher |
|---|---|
| Cloudflare protection | StealthyFetcher |
| Turnstile CAPTCHA | StealthyFetcher |
| Basic bot detection | StealthyFetcher |
| Complex JavaScript challenges | DynamicFetcher |
| Infinite scroll with anti-bot | DynamicFetcher |
| Custom anti-bot solutions | DynamicFetcher |
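As a rule of thumb, the table reduces to one decision: anything that needs full browser behavior goes to DynamicFetcher; everything else starts with StealthyFetcher. A tiny illustrative helper (the function and scenario labels are ours, not part of Scrapling's API):

```python
# Scenarios that need full Playwright automation (illustrative labels,
# mirroring the table above)
DYNAMIC_SCENARIOS = {'complex_js_challenge', 'infinite_scroll', 'custom_anti_bot'}

def recommend_fetcher(scenario: str) -> str:
    """Return the fetcher class name suggested for a scenario."""
    return 'DynamicFetcher' if scenario in DYNAMIC_SCENARIOS else 'StealthyFetcher'
```

Start with the StealthyFetcher default and escalate only when it fails; the sections below cover both fetchers in depth.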
StealthyFetcher Deep Dive
StealthyFetcher is Scrapling's primary tool for bypassing anti-bot systems. It handles most common protection mechanisms automatically.
Basic Usage
```python
from scrapling.fetchers import StealthyFetcher

page = StealthyFetcher.fetch('https://protected-site.com')
print(page.text)
```
The fetcher automatically attempts to bypass Cloudflare, PerimeterX, and similar protections.
Cloudflare Turnstile Bypass
Cloudflare Turnstile is one of the most common anti-bot challenges. StealthyFetcher solves it automatically:
```python
page = StealthyFetcher.fetch(
    'https://cloudflare-protected-site.com',
    solve_cloudflare=True
)
```
The solve_cloudflare=True parameter triggers automatic challenge solving. This works for both interstitial challenges (the "Checking your browser before accessing" page) and Turnstile widgets.
Canvas Fingerprint Randomization
Canvas fingerprinting creates unique identifiers based on how your browser renders graphics. StealthyFetcher adds random noise to canvas operations:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    hide_canvas=True
)
```
Each request generates different canvas output, making fingerprint tracking ineffective.
WebRTC Leak Prevention
WebRTC can expose your real IP address even when using proxies. StealthyFetcher blocks WebRTC requests:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    block_webrtc=True
)
```
This prevents local IP leaks that could reveal your identity or location.
Google Search Referer Spoofing
Many sites allow access when they think traffic comes from Google search. StealthyFetcher spoofs this referer:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    google_search=True
)
```
This makes the request appear to come from Google's search results page.
Using Installed Chrome
For maximum evasion, use your installed Chrome browser instead of Playwright's Chromium:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    real_chrome=True
)
```
This uses your actual Chrome installation, which has a legitimate browser fingerprint.
Geographic Spoofing
Match your request to a specific location:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    locale='en-US',
    timezone_id='America/New_York'
)
```
This sets browser timezone and language settings to match your desired location.
DynamicFetcher for Advanced Protection
Some sites use sophisticated anti-bot systems that StealthyFetcher cannot bypass. DynamicFetcher provides full browser automation with Playwright:
```python
from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch('https://highly-protected-site.com')
print(page.text)
```
Handling Infinite Scroll
Sites with infinite scroll and anti-bot protection require browser automation:
```python
from scrapling.fetchers import DynamicFetcher

def scroll_page(page):
    # `page` here is the live Playwright page object
    page.wait_for_selector('.content-item')   # wait for content to load
    for _ in range(5):                        # scroll to load more
        page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        page.wait_for_timeout(1000)
    return page

page = DynamicFetcher.fetch(
    'https://site.com/infinite-scroll',
    page_action=scroll_page
)
```
Waiting for JavaScript Execution
Some content loads through JavaScript after initial page load:
```python
page = DynamicFetcher.fetch('https://site.com', network_idle=True)
content = page.body
```
This ensures all JavaScript has executed before extracting content.
Handling CAPTCHAs Manually
For CAPTCHAs that cannot be solved automatically:
```python
from scrapling.fetchers import DynamicFetcher

def check_captcha(page):
    # `page` here is the live Playwright page object
    if page.is_visible('.captcha-container'):
        page.screenshot(path='captcha.png')   # screenshot for manual solving
        input('Press Enter after solving the CAPTCHA...')
        page.click('.captcha-submit')         # continue after the manual solve
    return page

page = DynamicFetcher.fetch('https://site.com', page_action=check_captcha)
```
You can pause for manual intervention, solve the CAPTCHA yourself, and resume scraping.
Using with OpenClaw
OpenClaw lets you control Scrapling's anti-bot capabilities through natural language (see this post for how to set up Scrapling inside OpenClaw):
Bypass Cloudflare on a site:
"Get the product data from https://shop.example.com, this site has Cloudflare protection"
OpenClaw automatically uses StealthyFetcher with solve_cloudflare enabled.
Handle a site with advanced protection:
"Scrape the job listings from https://careers.example.com, use headless browser mode because they have strong anti-bot protection"
OpenClaw switches to DynamicFetcher for full browser automation.
Use proxy rotation:
"Extract data from these 100 URLs, rotate through these proxies: proxy1.com:8080, proxy2.com:8080, proxy3.com:8080"
OpenClaw distributes requests across multiple proxies.
Spoof geographic location:
"Get the price data from https://site.com, use US East Coast settings"
OpenClaw configures timezone and locale to match.
Anti-Bot Techniques Explained
Understanding the underlying techniques helps you choose the right approach:
TLS Fingerprint Spoofing
When your browser connects to a website, it performs a TLS handshake. The ClientHello message includes characteristics that identify your client library. Standard Python requests have distinctive fingerprints that anti-bot systems recognize.
Scrapling spoofs TLS fingerprints to appear as legitimate browsers. This happens automatically with StealthyFetcher and DynamicFetcher.
User-Agent Rotation
Sending requests with the same User-Agent string triggers detection. Scrapling rotates User-Agents automatically:
```python
# User-Agent is rotated automatically
page = StealthyFetcher.fetch('https://site.com')
```
Each request appears to come from a different browser version.
Header Spoofing
Real browsers send specific headers in specific orders. Scrapling ensures headers match legitimate browser behavior automatically.
Canvas Randomization
When a website asks your browser to draw something, the exact pixels reveal your browser and GPU. Scrapling adds imperceptible noise to canvas operations, making each fingerprint unique.
Screen Resolution and Window Size
Headless browsers often report default screen sizes. Scrapling randomizes viewport dimensions to match real user displays.
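Scrapling randomizes this for you, but if you are wiring up your own Playwright context, a sketch like this (the resolution list is illustrative) picks a plausible desktop viewport instead of the headless default:

```python
import random

# Common desktop resolutions; sampling one looks more like a real user
# population than a fixed headless default such as 800x600
COMMON_VIEWPORTS = [(1920, 1080), (1366, 768), (1536, 864), (1440, 900), (2560, 1440)]

def random_viewport():
    width, height = random.choice(COMMON_VIEWPORTS)
    return {'width': width, 'height': height}
```

Pass the result as the viewport option when creating a Playwright browser context.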
Mouse Movement Simulation
DynamicFetcher can simulate human-like mouse movements on the underlying Playwright page; pass a helper like this as page_action:

```python
def human_click(page):
    box = page.locator('.submit').bounding_box()
    # glide the cursor onto the button in small steps, then click
    page.mouse.move(box['x'] + box['width'] / 2, box['y'] + box['height'] / 2, steps=20)
    page.mouse.click(box['x'] + box['width'] / 2, box['y'] + box['height'] / 2)
    return page
```
This adds realistic movement patterns that pass behavioral analysis.
Proxy Integration
Using proxies helps avoid IP-based blocking and enables geographic scraping:
Basic Proxy Usage
```python
from scrapling.fetchers import StealthyFetcher

page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://username:password@proxy.example.com:8080'
)
```
Proxy Rotation
For large-scale scraping, rotate through multiple proxies:
```python
import random
from scrapling.fetchers import StealthyFetcher

proxies = [
    'http://proxy1.com:8080',
    'http://proxy2.com:8080',
    'http://proxy3.com:8080',
]

for url in urls:
    proxy = random.choice(proxies)
    page = StealthyFetcher.fetch(url, proxy=proxy)
    # Process page
```
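random.choice can hand the same proxy to consecutive requests; a round-robin assignment spreads the load evenly instead. A small stdlib helper (the function name is ours):

```python
from itertools import cycle

def assign_proxies(urls, proxies):
    """Pair each URL with a proxy in round-robin order so every proxy
    carries an even share of the requests."""
    return list(zip(urls, cycle(proxies)))
```

Then loop over the pairs and pass each proxy to the fetch call.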
Residential Proxies
Residential proxies use IP addresses from real internet service providers. They are harder to detect than data center IPs:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://residential-proxy-provider:port'
)
```
Residential proxies cost more but provide significantly higher success rates on protected sites.
Common Anti-Bot Scenarios
Cloudflare Protection
Cloudflare is the most common anti-bot solution. Most sites work with basic StealthyFetcher:
```python
page = StealthyFetcher.fetch('https://cloudflare-site.com', solve_cloudflare=True)
```
If Cloudflare shows a challenge page, the fetcher automatically solves it and retries.
PerimeterX (Now HUMAN Security)
PerimeterX (now part of HUMAN Security) uses behavioral analysis:
```python
from scrapling.fetchers import DynamicFetcher

# Use DynamicFetcher for PerimeterX
page = DynamicFetcher.fetch('https://perimeterx-site.com')
```
The full browser automation handles behavioral challenges better.
Akamai
Akamai provides enterprise-grade bot management:
```python
# Akamai often requires residential proxies
page = StealthyFetcher.fetch(
    'https://akamai-protected.com',
    proxy='http://residential-proxy:port'
)
```
Akamai-protected sites often require combining multiple evasion techniques.
Custom Anti-Bot Solutions
Some sites build their own detection systems:
```python
# Use maximum stealth settings
page = StealthyFetcher.fetch(
    'https://custom-protected.com',
    solve_cloudflare=True,
    block_webrtc=True,
    hide_canvas=True,
    google_search=True,
    real_chrome=True
)
```
If this fails, switch to DynamicFetcher with full browser automation.
Best Practices
Start Simple
Begin with basic StealthyFetcher. Only add complexity if you encounter blocks:
```python
# Try basic first
page = StealthyFetcher.fetch('https://site.com')

# Add evasion if needed
if page.status == 403 or 'blocked' in page.text.lower():
    page = StealthyFetcher.fetch('https://site.com', solve_cloudflare=True)
```
Respect Rate Limits
Even with anti-bot capabilities, sending too many requests triggers protection:
```python
import time

for url in urls:
    page = StealthyFetcher.fetch(url)
    time.sleep(2)  # Wait between requests
```
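A fixed delay is itself a uniform pattern; adding jitter, and backing off after failures, looks more natural to behavioral analysis. A sketch of the delay calculation (the function name is ours):

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0, jitter=0.5):
    """Exponential backoff with jitter: the delay doubles per failed
    attempt, is capped, and gets a random offset so the timing between
    requests is never perfectly uniform."""
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, jitter)
```

Call time.sleep(backoff_delay(attempt)) between retries instead of a constant sleep.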
Use Residential Proxies for Production
Free or cheap proxies often have poor reputations. Invest in quality residential proxies for reliable scraping:
```python
# Quality residential proxy
page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://premium-residential-proxy:port'
)
```
Check robots.txt
Always check if the site allows scraping:
```python
# Check robots.txt before scraping
from urllib.robotparser import RobotFileParser

rp = RobotFileParser('https://site.com/robots.txt')
rp.read()
allowed = rp.can_fetch('*', 'https://site.com/target-page')
```
Handle Errors Gracefully
Build error handling into your scraper:
```python
from scrapling.fetchers import DynamicFetcher, StealthyFetcher

try:
    page = StealthyFetcher.fetch('https://site.com')
except Exception as e:
    print(f'Error: {e}')
    # Fall back to DynamicFetcher
    page = DynamicFetcher.fetch('https://site.com')
```
Troubleshooting
Still Getting Blocked
If you encounter blocks despite using StealthyFetcher:
- Enable all evasion options:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    solve_cloudflare=True,
    block_webrtc=True,
    hide_canvas=True,
    google_search=True,
    real_chrome=True
)
```
- Switch to DynamicFetcher:

```python
from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch('https://site.com')
```
- Add proxy rotation:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://residential-proxy:port'
)
```
Cloudflare Challenge Loop
Sometimes StealthyFetcher gets stuck in a challenge loop:
- Increase timeout:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    solve_cloudflare=True,
    timeout=120
)
```
- Use real_chrome:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    solve_cloudflare=True,
    real_chrome=True
)
```
CAPTCHAs Not Solving
Some CAPTCHAs require manual intervention:
```python
from scrapling.fetchers import DynamicFetcher

def manual_solve(page):
    # `page` here is the live Playwright page object
    if page.is_visible('[class*="captcha"]'):
        page.screenshot(path='manual_captcha.png')
        input('Press Enter after solving CAPTCHA...')  # solve manually, then continue
        page.click('.submit-button')
    return page

# Run headed so you can see and solve the challenge
page = DynamicFetcher.fetch('https://site.com', headless=False, page_action=manual_solve)
```
Slow Performance
Anti-bot evasion adds overhead. For faster scraping:
- Use StealthyFetcher instead of DynamicFetcher when possible
- Add connection pooling
- Use faster proxies
- Reduce unnecessary evasion options
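Careful concurrency also helps: a small thread pool overlaps network waits without hammering a single site. A generic sketch — fetch_one stands in for whatever fetch callable you use (for example, a wrapper around StealthyFetcher.fetch):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch_one, max_workers=4):
    """Run fetch_one over urls concurrently, preserving input order.
    Keep max_workers low so the concurrency itself does not look like
    an attack or trip rate limits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))
```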
Conclusion
Bypassing anti-bot checks requires understanding how detection works and using the right tools for each situation. Scrapling provides comprehensive solutions through StealthyFetcher for most protection systems and DynamicFetcher for advanced scenarios.
The key takeaways:
- Use StealthyFetcher for Cloudflare, Turnstile, and basic bot protection
- Enable solve_cloudflare=True for automatic challenge solving
- Switch to DynamicFetcher when StealthyFetcher fails
- Add proxy rotation for large-scale scraping
- Combine multiple evasion techniques for stubborn sites
With OpenClaw integration, you control all these capabilities through natural language. Tell your AI assistant what you need, and it selects the appropriate anti-bot approach automatically. Once you have collected your data, you can use Apidog to test and validate APIs, create automated test suites, and generate documentation for the endpoints you discover.
FAQ
What's the difference between StealthyFetcher and DynamicFetcher?
StealthyFetcher uses modified headless Chrome with built-in evasion patches. DynamicFetcher uses full Playwright automation. StealthyFetcher is faster but may fail against advanced detection. DynamicFetcher is more reliable but slower.
Does Scrapling work against all anti-bot systems?
No anti-bot solution works 100% of the time. Scrapling handles most common systems (Cloudflare, PerimeterX, Akamai) reliably. Custom or enterprise solutions may require additional techniques or manual intervention.
Is bypassing anti-bot legal?
Laws vary by jurisdiction and depend on the site's terms of service. Generally, scraping public data is acceptable. Bypassing authentication or accessing private data without authorization crosses legal boundaries.
Why is my scraping still getting blocked?
Check these common issues: IP reputation (use residential proxies), rate limiting (add delays), insufficient evasion (enable more options), or JavaScript challenges (use DynamicFetcher).
How do I handle CAPTCHAs?
StealthyFetcher automatically solves Cloudflare Turnstile. For other CAPTCHAs, use DynamicFetcher and pause for manual solving, or integrate third-party CAPTCHA solving services.
Can I use my own Chrome browser?
Yes. Set real_chrome=True in StealthyFetcher to use your installed Chrome instead of Playwright's Chromium. This provides a more legitimate browser fingerprint.
Do I need proxies?
For small-scale scraping, no. For production or large-scale operations, residential proxies significantly improve success rates by avoiding IP-based blocking.
How do I rotate User-Agents?
StealthyFetcher rotates User-Agents automatically. For manual control, pass custom headers:

```python
page = StealthyFetcher.fetch('https://site.com', extra_headers={'User-Agent': 'Your-Custom-UA'})
```
What's the success rate against Cloudflare?
With proper configuration, success rates exceed 90% for most Cloudflare-protected sites. Turnstile challenges are solved automatically.
Can I scrape from multiple geographic locations?
Yes. Use the timezone_id and locale parameters:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    timezone_id='Europe/London',
    locale='en-GB'
)
```



