TL;DR
Scrapling provides powerful anti-bot bypass capabilities through its StealthyFetcher and DynamicFetcher modes. Use StealthyFetcher for Cloudflare-protected sites (automatic Turnstile solving, canvas fingerprint randomization, WebRTC blocking) or DynamicFetcher for JavaScript-heavy anti-bot implementations. Integrate with OpenClaw to control all scraping operations through natural language commands.
Introduction
You visit a website to collect data. The page loads. Then you see it: "Access Denied" or a CAPTCHA challenge. The site detected your scraper and blocked access. This scenario plays out constantly for developers, data scientists, and researchers who need web data for legitimate projects.
This happens because websites increasingly deploy sophisticated anti-bot systems. Cloudflare, PerimeterX, Akamai, and similar services analyze browser fingerprints, behavior patterns, and request characteristics to identify automated access. Traditional scrapers fail immediately against these defenses.

The bot detection industry has grown into a multi-billion dollar market. Companies invest heavily in protecting their digital assets from automated access. Cloudflare alone reports blocking billions of bot requests daily. This creates significant challenges for legitimate data collection, whether for market research, competitive analysis, price monitoring, or academic research.
Scrapling solves this problem. The library includes multiple anti-detection modes designed specifically to bypass these protections. Combined with OpenClaw's natural language interface, you can instruct your AI assistant to bypass anti-bot checks without writing complex code.
Understanding Anti-Bot Detection
Before bypassing detection, you need to understand how it works. Anti-bot systems analyze several factors:
Browser Fingerprinting: Sites collect information about your browser, including screen resolution, installed fonts, WebGL renderer, canvas output, and hundreds of other signals. Automated tools often reveal themselves through consistent fingerprints that differ from real browsers.
Behavioral Analysis: Human users move mice unpredictably, scroll at varying speeds, and type with natural timing. Bots often exhibit mechanical patterns: instant page loads, uniform scroll speeds, perfect timing between actions.
Request Analysis: Each HTTP request includes headers, TLS fingerprints, and connection patterns. Standard HTTP libraries like requests make requests that look clearly automated compared to real browser traffic.
JavaScript Challenges: Modern sites execute JavaScript to collect browser information. Cloudflare's Turnstile, for example, runs invisible tests that verify browser integrity before showing content.
IP Reputation: IP addresses get flagged based on hosting provider, history of suspicious activity, and geographic location. Data center IPs trigger immediate suspicion.
Scrapling addresses each of these detection vectors through its specialized fetchers.
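The request-analysis point is visible with nothing but the standard library: Python's default HTTP machinery announces itself in its very first header, before any fingerprinting even runs.

```python
import urllib.request

# The stdlib opener stamps every request with a telltale User-Agent
# ("Python-urllib/3.x") unless you override it; request analysis
# flags this signature immediately.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)['User-agent']
print(default_ua)
```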
Scrapling's Anti-Bot Capabilities
Scrapling provides two main fetchers for bypassing anti-bot systems:

StealthyFetcher handles most Cloudflare and similar protections. It uses headless Chrome with built-in evasion techniques that patch common detection vectors automatically.
DynamicFetcher provides full browser automation through Playwright. Use it when StealthyFetcher fails or when the site uses advanced JavaScript-based detection.
Here's how to choose:
| Scenario | Recommended Fetcher |
|---|---|
| Cloudflare protection | StealthyFetcher |
| Turnstile CAPTCHA | StealthyFetcher |
| Basic bot detection | StealthyFetcher |
| Complex JavaScript challenges | DynamicFetcher |
| Infinite scroll with anti-bot | DynamicFetcher |
| Custom anti-bot solutions | DynamicFetcher |
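As a rule of thumb, the table reduces to one decision: anything that needs full browser behavior goes to DynamicFetcher; everything else starts with StealthyFetcher. A tiny illustrative helper (the function and scenario labels are ours, not part of Scrapling's API):

```python
# Scenarios that need full Playwright automation (illustrative labels,
# mirroring the table above)
DYNAMIC_SCENARIOS = {'complex_js_challenge', 'infinite_scroll', 'custom_anti_bot'}

def recommend_fetcher(scenario: str) -> str:
    """Return the fetcher class name suggested for a scenario."""
    return 'DynamicFetcher' if scenario in DYNAMIC_SCENARIOS else 'StealthyFetcher'
```

Start with the StealthyFetcher default and escalate only when it fails; the sections below cover both fetchers in depth.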
StealthyFetcher Deep Dive
StealthyFetcher is Scrapling's primary tool for bypassing anti-bot systems. It handles most common protection mechanisms automatically.
Basic Usage
```python
from scrapling.fetchers import StealthyFetcher

page = StealthyFetcher.fetch('https://protected-site.com')
print(page.text)
```
The fetcher automatically attempts to bypass Cloudflare, PerimeterX, and similar protections.
Cloudflare Turnstile Bypass
Cloudflare Turnstile is one of the most common anti-bot challenges. StealthyFetcher solves it automatically:
```python
page = StealthyFetcher.fetch(
    'https://cloudflare-protected-site.com',
    solve_cloudflare=True
)
```
The solve_cloudflare=True parameter triggers automatic challenge solving. This works for both interstitial challenges (the "Checking your browser before accessing" page) and Turnstile widgets.
Canvas Fingerprint Randomization
Canvas fingerprinting creates unique identifiers based on how your browser renders graphics. StealthyFetcher adds random noise to canvas operations:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    hide_canvas=True
)
```
Each request generates different canvas output, making fingerprint tracking ineffective.
WebRTC Leak Prevention
WebRTC can expose your real IP address even when using proxies. StealthyFetcher blocks WebRTC requests:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    block_webrtc=True
)
```
This prevents local IP leaks that could reveal your identity or location.
Google Search Referer Spoofing
Many sites allow access when they think traffic comes from Google search. StealthyFetcher spoofs this referer:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    google_search=True
)
```
This makes the request appear to come from Google's search results page.
Using Installed Chrome
For maximum evasion, use your installed Chrome browser instead of Playwright's Chromium:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    real_chrome=True
)
```
This uses your actual Chrome installation, which has a legitimate browser fingerprint.
Geographic Spoofing
Match your request to a specific location:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    locale='en-US',
    timezone_id='America/New_York'
)
```
This sets browser timezone and language settings to match your desired location.
DynamicFetcher for Advanced Protection
Some sites use sophisticated anti-bot systems that StealthyFetcher cannot bypass. DynamicFetcher provides full browser automation with Playwright:
```python
from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch('https://highly-protected-site.com')
print(page.text)
```
Handling Infinite Scroll
Sites with infinite scroll and anti-bot protection require browser automation:
```python
from scrapling.fetchers import DynamicFetcher

def scroll_page(page):
    # `page` here is the live Playwright page object
    page.wait_for_selector('.content-item')   # wait for content to load
    for _ in range(5):                        # scroll to load more
        page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        page.wait_for_timeout(1000)
    return page

page = DynamicFetcher.fetch(
    'https://site.com/infinite-scroll',
    page_action=scroll_page
)
```
Waiting for JavaScript Execution
Some content loads through JavaScript after initial page load:
```python
page = DynamicFetcher.fetch('https://site.com', network_idle=True)
content = page.body
```
This ensures all JavaScript has executed before extracting content.
Handling CAPTCHAs Manually
For CAPTCHAs that cannot be solved automatically:
```python
from scrapling.fetchers import DynamicFetcher

def check_captcha(page):
    # `page` here is the live Playwright page object
    if page.is_visible('.captcha-container'):
        page.screenshot(path='captcha.png')   # screenshot for manual solving
        input('Press Enter after solving the CAPTCHA...')
        page.click('.captcha-submit')         # continue after the manual solve
    return page

page = DynamicFetcher.fetch('https://site.com', page_action=check_captcha)
```
You can pause for manual intervention, solve the CAPTCHA yourself, and resume scraping.
Using with OpenClaw
OpenClaw lets you control Scrapling's anti-bot capabilities through natural language (see this post for how to set up Scrapling inside OpenClaw):
Bypass Cloudflare on a site:
"Get the product data from https://shop.example.com, this site has Cloudflare protection"
OpenClaw automatically uses StealthyFetcher with solve_cloudflare enabled.
Handle a site with advanced protection:
"Scrape the job listings from https://careers.example.com, use headless browser mode because they have strong anti-bot protection"
OpenClaw switches to DynamicFetcher for full browser automation.
Use proxy rotation:
"Extract data from these 100 URLs, rotate through these proxies: proxy1.com:8080, proxy2.com:8080, proxy3.com:8080"
OpenClaw distributes requests across multiple proxies.
Spoof geographic location:
"Get the price data from https://site.com, use US East Coast settings"
OpenClaw configures timezone and locale to match.
Anti-Bot Techniques Explained
Understanding the underlying techniques helps you choose the right approach:
TLS Fingerprint Spoofing
When your browser connects to a website, it performs a TLS handshake. The ClientHello message includes characteristics that identify your client library. Standard Python requests have distinctive fingerprints that anti-bot systems recognize.
Scrapling spoofs TLS fingerprints to appear as legitimate browsers. This happens automatically with StealthyFetcher and DynamicFetcher.
User-Agent Rotation
Sending requests with the same User-Agent string triggers detection. Scrapling rotates User-Agents automatically:
```python
# User-Agent is rotated automatically
page = StealthyFetcher.fetch('https://site.com')
```
Each request appears to come from a different browser version.
Header Spoofing
Real browsers send specific headers in specific orders. Scrapling ensures headers match legitimate browser behavior automatically.
Canvas Randomization
When a website asks your browser to draw something, the exact pixels reveal your browser and GPU. Scrapling adds imperceptible noise to canvas operations, making each fingerprint unique.
Screen Resolution and Window Size
Headless browsers often report default screen sizes. Scrapling randomizes viewport dimensions to match real user displays.
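Scrapling randomizes this for you, but if you are wiring up your own Playwright context, a sketch like this (the resolution list is illustrative) picks a plausible desktop viewport instead of the headless default:

```python
import random

# Common desktop resolutions; sampling one looks more like a real user
# population than a fixed headless default such as 800x600
COMMON_VIEWPORTS = [(1920, 1080), (1366, 768), (1536, 864), (1440, 900), (2560, 1440)]

def random_viewport():
    width, height = random.choice(COMMON_VIEWPORTS)
    return {'width': width, 'height': height}
```

Pass the result as the viewport option when creating a Playwright browser context.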
Mouse Movement Simulation
DynamicFetcher can simulate human-like mouse movements on the underlying Playwright page; pass a helper like this as page_action:

```python
def human_click(page):
    box = page.locator('.submit').bounding_box()
    # glide the cursor onto the button in small steps, then click
    page.mouse.move(box['x'] + box['width'] / 2, box['y'] + box['height'] / 2, steps=20)
    page.mouse.click(box['x'] + box['width'] / 2, box['y'] + box['height'] / 2)
    return page
```
This adds realistic movement patterns that pass behavioral analysis.
Proxy Integration
Using proxies helps avoid IP-based blocking and enables geographic scraping:
Basic Proxy Usage
```python
from scrapling.fetchers import StealthyFetcher

page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://username:password@proxy.example.com:8080'
)
```
Proxy Rotation
For large-scale scraping, rotate through multiple proxies:
```python
import random
from scrapling.fetchers import StealthyFetcher

proxies = [
    'http://proxy1.com:8080',
    'http://proxy2.com:8080',
    'http://proxy3.com:8080',
]

for url in urls:
    proxy = random.choice(proxies)
    page = StealthyFetcher.fetch(url, proxy=proxy)
    # Process page
```
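random.choice can hand the same proxy to consecutive requests; a round-robin assignment spreads the load evenly instead. A small stdlib helper (the function name is ours):

```python
from itertools import cycle

def assign_proxies(urls, proxies):
    """Pair each URL with a proxy in round-robin order so every proxy
    carries an even share of the requests."""
    return list(zip(urls, cycle(proxies)))
```

Then loop over the pairs and pass each proxy to the fetch call.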
Residential Proxies
Residential proxies use IP addresses from real internet service providers. They are harder to detect than data center IPs:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://residential-proxy-provider:port'
)
```
Residential proxies cost more but provide significantly higher success rates on protected sites.
Common Anti-Bot Scenarios
Cloudflare Protection
Cloudflare is the most common anti-bot solution. Most sites work with basic StealthyFetcher:
```python
page = StealthyFetcher.fetch('https://cloudflare-site.com', solve_cloudflare=True)
```
If Cloudflare shows a challenge page, the fetcher automatically solves it and retries.
PerimeterX (Now HUMAN Security)
PerimeterX (now part of HUMAN Security) uses behavioral analysis:
```python
from scrapling.fetchers import DynamicFetcher

# Use DynamicFetcher for PerimeterX
page = DynamicFetcher.fetch('https://perimeterx-site.com')
```
The full browser automation handles behavioral challenges better.
Akamai
Akamai provides enterprise-grade bot management:
```python
# Akamai often requires residential proxies
page = StealthyFetcher.fetch(
    'https://akamai-protected.com',
    proxy='http://residential-proxy:port'
)
```
Akamai-protected sites often require combining multiple evasion techniques.
Custom Anti-Bot Solutions
Some sites build their own detection systems:
```python
# Use maximum stealth settings
page = StealthyFetcher.fetch(
    'https://custom-protected.com',
    solve_cloudflare=True,
    block_webrtc=True,
    hide_canvas=True,
    google_search=True,
    real_chrome=True
)
```
If this fails, switch to DynamicFetcher with full browser automation.
Best Practices
Start Simple
Begin with basic StealthyFetcher. Only add complexity if you encounter blocks:
```python
# Try basic first
page = StealthyFetcher.fetch('https://site.com')

# Add evasion if needed
if page.status == 403 or 'blocked' in page.text.lower():
    page = StealthyFetcher.fetch('https://site.com', solve_cloudflare=True)
```
Respect Rate Limits
Even with anti-bot capabilities, sending too many requests triggers protection:
```python
import time

for url in urls:
    page = StealthyFetcher.fetch(url)
    time.sleep(2)  # Wait between requests
```
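A fixed delay is itself a uniform pattern; adding jitter, and backing off after failures, looks more natural to behavioral analysis. A sketch of the delay calculation (the function name is ours):

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0, jitter=0.5):
    """Exponential backoff with jitter: the delay doubles per failed
    attempt, is capped, and gets a random offset so the timing between
    requests is never perfectly uniform."""
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, jitter)
```

Call time.sleep(backoff_delay(attempt)) between retries instead of a constant sleep.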
Use Residential Proxies for Production
Free or cheap proxies often have poor reputations. Invest in quality residential proxies for reliable scraping:
```python
# Quality residential proxy
page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://premium-residential-proxy:port'
)
```
Check robots.txt
Always check if the site allows scraping:
```python
# Check robots.txt before scraping
from urllib.robotparser import RobotFileParser

rp = RobotFileParser('https://site.com/robots.txt')
rp.read()
allowed = rp.can_fetch('*', 'https://site.com/target-page')
```
Handle Errors Gracefully
Build error handling into your scraper:
```python
from scrapling.fetchers import DynamicFetcher, StealthyFetcher

try:
    page = StealthyFetcher.fetch('https://site.com')
except Exception as e:
    print(f'Error: {e}')
    # Fall back to DynamicFetcher
    page = DynamicFetcher.fetch('https://site.com')
```
Troubleshooting
Still Getting Blocked
If you encounter blocks despite using StealthyFetcher:
- Enable all evasion options:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    solve_cloudflare=True,
    block_webrtc=True,
    hide_canvas=True,
    google_search=True,
    real_chrome=True
)
```
- Switch to DynamicFetcher:

```python
from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch('https://site.com')
```
- Add proxy rotation:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    proxy='http://residential-proxy:port'
)
```
Cloudflare Challenge Loop
Sometimes StealthyFetcher gets stuck in a challenge loop:
- Increase timeout:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    solve_cloudflare=True,
    timeout=120
)
```
- Use real_chrome:

```python
page = StealthyFetcher.fetch(
    'https://site.com',
    solve_cloudflare=True,
    real_chrome=True
)
```
CAPTCHAs Not Solving
Some CAPTCHAs require manual intervention:
```python
from scrapling.fetchers import DynamicFetcher

def manual_solve(page):
    # `page` here is the live Playwright page object
    if page.is_visible('[class*="captcha"]'):
        page.screenshot(path='manual_captcha.png')
        input('Press Enter after solving CAPTCHA...')  # solve manually, then continue
        page.click('.submit-button')
    return page

# Run headed so you can see and solve the challenge
page = DynamicFetcher.fetch('https://site.com', headless=False, page_action=manual_solve)
```
Slow Performance
Anti-bot evasion adds overhead. For faster scraping:
- Use StealthyFetcher instead of DynamicFetcher when possible
- Add connection pooling
- Use faster proxies
- Reduce unnecessary evasion options
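Careful concurrency also helps: a small thread pool overlaps network waits without hammering a single site. A generic sketch — fetch_one stands in for whatever fetch callable you use (for example, a wrapper around StealthyFetcher.fetch):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch_one, max_workers=4):
    """Run fetch_one over urls concurrently, preserving input order.
    Keep max_workers low so the concurrency itself does not look like
    an attack or trip rate limits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))
```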
Conclusion
Bypassing anti-bot checks requires understanding how detection works and using the right tools for each situation. Scrapling provides comprehensive solutions through StealthyFetcher for most protection systems and DynamicFetcher for advanced scenarios.
The key takeaways:
- Use StealthyFetcher for Cloudflare, Turnstile, and basic bot protection
- Enable solve_cloudflare=True for automatic challenge solving
- Switch to DynamicFetcher when StealthyFetcher fails
- Add proxy rotation for large-scale scraping
- Combine multiple evasion techniques for stubborn sites
With OpenClaw integration, you control all these capabilities through natural language. Tell your AI assistant what you need, and it selects the appropriate anti-bot approach automatically. Once you have collected your data, you can use Apidog to test and validate APIs, create automated test suites, and generate documentation for the endpoints you discover.
FAQ
What's the difference between StealthyFetcher and DynamicFetcher?
StealthyFetcher uses modified headless Chrome with built-in evasion patches. DynamicFetcher uses full Playwright automation. StealthyFetcher is faster but may fail against advanced detection. DynamicFetcher is more reliable but slower.
Does Scrapling work against all anti-bot systems?
No anti-bot solution works 100% of the time. Scrapling handles most common systems (Cloudflare, PerimeterX, Akamai) reliably. Custom or enterprise solutions may require additional techniques or manual intervention.
Is bypassing anti-bot legal?
Laws vary by jurisdiction and depend on the site's terms of service. Generally, scraping public data is acceptable. Bypassing authentication or accessing private data without authorization crosses legal boundaries.
Why is my scraping still getting blocked?
Check these common issues: IP reputation (use residential proxies), rate limiting (add delays), insufficient evasion (enable more options), or JavaScript challenges (use DynamicFetcher).
How do I handle CAPTCHAs?
StealthyFetcher automatically solves Cloudflare Turnstile. For other CAPTCHAs, use DynamicFetcher and pause for manual solving, or integrate third-party CAPTCHA solving services.
Can I use my own Chrome browser?
Yes. Set real_chrome=True in StealthyFetcher to use your installed Chrome instead of Playwright's Chromium. This provides a more legitimate browser fingerprint.
Do I need proxies?
For small-scale scraping, no. For production or large-scale operations, residential proxies significantly improve success rates by avoiding IP-based blocking.
How do I rotate User-Agents?
StealthyFetcher rotates User-Agents automatically. For manual control, pass custom headers:

```python
page = StealthyFetcher.fetch('https://site.com', extra_headers={'User-Agent': 'Your-Custom-UA'})
```
What's the success rate against Cloudflare?
With proper configuration, success rates exceed 90% for most Cloudflare-protected sites. Turnstile challenges are solved automatically.
Can I scrape from multiple geographic locations?
Yes. Use the timezone_id and locale parameters:
```python
page = StealthyFetcher.fetch(
    'https://site.com',
    timezone_id='Europe/London',
    locale='en-GB'
)
```



