feat(scraper): implement CloakBrowser support and enhance request stealth

Integrate CloakBrowser to improve success rates against Cloudflare
challenges and implement more robust request handling in the Go backend.

- Add CloakBrowser integration to Dockerfile and requirements
- Implement domain-specific request semaphores in Go to prevent rate-limiting
- Add shared HTTP client with cookie jar and header preservation for
  better session management
- Enhance request headers in Go to include modern client hints (Sec-Ch-Ua)
- Add benchmarking scripts to compare fetch methods (urllib vs Scrapling
  vs CloakBrowser)
- Update docker-compose to support CloakBrowser environment variables
- Optimize Docker image by pre-downloading patched Chromium binaries
Tomas Dvorak
2026-05-17 17:52:52 +02:00
parent aa47f4309f
commit ed61d8ab8e
12 changed files with 608 additions and 23 deletions
+3 -1
@@ -82,9 +82,11 @@ def scrapling_fetch(url: str, referer: str = "", timeout_ms: int = 30000, wait_m
     if referer:
         extra_headers["Referer"] = referer
+    # Increase challenge-solving timeout; network_idle can interfere with
+    # ongoing Cloudflare polling so we disable it.
     fetch_kwargs = {
         "headless": True,
-        "network_idle": True,
+        "network_idle": False,
         "google_search": False,
         "solve_cloudflare": True,
         "timeout": timeout_ms,