Commit Graph

17 Commits

Author SHA1 Message Date
Tomas Dvorak ed61d8ab8e feat(scraper): implement CloakBrowser support and enhance request stealth
Integrate CloakBrowser to improve success rates against Cloudflare
challenges and implement more robust request handling in the Go backend.

- Add CloakBrowser integration to Dockerfile and requirements
- Implement domain-specific request semaphores in Go to prevent rate-limiting
- Add shared HTTP client with cookie jar and header preservation for
  better session management
- Enhance request headers in Go to include modern client hints (Sec-Ch-Ua)
- Add benchmarking scripts to compare fetch methods (urllib vs Scrapling
  vs CloakBrowser)
- Update docker-compose to support CloakBrowser environment variables
- Optimize Docker image by pre-downloading patched Chromium binaries
2026-05-17 17:52:52 +02:00
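
Note on the domain-specific request semaphores mentioned in the commit above: the commit implements them in the Go backend, and that code is not shown here. The idea can be illustrated with a minimal Python sketch, under stated assumptions. `DomainLimiter`, `max_per_domain`, and `run` are hypothetical names chosen for this illustration and are not taken from the repository. Each host gets its own bounded semaphore, so at most a fixed number of requests to that host are in flight at once, while requests to other hosts proceed independently.

```python
import threading
from urllib.parse import urlsplit


class DomainLimiter:
    """Caps the number of in-flight requests per host (hypothetical helper)."""

    def __init__(self, max_per_domain=2):
        self._max = max_per_domain
        self._lock = threading.Lock()   # guards the semaphore table
        self._semaphores = {}           # host -> BoundedSemaphore

    def _semaphore_for(self, url):
        """Return the semaphore for the URL's host, creating it on first use."""
        host = urlsplit(url).hostname or ""
        with self._lock:
            sem = self._semaphores.get(host)
            if sem is None:
                sem = threading.BoundedSemaphore(self._max)
                self._semaphores[host] = sem
            return sem

    def run(self, url, fetch):
        """Call fetch(url) while holding one slot for the URL's host."""
        with self._semaphore_for(url):
            return fetch(url)
```

A caller would wrap each outgoing request, for example `limiter.run(url, some_fetch_function)`, so that a burst of requests to one site is throttled without slowing down requests to other sites.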
Tomas Dvorak aa47f4309f refactor: optimize docker image and implement lightweight fetching
This commit improves the overall efficiency and reliability of the scraper by:

- Optimizing the Dockerfile by reducing layers, using `--no-install-recommends`, and consolidating Playwright installation.
- Adding resource limits (CPU/Memory) to the docker-compose configuration.
- Refactoring `main.go` to remove unused Cloudflare client structures and increasing cache TTL.
- Implementing a `lightweight_fetch` mechanism in `scrapling_fetch.py` using `urllib` to attempt fast requests before falling back to the heavier Scrapling/Playwright engine.
- Adding Cloudflare challenge detection to the lightweight fetcher.
2026-05-11 19:50:59 +02:00
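
Note on the `lightweight_fetch` mechanism described in the commit above: the actual code in `scrapling_fetch.py` is not shown here, so the following is only a sketch of the general pattern, assuming a plain `urllib` request is tried first and a heavier fetcher is used when that fails or returns a Cloudflare challenge page. The marker strings, the `User-Agent` value, and the helper names `looks_like_cloudflare_challenge` and `fetch_with_fallback` are assumptions made for this illustration. `heavy_fetch` stands in for the Scrapling/Playwright path.

```python
import urllib.error
import urllib.request

# Assumed markers for a challenge page; the commit's real rules are not shown.
CHALLENGE_MARKERS = ("<title>just a moment...</title>", "_cf_chl_opt")


def looks_like_cloudflare_challenge(body):
    """Heuristic check for challenge markup in the response body."""
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def lightweight_fetch(url, timeout=10.0):
    """Return the page body, or None when the cheap path should be abandoned."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Challenge pages often arrive with an error status; read the body anyway.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers URLError, connection failures, and timeouts.
        return None
    if status != 200 or looks_like_cloudflare_challenge(body):
        return None
    return body


def fetch_with_fallback(url, heavy_fetch, light_fetch=lightweight_fetch):
    """Try the cheap fetch first; use the heavy fetcher only when it gives up."""
    body = light_fetch(url)
    if body is not None:
        return body
    return heavy_fetch(url)
```

Returning `None` from the lightweight path keeps the fallback decision in one place: the caller does not need to know whether the cheap request failed because of a network error, an error status, or a detected challenge.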
Tomas Dvorak a8a4e1acaf update 2026-04-02 10:28:52 +02:00
Tomas Dvorak a9a89bed7c update 2026-03-20 16:17:39 +01:00
Tomas Dvorak 455bf61302 upload 2026-03-12 19:11:08 +01:00
Tomas Dvorak 7773947450 push 2025-12-01 12:43:49 +01:00
Tomas Dvorak eeb2dd79c3 Merge branch 'main' of https://github.com/Dvorinka/facr-scraper 2025-12-01 10:32:22 +01:00
Tomas Dvorak 4ebbf1155c update 2025-12-01 10:29:42 +01:00
Tomáš Dvořák bdbf2f0a31 Update main.go 2025-08-29 20:22:48 +02:00
Tomáš Dvořák 8ea7df2410 facr link 2025-08-26 08:36:11 +02:00
Tomáš Dvořák 0c76824775 fix futsal 2025-08-26 08:18:50 +02:00
Tomáš Dvořák 0bbf432b9a dd 2025-08-26 08:07:56 +02:00
Tomas Dvorak 770c970e49 efsf 2025-08-25 16:22:29 +02:00
Tomas Dvorak 2a8852c0d3 fe 2025-08-25 16:06:51 +02:00
Tomas Dvorak 46d7cd41ab g 2025-08-25 10:37:12 +02:00
Dvorinka 3c929787d8 api docs 2025-08-14 13:16:56 +02:00
Dvorinka d88f15197a first commit 2025-08-14 12:35:04 +02:00