Mirror of https://github.com/Dvorinka/facr-scraper.git, synced 2026-06-03 20:12:57 +00:00

# FACR Scraper - Coolify Deployment Guide

## Summary

✅ **Dockerized and ready for Coolify deployment**
✅ **Scrapling fully working in container**
✅ **All fallback methods functional**

## How it Works

The scraper uses a **4-tier fallback system**, trying each tier in order until one succeeds:

1. **Direct HTTP requests** (currently blocked by Cloudflare with HTTP 403)
2. **wget fallback** (also blocked)
3. **✅ Scrapling with Playwright** (bypasses Cloudflare - working!)
4. **Cloudflare Browser Rendering API** (used only if configured)

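The tier ordering can be sketched as a chain that tries each fetcher in turn and returns the first success. This is a minimal Python illustration, not the scraper's actual code: `fetch_with_fallback`, `AllTiersFailed`, and the stub fetchers are hypothetical names, and the stubs only imitate the behaviour described in the list.

```python
from typing import Callable, Sequence, Tuple

# A fetcher takes a URL and returns HTML, or raises on failure (hypothetical interface).
Fetcher = Callable[[str], str]


class AllTiersFailed(Exception):
    """Raised when every tier in the chain has failed."""


def fetch_with_fallback(url: str, tiers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each (name, fetcher) in order; return (tier_name, html) from the first success."""
    errors = []
    for name, fetcher in tiers:
        try:
            return name, fetcher(url)
        except Exception as exc:  # any failure moves on to the next tier
            errors.append(f"{name}: {exc}")
    raise AllTiersFailed("; ".join(errors))


# Stub fetchers standing in for the real tiers (assumptions for illustration only).
def direct_http(url: str) -> str:
    raise RuntimeError("403 Forbidden")


def wget_fallback(url: str) -> str:
    raise RuntimeError("403 Forbidden")


def scrapling_playwright(url: str) -> str:
    return "<html>ok</html>"


TIERS = [
    ("direct-http", direct_http),
    ("wget", wget_fallback),
    ("scrapling", scrapling_playwright),
]

if __name__ == "__main__":
    tier, html = fetch_with_fallback("https://example.com/", TIERS)
    print(f"served by tier: {tier}")
```

Keeping the tiers in a plain ordered list makes it easy to append an optional tier, such as the Cloudflare one, only when it is configured.
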
## Coolify Deployment

### Option 1: Docker Compose (Recommended)

1. Push the code to your Git repository
2. In Coolify, create a new **Docker Compose** application
3. Use the provided `docker-compose.yml`
4. Set environment variables as needed

### Option 2: Dockerfile

1. Push the code to your Git repository
2. In Coolify, create a new **Docker** application
3. Use the provided `Dockerfile`
4. Set the port mapping to `8686`

### Environment Variables

```bash
LOGOAPI_BASE_URL=https://logoapi.sportcreative.eu
# Optional
CLOUDFLARE_ACCOUNT_ID=your_account_id
# Optional
CLOUDFLARE_API_TOKEN=your_api_token
SCRAPLING_PYTHON_BIN=/opt/scrapling/bin/python
SCRAPLING_SCRIPT=/opt/scrapling/scripts/scrapling_fetch.py
# Leave empty for production
DEBUG_SAVE_HTML=
```

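As a rough illustration of how a service could consume these variables, the sketch below loads them into a typed config object. It is not taken from the scraper: `Config` and `load_config` are hypothetical names, treating the listed values as defaults is an assumption, and so is the rule that the Cloudflare tier counts as configured only when both credentials are set.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    logoapi_base_url: str
    cloudflare_account_id: str
    cloudflare_api_token: str
    scrapling_python_bin: str
    scrapling_script: str
    debug_save_html: bool

    @property
    def cloudflare_enabled(self) -> bool:
        # Assumption: the Cloudflare tier is "configured" only if both values are set.
        return bool(self.cloudflare_account_id and self.cloudflare_api_token)


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from environment variables, falling back to the documented values."""
    return Config(
        logoapi_base_url=env.get("LOGOAPI_BASE_URL", "https://logoapi.sportcreative.eu"),
        cloudflare_account_id=env.get("CLOUDFLARE_ACCOUNT_ID", ""),
        cloudflare_api_token=env.get("CLOUDFLARE_API_TOKEN", ""),
        scrapling_python_bin=env.get("SCRAPLING_PYTHON_BIN", "/opt/scrapling/bin/python"),
        scrapling_script=env.get(
            "SCRAPLING_SCRIPT", "/opt/scrapling/scripts/scrapling_fetch.py"
        ),
        # Assumption: any non-empty value switches HTML debugging on.
        debug_save_html=bool(env.get("DEBUG_SAVE_HTML", "").strip()),
    )


if __name__ == "__main__":
    cfg = load_config()
    print("Cloudflare tier enabled:", cfg.cloudflare_enabled)
```

Passing the environment in as a mapping keeps the loader easy to exercise with a plain dictionary instead of the real process environment.
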
### Resource Requirements

- **Minimum**: 1 CPU, 1 GB RAM
- **Recommended**: 2 CPU, 2 GB RAM
- **Storage**: 2 GB+ (for Playwright browsers)

### Health Check

The container includes a built-in health check:

- Endpoint: `http://localhost:8686/`
- Interval: 30s
- Timeout: 10s

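The same probe can be reproduced from outside the container when debugging a deployment. The following is a minimal sketch using only the Python standard library; `probe` and `wait_until_healthy` are hypothetical helpers, and the attempt count of 3 is an assumption, since the guide specifies only the endpoint, interval, and timeout.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and HTTP 4xx/5xx responses.
        return False


def wait_until_healthy(
    url: str,
    attempts: int = 3,
    interval: float = 30.0,
    check: Callable[[str], bool] = probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe up to `attempts` times, sleeping `interval` seconds between failed probes."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False


if __name__ == "__main__":
    healthy = wait_until_healthy("http://localhost:8686/")
    print("healthy" if healthy else "unhealthy")
```
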
## Verification

After deployment, test the root endpoint and a club endpoint:

```bash
curl https://your-domain.coolify.app/
curl https://your-domain.coolify.app/club/football/00000000-0000-0000-0000-000000000000
```

## Performance Notes

- **Cold start**: ~10-15 seconds (Playwright initialization)
- **Subsequent requests**: ~2-5 seconds per page
- **Concurrent scraping**: Supported (each request independent)
- **Rate limiting**: Handled by fallback system

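Because each request is independent, a client can fetch several clubs in parallel. The sketch below shows one way to do that with a thread pool; `club_url`, `http_get`, and `fetch_clubs` are hypothetical helpers, the base URL is the placeholder domain used in this guide, and the 30-second timeout and pool size of 4 are assumptions chosen to leave headroom over the cold start figure.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

BASE_URL = "https://your-domain.coolify.app"  # placeholder, replace with your deployment


def club_url(club_id: str, base_url: str = BASE_URL) -> str:
    """Build the club endpoint URL for a given club ID."""
    return f"{base_url}/club/football/{club_id}"


def http_get(url: str, timeout: float = 30.0) -> str:
    # Assumed timeout: above the ~10-15 s cold start plus a few seconds per page.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_clubs(
    club_ids: Iterable[str],
    fetch: Callable[[str], str] = http_get,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Fetch all clubs concurrently and return a mapping of club ID to response body."""
    ids = list(club_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with ids.
        bodies = list(pool.map(lambda cid: fetch(club_url(cid)), ids))
    return dict(zip(ids, bodies))


if __name__ == "__main__":
    results = fetch_clubs(["00000000-0000-0000-0000-000000000000"])
    for cid, body in results.items():
        print(cid, len(body))
```

A small pool keeps memory pressure on the container modest, which matters given the resource figures listed earlier.
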
## Troubleshooting

If Scrapling fails in production:

1. Check the logs for "Successfully retrieved content via Scrapling"
2. Verify the container has enough memory (>1 GB)
3. Ensure there are no outbound network restrictions
4. Check whether Cloudflare's protection has changed

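The first step can be automated by scanning captured log output for the success message. This is a small sketch, assuming the logs are available as plain text lines (for example, piped in from `docker logs`); `scrapling_succeeded` and `count_successes` are hypothetical helper names.

```python
import sys
from typing import Iterable

# Message quoted in the troubleshooting steps.
SUCCESS_MARKER = "Successfully retrieved content via Scrapling"


def scrapling_succeeded(log_lines: Iterable[str]) -> bool:
    """Return True if any log line contains the Scrapling success message."""
    return any(SUCCESS_MARKER in line for line in log_lines)


def count_successes(log_lines: Iterable[str]) -> int:
    """Count how many log lines contain the Scrapling success message."""
    return sum(1 for line in log_lines if SUCCESS_MARKER in line)


if __name__ == "__main__":
    # Reads log text from standard input and exits non-zero if the marker is absent.
    lines = sys.stdin.readlines()
    found = count_successes(lines)
    print(f"Scrapling successes found: {found}")
    sys.exit(0 if found else 1)
```

If the marker never appears, the remaining steps (memory, network restrictions, Cloudflare changes) are the next things to check.
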
## Files Created

- `Dockerfile` - Multi-stage build with Go + Python/Playwright
- `docker-compose.yml` - Ready for Coolify deployment
- `.dockerignore` - Optimizes the build context

The Dockerized version maintains **100% feature parity** with local development.