Files
Trackeep/docs/FAVICON_FETCHING.md
T
Tomas Dvorak b083dac3f0 feat(frontend): enhance API credentials system and build configuration
Add real API support in demo mode with credential checking, implement build-time version injection from package.json, and refactor update checking with 24-hour caching. Migrate landing page from Vue to Astro with comprehensive UI components including Hero, Features, Benefits, and Tech Stack sections. Update CI/CD workflow with expanded cache paths and security scanner version pinned.
2026-02-10 16:25:57 +01:00

212 lines
5.4 KiB
Markdown

# Enhanced Favicon Fetching System
This document describes the enhanced favicon fetching system implemented to address issues with favicons not being found when they're located in non-standard paths.
## Overview
The enhanced favicon fetching system (`favicon_fetcher.go`) provides comprehensive favicon detection by:
1. **HTML Head Parsing**: Extracts favicon URLs from the `<head>` section of HTML pages
2. **Multiple Pattern Matching**: Supports various favicon declaration formats
3. **Common Location Checking**: Tests standard favicon file paths
4. **Fallback Services**: Uses Google's favicon service as a reliable fallback
## Key Features
### Comprehensive Pattern Detection
The system detects favicons declared in multiple ways:
```html
<!-- Standard favicon -->
<link rel="icon" href="/favicon.ico">
<link rel="shortcut icon" href="/favicon.ico">
<!-- Apple touch icons -->
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<link rel="apple-touch-icon-precomposed" href="/apple-touch-icon-precomposed.png">
<!-- Android icons -->
<link rel="android-chrome-192x192" href="/android-chrome-192x192.png">
<!-- Microsoft tiles -->
<meta name="msapplication-TileImage" content="/mstile-144x144.png">
<!-- Open Graph images (fallback) -->
<meta property="og:image" content="/logo.png">
<!-- Twitter images (fallback) -->
<meta name="twitter:image" content="/logo.png">
```
### Common Location Testing
The system tests these common favicon paths:
- `/favicon.ico`, `/favicon.png`, `/favicon.svg`
- `/apple-touch-icon.png`, `/apple-touch-icon-precomposed.png`
- `/android-chrome-192x192.png`
- `/icon.png`, `/icon.svg`, `/logo.png`, `/logo.svg`
- `/assets/favicon.ico`, `/static/favicon.ico`, `/images/favicon.ico`
- And more...
### Smart URL Resolution
The system properly handles:
- Absolute URLs (`https://cdn.example.com/favicon.ico`)
- Protocol-relative URLs (`//cdn.example.com/favicon.ico`)
- Root-relative URLs (`/favicon.ico`)
- Relative URLs (`../favicon.ico`, `favicon.ico`)
## Usage
### Basic Usage
```go
import "github.com/trackeep/backend/services"
// Get the best available favicon
favicon, err := services.GetFavicon("https://example.com")
if err != nil {
log.Printf("Error fetching favicon: %v", err)
}
fmt.Printf("Favicon: %s\n", favicon)
```
### Multiple Candidates
```go
// Get multiple favicon candidates
favicons := services.GetAllFavicons("https://example.com", 5)
for i, favicon := range favicons {
fmt.Printf("%d. %s\n", i+1, favicon)
}
```
### Advanced Usage
```go
fetcher := services.NewFaviconFetcher()
favicon, err := fetcher.FetchFavicon("https://example.com")
if err != nil {
// Handle error
}
```
## Integration with Metadata Service
The favicon fetcher is integrated into the existing metadata service:
```go
metadata, err := services.FetchWebsiteMetadata("https://example.com")
if err == nil {
fmt.Printf("Favicon: %s\n", metadata.Favicon)
}
```
## Performance Considerations
- **Timeout**: 10 seconds per HTTP request
- **Caching**: Consider implementing caching for frequently accessed favicons
- **Concurrency**: The system is designed to be thread-safe
- **Verification**: Each favicon candidate is verified with a HEAD request
## Testing
### CLI Testing Tool
Use the CLI tool to test favicon fetching:
```bash
cd backend
go run tools/favicon_cli.go https://github.com
```
### Unit Tests
Run the unit tests:
```bash
cd backend
go test ./services -v -run TestFaviconFetcher
```
### Benchmark Tests
Run benchmark tests:
```bash
cd backend
go test ./services -bench=BenchmarkFaviconFetch
```
## Error Handling
The system gracefully handles:
- Network timeouts and connection errors
- Invalid URLs
- HTTP errors (4xx, 5xx responses)
- Malformed HTML
- Missing favicons (falls back to Google's service)
## Fallback Strategy
1. **HTML Extraction**: Try to extract from HTML head
2. **Common Paths**: Test standard favicon locations
3. **Google Service**: Use `https://www.google.com/s2/favicons?domain=example.com&sz=128`
## Configuration
The system uses these default settings:
- HTTP Timeout: 10 seconds
- User Agent: Chrome browser string
- Max Results: 5 (for multiple favicon fetching)
- Google Favicon Size: 128px
## Troubleshooting
### Common Issues
1. **Rate Limiting**: Some sites may block frequent requests
2. **HTTPS Issues**: Certificate validation problems
3. **Redirects**: The system follows redirects automatically
4. **Large Pages**: Only the head section is parsed for efficiency
### Debug Mode
Enable debug logging by modifying the log level in the fetcher:
```go
fetcher := services.NewFaviconFetcher()
// Add debug logging as needed
```
## Future Improvements
Potential enhancements:
1. **Persistent Caching**: Implement Redis or database caching
2. **Async Fetching**: Background favicon updates
3. **Image Processing**: Size optimization and format conversion
4. **Domain Whitelisting**: Prioritize certain domains
5. **Machine Learning**: Predict best favicon based on site structure
## Migration from Old System
The new system is backward compatible. Simply replace calls to the old favicon extraction:
```go
// Old way (still works but less comprehensive)
metadata.Favicon = extractFavicon(content, parsedURL)
// New way (recommended)
favicon, err := GetFavicon(targetURL)
if err == nil {
metadata.Favicon = favicon
}
```
The enhanced system provides significantly better favicon detection rates, especially for modern web applications that use non-standard favicon locations.