Files
Trackeep/docs/FAVICON_FETCHING.md
T
Tomas Dvorak b083dac3f0 feat(frontend): enhance API credentials system and build configuration
Add real API support in demo mode with credential checking, implement build-time version injection from package.json, and refactor update checking with 24-hour caching. Migrate landing page from Vue to Astro with comprehensive UI components including Hero, Features, Benefits, and Tech Stack sections. Update CI/CD workflow with expanded cache paths and security scanner version pinned.
2026-02-10 16:25:57 +01:00

5.4 KiB

Enhanced Favicon Fetching System

This document describes the enhanced favicon fetching system implemented to address issues with favicons not being found when they're located in non-standard paths.

Overview

The enhanced favicon fetching system (favicon_fetcher.go) provides comprehensive favicon detection by:

  1. HTML Head Parsing: Extracts favicon URLs from the <head> section of HTML pages
  2. Multiple Pattern Matching: Supports various favicon declaration formats
  3. Common Location Checking: Tests standard favicon file paths
  4. Fallback Services: Uses Google's favicon service as a reliable fallback

Key Features

Comprehensive Pattern Detection

The system detects favicons declared in multiple ways:

<!-- Standard favicon -->
<link rel="icon" href="/favicon.ico">
<link rel="shortcut icon" href="/favicon.ico">

<!-- Apple touch icons -->
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<link rel="apple-touch-icon-precomposed" href="/apple-touch-icon-precomposed.png">

<!-- Android icons -->
<link rel="android-chrome-192x192" href="/android-chrome-192x192.png">

<!-- Microsoft tiles -->
<meta name="msapplication-TileImage" content="/mstile-144x144.png">

<!-- Open Graph images (fallback) -->
<meta property="og:image" content="/logo.png">

<!-- Twitter images (fallback) -->
<meta name="twitter:image" content="/logo.png">

Common Location Testing

The system tests these common favicon paths:

  • /favicon.ico, /favicon.png, /favicon.svg
  • /apple-touch-icon.png, /apple-touch-icon-precomposed.png
  • /android-chrome-192x192.png
  • /icon.png, /icon.svg, /logo.png, /logo.svg
  • /assets/favicon.ico, /static/favicon.ico, /images/favicon.ico
  • And more...

Smart URL Resolution

The system properly handles:

  • Absolute URLs (https://cdn.example.com/favicon.ico)
  • Protocol-relative URLs (//cdn.example.com/favicon.ico)
  • Root-relative URLs (/favicon.ico)
  • Relative URLs (../favicon.ico, favicon.ico)

Usage

Basic Usage

import "github.com/trackeep/backend/services"

// Get the best available favicon
favicon, err := services.GetFavicon("https://example.com")
if err != nil {
    log.Printf("Error fetching favicon: %v", err)
}
fmt.Printf("Favicon: %s\n", favicon)

Multiple Candidates

// Get multiple favicon candidates
favicons := services.GetAllFavicons("https://example.com", 5)
for i, favicon := range favicons {
    fmt.Printf("%d. %s\n", i+1, favicon)
}

Advanced Usage

fetcher := services.NewFaviconFetcher()
favicon, err := fetcher.FetchFavicon("https://example.com")
if err != nil {
    // Handle error
}

Integration with Metadata Service

The favicon fetcher is integrated into the existing metadata service:

metadata, err := services.FetchWebsiteMetadata("https://example.com")
if err == nil {
    fmt.Printf("Favicon: %s\n", metadata.Favicon)
}

Performance Considerations

  • Timeout: 10 seconds per HTTP request
  • Caching: Consider implementing caching for frequently accessed favicons
  • Concurrency: The system is designed to be thread-safe
  • Verification: Each favicon candidate is verified with a HEAD request

Testing

CLI Testing Tool

Use the CLI tool to test favicon fetching:

cd backend
go run tools/favicon_cli.go https://github.com

Unit Tests

Run the unit tests:

cd backend
go test ./services -v -run TestFaviconFetcher

Benchmark Tests

Run benchmark tests:

cd backend
go test ./services -bench=BenchmarkFaviconFetch

Error Handling

The system gracefully handles:

  • Network timeouts and connection errors
  • Invalid URLs
  • HTTP errors (4xx, 5xx responses)
  • Malformed HTML
  • Missing favicons (falls back to Google's service)

Fallback Strategy

  1. HTML Extraction: Try to extract from HTML head
  2. Common Paths: Test standard favicon locations
  3. Google Service: Use https://www.google.com/s2/favicons?domain=example.com&sz=128

Configuration

The system uses these default settings:

  • HTTP Timeout: 10 seconds
  • User Agent: Chrome browser string
  • Max Results: 5 (for multiple favicon fetching)
  • Google Favicon Size: 128px

Troubleshooting

Common Issues

  1. Rate Limiting: Some sites may block frequent requests
  2. HTTPS Issues: Certificate validation problems
  3. Redirects: The system follows redirects automatically
  4. Large Pages: Only the head section is parsed for efficiency

Debug Mode

Enable debug logging by modifying the log level in the fetcher:

fetcher := services.NewFaviconFetcher()
// Add debug logging as needed

Future Improvements

Potential enhancements:

  1. Persistent Caching: Implement Redis or database caching
  2. Async Fetching: Background favicon updates
  3. Image Processing: Size optimization and format conversion
  4. Domain Whitelisting: Prioritize certain domains
  5. Machine Learning: Predict best favicon based on site structure

Migration from Old System

The new system is backward compatible. Simply replace calls to the old favicon extraction:

// Old way (still works but less comprehensive)
metadata.Favicon = extractFavicon(content, parsedURL)

// New way (recommended)
favicon, err := GetFavicon(targetURL)
if err == nil {
    metadata.Favicon = favicon
}

The enhanced system provides significantly better favicon detection rates, especially for modern web applications that use non-standard favicon locations.