TDvorak/Devour

mirror of https://github.com/Dvorinka/Devour.git synced 2026-06-03 20:13:03 +00:00

Tomas Dvorak 55885a0e8f first commit

2026-02-22 10:42:17 +01:00

Devour

Context Ingestion & Management for AI

Features • Installation • Quick Start • Architecture • CLI Reference • Configuration

What is Devour?

Devour is a context ingestion and management system. It feeds structured, relevant context to AI models so that the code they generate is more accurate and more likely to work as written.

It scrapes, indexes, and serves documentation from multiple sources:

GitHub repositories
OpenAPI/Swagger specifications
Markdown/HTML documentation sites
JSON/YAML schemas
Local project files

Two Modes of Operation

Mode	Description	Use Case
Local	Runs as an OpenCode skill on your machine	Single developer, offline work
Remote	MCP server hosted on your infrastructure	Teams, multi-project support

Features

🕷️ Multi-Source Scraping

GitHub - Clone and parse repos, extract README, docs, code structure
OpenAPI - Parse Swagger specs into structured endpoints
Web Docs - Crawl documentation sites with Colly
Local Files - Index your project's docs folder

🧠 Intelligent Indexing

Vector embeddings via OpenAI (text-embedding-3-small/large)
Semantic similarity search for context retrieval
Metadata tracking (source, timestamp, file type)

🔄 Automatic Updates

Configurable scheduler (default: every 3 days)
Content hash comparison for change detection
Automatic re-indexing on updates
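
The hash-based change detection listed above can be sketched as follows. This is a minimal illustration, not Devour's actual implementation: the choice of SHA-256, the `contentHash` and `hasChanged` helpers, and the in-memory map standing in for the metadata store are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 digest of a document body.
func contentHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// hasChanged reports whether body differs from the version whose hash is
// stored under source, and records the new hash when it does.
// The map is a stand-in (assumption) for a real metadata store.
func hasChanged(stored map[string]string, source string, body []byte) bool {
	h := contentHash(body)
	if stored[source] == h {
		return false
	}
	stored[source] = h
	return true
}

func main() {
	stored := map[string]string{}
	src := "https://docs.example.com/auth"

	fmt.Println(hasChanged(stored, src, []byte("v1"))) // first fetch: index it
	fmt.Println(hasChanged(stored, src, []byte("v1"))) // identical bytes: skip
	fmt.Println(hasChanged(stored, src, []byte("v2"))) // edited: re-index
}
```

Comparing digests rather than timestamps means a source is re-indexed only when its bytes actually differ, at the cost of fetching the content on every check.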

🔌 MCP Integration

Exposes context via Model Context Protocol
Local mode: stdio transport (OpenCode skill)
Remote mode: HTTP/SSE transport (MCP server)

💾 Flexible Storage

devour_data/
├── docs/           # Raw scraped documents
├── index/          # Vector embeddings
└── metadata/       # Source tracking & timestamps

📊 Quality Scorecard

Devour includes built-in code quality analysis that generates a scorecard for your project.

# Run quality analysis
devour quality scan

# Generate a visual scorecard badge
devour quality scan --badge-path scorecard.png

Features:

Multi-language support (Go, Python, JavaScript, etc.)
Severity-based scoring (T1-T4 tiers)
Technical debt tracking
Automated code review integration

Installation

Prerequisites

Go 1.22+
OpenAI API key (for embeddings)

From Source

# Clone the repository
git clone https://github.com/yourorg/devour.git
cd devour

# Install dependencies
go mod download

# Build
go build -o devour ./cmd/devour

# Install globally
go install ./cmd/devour

Quick Install

go install github.com/yourorg/devour/cmd/devour@latest

Quick Start

1. Initialize a Project

# Create devour config in current directory
devour init

# Or specify a path
devour init ./my-project

2. Get Documentation (NEW!)

# Quick access to popular language/framework docs
devour get go http              # Go HTTP package
devour get python asyncio      # Python asyncio module  
devour get react hooks         # React Hooks documentation
devour get docker compose      # Docker Compose docs
devour get rust tokio          # Rust Tokio crate
devour get spring boot         # Spring Boot framework

# Enhanced markdown output
devour get go http --format markdown

Supported Languages and Frameworks:

go, golang - Go packages (pkg.go.dev)
rust - Rust crates (docs.rs)
python, py - Python modules (docs.python.org)
java - Java packages (docs.oracle.com)
spring - Spring Boot (docs.spring.io)
typescript, ts - TypeScript (typescriptlang.org)
react - React (react.dev)
vue - Vue.js (vuejs.org)
nuxt - Nuxt (nuxt.com)
docker - Docker (docs.docker.com)
cloudflare, cf - Cloudflare (developers.cloudflare.com)
astro - Astro (docs.astro.build)
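
As a rough sketch of how the aliases above could map to documentation hosts, the lookup below uses a plain Go map. The `docHosts` table and `resolveHost` helper are hypothetical names. Only the alias-to-host pairs come from the list above, and how Devour turns a keyword into a full URL is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// docHosts maps each alias from the supported list to its documentation host.
var docHosts = map[string]string{
	"go": "pkg.go.dev", "golang": "pkg.go.dev",
	"rust":   "docs.rs",
	"python": "docs.python.org", "py": "docs.python.org",
	"java":       "docs.oracle.com",
	"spring":     "docs.spring.io",
	"typescript": "typescriptlang.org", "ts": "typescriptlang.org",
	"react":      "react.dev",
	"vue":        "vuejs.org",
	"nuxt":       "nuxt.com",
	"docker":     "docs.docker.com",
	"cloudflare": "developers.cloudflare.com", "cf": "developers.cloudflare.com",
	"astro": "docs.astro.build",
}

// resolveHost (hypothetical helper) normalizes the alias and looks up its host.
func resolveHost(alias string) (string, bool) {
	host, ok := docHosts[strings.ToLower(strings.TrimSpace(alias))]
	return host, ok
}

func main() {
	for _, alias := range []string{"golang", "PY", "cobol"} {
		if host, ok := resolveHost(alias); ok {
			fmt.Printf("%s -> %s\n", alias, host)
		} else {
			fmt.Printf("%s -> unsupported\n", alias)
		}
	}
}
```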

3. Scrape Documentation

# Scrape from a URL
devour scrape https://docs.example.com

# Scrape a GitHub repo
devour scrape https://github.com/org/repo

# Scrape local docs
devour scrape ./docs

# Multiple sources
devour scrape --sources sources.yaml

4. Query Context

# Search indexed docs
devour query "How do I authenticate with the API?"

# With options
devour query "authentication" --limit 5 --format json

5. Start the Server

# Local MCP server (stdio transport)
devour serve

# Remote MCP server (HTTP)
devour serve --remote --port 8080

6. Check Status

devour status

Enhanced Features

🎯 Simplified Language Interface

The new devour get command gives you quick access to documentation for popular languages and frameworks, so you don't need to remember full URLs:

# Instead of: devour scrape https://pkg.go.dev/net/http
devour get go http

# Instead of: devour scrape https://react.dev/reference/react/hooks  
devour get react hooks

# Instead of: devour scrape https://docs.docker.com/compose
devour get docker compose

📝 Rich Markdown Output

Pass --format markdown to get structured Markdown output:

devour get go http --format markdown

Features:

📋 Document metadata tables
📑 Auto-generated table of contents
🎨 Enhanced typography with emoji indicators
🔗 Automatic link conversion
📚 Structured content sections
🏷️ Source attribution and timestamps

🧠 Smart Content Enhancement

The markdown formatter automatically:

Converts plain URLs to clickable links
Adds visual indicators for examples, notes, and warnings
Fixes code block formatting
Generates proper heading structure
Creates document metadata tables

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Devour System                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────┐    ┌──────────┐    ┌───────────┐    ┌──────────┐  │
│  │ Scraper │───▶│ Indexer  │───▶│  Storage  │───▶│  Server  │  │
│  └─────────┘    └──────────┘    └───────────┘    └──────────┘  │
│       │              │               │                │        │
│       ▼              ▼               ▼                ▼        │
│  ┌─────────┐    ┌──────────┐    ┌───────────┐    ┌──────────┐  │
│  │ GitHub  │    │ OpenAI   │    │ Vector DB │    │   MCP    │  │
│  │ Web     │    │ Embeds   │    │ (chromem) │    │ Protocol │  │
│  │ Local   │    │          │    │           │    │          │  │
│  └─────────┘    └──────────┘    └───────────┘    └──────────┘  │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                     Scheduler                            │   │
│  │         (Auto-update every 3 days, configurable)         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Data Flow

User Query → Devour Server → Embedding Generation → Vector Search
                                                           │
                                                           ▼
    AI Response ← Context Chunks ← Top-K Relevant Docs ←───┘
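
The vector search and top-K steps in the flow above can be illustrated with a brute-force cosine similarity search. This is a sketch under assumptions: the `chunk` and `hit` types and the `cosine` and `topK` functions are hypothetical, the vectors are toy values rather than real embeddings, and a vector database such as chromem would normally do this work.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type chunk struct {
	Text   string
	Vector []float64
}

type hit struct {
	Text  string
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores every chunk against the query, drops those below threshold,
// and returns at most k hits ordered by descending score.
func topK(query []float64, chunks []chunk, k int, threshold float64) []hit {
	var hits []hit
	for _, c := range chunks {
		if s := cosine(query, c.Vector); s >= threshold {
			hits = append(hits, hit{Text: c.Text, Score: s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	chunks := []chunk{
		{"API keys", []float64{1, 0, 0}},
		{"OAuth flow", []float64{0.9, 0.1, 0}},
		{"Rate limits", []float64{0, 1, 0}},
	}
	// k and threshold here match the query defaults of 5 and 0.7.
	for _, h := range topK([]float64{1, 0, 0}, chunks, 5, 0.7) {
		fmt.Printf("%.2f %s\n", h.Score, h.Text)
	}
}
```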

CLI Reference

Commands

Command	Description
`devour init [path]`	Initialize Devour for a project
`devour get <language> <keyword>`	Fetch docs for popular languages and frameworks (new)
`devour scrape <source>`	Scrape docs from URL, repo, or path
`devour serve`	Start MCP server (local or remote)
`devour query <text>`	Search indexed documentation
`devour status`	Show index stats and last update
`devour sync`	Fetch updates from all sources
`devour push <path>`	Push docs to remote MCP server

Flags

# Global flags
--config, -c     Config file path (default: ./devour.yaml)
--verbose, -v    Enable verbose logging
--quiet, -q      Suppress non-error output

# scrape flags
--sources, -s    YAML file with source definitions
--format, -f     Output format: json, markdown (default: json)
--concurrency    Parallel scraping workers (default: 10)

# serve flags
--remote         Run as remote HTTP server
--port, -p       HTTP port (default: 8080)
--host           HTTP host (default: localhost)

# query flags
--limit, -l      Max results (default: 5)
--format, -f     Output: json, text, markdown
--threshold      Similarity threshold (default: 0.7)
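
To show what a worker limit such as `--concurrency` typically means, here is a minimal bounded-concurrency sketch using a buffered channel as a semaphore. The `fetchAll` helper and the stub fetch function are hypothetical, and Devour's scraper may be organized quite differently.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll calls fetch for every URL with at most `workers` calls in flight
// and returns the results in input order.
func fetchAll(urls []string, workers int, fetch func(string) string) []string {
	if workers < 1 {
		workers = 1 // a zero-capacity semaphore would block forever
	}
	results := make([]string, len(urls))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` fetches are running
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = fetch(u) // each goroutine writes its own slot
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	urls := []string{"https://docs.example.com/a", "https://docs.example.com/b"}
	// A stub stands in for the real HTTP fetch.
	pages := fetchAll(urls, 10, func(u string) string { return "fetched " + u })
	for _, p := range pages {
		fmt.Println(p)
	}
}
```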

Configuration

devour.yaml

# Devour Configuration

# Storage paths
storage:
  docs_dir: ./devour_data/docs
  index_dir: ./devour_data/index
  metadata_dir: ./devour_data/metadata

# Embedding settings
embeddings:
  provider: openai           # openai, local
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY} # Env var reference

# Vector database
vector_db:
  type: chromem              # chromem, weaviate, faiss
  persist: true

# Scraping settings
scraper:
  user_agent: "Devour/1.0"
  timeout: 30s
  retry_count: 3
  concurrency: 10
  rate_limit: 500ms

# Scheduler
scheduler:
  enabled: true
  interval: 72h              # Every 3 days
  check_method: hash         # hash, timestamp

# Server settings
server:
  mode: local                # local, remote
  port: 8080
  host: localhost

# Sources (for sync)
sources:
  - name: project-docs
    type: url
    url: https://docs.example.com
    include: ["**/*.md", "**/*.html"]
    exclude: ["**/api/**"]
    
  - name: api-spec
    type: openapi
    url: https://api.example.com/openapi.json
    
  - name: github-repo
    type: github
    repo: org/repo
    branch: main
    paths: ["docs/", "README.md"]
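
Two value formats in this file map directly onto the Go standard library: `${OPENAI_API_KEY}` matches the syntax `os.ExpandEnv` expands, and `30s`, `500ms`, and `72h` are valid inputs to `time.ParseDuration`. The sketch below assumes, without confirmation from the source, that Devour interprets them this way. `resolveSecret` and `parseDurations` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// resolveSecret expands ${VAR} references from the process environment.
// Unset variables expand to the empty string.
func resolveSecret(raw string) string {
	return os.ExpandEnv(raw)
}

// parseDurations converts duration strings such as "72h" into time.Duration values.
func parseDurations(raw map[string]string) (map[string]time.Duration, error) {
	out := make(map[string]time.Duration, len(raw))
	for key, val := range raw {
		d, err := time.ParseDuration(val)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", key, err)
		}
		out[key] = d
	}
	return out, nil
}

func main() {
	os.Setenv("OPENAI_API_KEY", "sk-example") // placeholder value for the demo
	fmt.Println(resolveSecret("${OPENAI_API_KEY}"))

	durations, err := parseDurations(map[string]string{
		"scraper.timeout":    "30s",
		"scraper.rate_limit": "500ms",
		"scheduler.interval": "72h",
	})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(durations["scheduler.interval"])
}
```

Note that `time.ParseDuration` has no day unit, which is consistent with the three-day interval being written as `72h` rather than `3d`.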

API Reference

MCP Tools (when running as server)

`devour_query`

Search indexed documentation for relevant context.

{
  "query": "How do I authenticate?",
  "limit": 5,
  "threshold": 0.7
}

`devour_add`

Add documents to the index.

{
  "documents": [
    {
      "content": "Document text...",
      "metadata": {
        "source": "https://...",
        "type": "markdown"
      }
    }
  ]
}
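
For Go callers, the `devour_add` payload above could be modeled with struct tags and decoded with `encoding/json`. The type names, the `decodeAdd` helper, and the rule that an empty document list is rejected are assumptions made for illustration. Only the JSON field names come from the example above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Metadata mirrors the "metadata" object in the example payload.
type Metadata struct {
	Source string `json:"source"`
	Type   string `json:"type"`
}

// Document mirrors one entry of the "documents" array.
type Document struct {
	Content  string   `json:"content"`
	Metadata Metadata `json:"metadata"`
}

// AddRequest mirrors the top-level devour_add arguments.
type AddRequest struct {
	Documents []Document `json:"documents"`
}

// decodeAdd parses a payload and applies a hypothetical sanity check.
func decodeAdd(raw []byte) (AddRequest, error) {
	var req AddRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return req, err
	}
	if len(req.Documents) == 0 {
		return req, errors.New("documents must not be empty")
	}
	return req, nil
}

func main() {
	raw := []byte(`{"documents":[{"content":"Document text...",
		"metadata":{"source":"https://docs.example.com","type":"markdown"}}]}`)
	req, err := decodeAdd(raw)
	if err != nil {
		fmt.Println("invalid payload:", err)
		return
	}
	fmt.Println(len(req.Documents), req.Documents[0].Metadata.Type)
}
```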

`devour_status`

Get indexing status and statistics.

REST API (remote mode)

GET  /health              # Health check
GET  /status              # Index statistics
POST /query               # Search documents
POST /documents           # Add documents
GET  /documents           # List documents
DELETE /documents/:id     # Delete document
POST /sync                # Trigger sync
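
A call to `POST /query` might look like the sketch below. The request body is assumed to mirror the `devour_query` tool arguments (`query`, `limit`, `threshold`). The source does not document the REST request or response schema, so the sketch only checks the status code. `queryRequest`, `postQuery`, and `newStubServer` are hypothetical, and the stub server exists only so the example runs without a Devour instance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// queryRequest is an assumed body shape, copied from the devour_query arguments.
type queryRequest struct {
	Query     string  `json:"query"`
	Limit     int     `json:"limit"`
	Threshold float64 `json:"threshold"`
}

// postQuery sends req to baseURL+"/query" and returns the HTTP status code.
func postQuery(baseURL string, req queryRequest) (int, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(baseURL+"/query", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// newStubServer returns a local test server that accepts only
// well-formed JSON sent with POST to /query.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var q queryRequest
		if r.Method != http.MethodPost || r.URL.Path != "/query" ||
			json.NewDecoder(r.Body).Decode(&q) != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	status, err := postQuery(srv.URL, queryRequest{
		Query: "How do I authenticate?", Limit: 5, Threshold: 0.7,
	})
	fmt.Println(status, err)
}
```

Against a real remote server, `srv.URL` would be replaced with the server's base URL, for example `http://localhost:8080` when using the default host and port.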

Integration Examples

With OpenCode (Local Mode)

Add to your OpenCode skills:

# ~/.opencode/skills.yaml
skills:
  - name: devour
    path: /path/to/devour
    commands:
      - devour serve

Then in OpenCode:

/devour query "authentication flow"

With AI Applications

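package main

import (
    "context"
    "fmt"
    "log"

    "github.com/yourorg/devour/pkg/client"
)

func main() {
    ctx := context.Background()
    c := client.New("http://localhost:8080") // named c to avoid shadowing the client package
    results, err := c.Query(ctx, "How do I use the API?", 5)
    if err != nil {
        log.Fatal(err)
    }

    for _, r := range results {
        // Truncate safely: Content may be shorter than 100 bytes.
        preview := r.Content
        if len(preview) > 100 {
            preview = preview[:100]
        }
        fmt.Printf("Score: %.2f - %s\n", r.Score, preview)
    }
}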

Development

Project Structure

devour/
├── cmd/devour/           # CLI entrypoint
│   └── main.go
├── internal/
│   ├── scraper/          # Scraping logic
│   ├── indexer/          # Embedding generation
│   ├── server/           # MCP server
│   ├── scheduler/        # Background updates
│   └── ai/               # AI integrations
├── pkg/
│   ├── client/           # Go client library
│   └── types/            # Shared types
├── devour_data/          # Default data directory
├── go.mod
├── Makefile
└── README.md

Building

# Development build
go build -o devour ./cmd/devour

# Production build
CGO_ENABLED=0 go build -ldflags="-s -w" -o devour ./cmd/devour

# Run tests
go test ./...

# Run with coverage
go test -cover ./...

Makefile Targets

make build        # Build binary
make test         # Run tests
make lint         # Run linter
make docker       # Build Docker image
make install      # Install locally

Roadmap

Local LLM support (Ollama, LocalAI)
Multi-tenant support for remote mode
Web UI for document management
Git-based versioning for docs
Plugin system for custom scrapers
Reranking with cross-encoders

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

License

MIT License - see LICENSE for details.

Built with ❤️ for better AI context
