Files
Devour/README.md
Tomas Dvorak 55885a0e8f first commit
2026-02-22 10:42:17 +01:00

565 lines
14 KiB
Markdown

<p align="center">
<img src="devour_logo.svg" alt="Devour Logo" width="300">
</p>
<h1 align="center">Devour</h1>
<p align="center">
<strong>Context Ingestion & Management for AI</strong>
</p>
<p align="center">
<a href="#features">Features</a> •
<a href="#installation">Installation</a> •
<a href="#quick-start">Quick Start</a> •
<a href="#architecture">Architecture</a> •
<a href="#cli-reference">CLI Reference</a> •
<a href="#configuration">Configuration</a>
</p>
---
## What is Devour?
Devour is a **context ingestion and management system** designed to feed structured, relevant context to AI models for generating accurate, fully working code.
It scrapes, indexes, and serves documentation from multiple sources:
- GitHub repositories
- OpenAPI/Swagger specifications
- Markdown/HTML documentation sites
- JSON/YAML schemas
- Local project files
### Two Modes of Operation
| Mode | Description | Use Case |
|------|-------------|----------|
| **Local** | Runs as an OpenCode skill on your machine | Single developer, offline work |
| **Remote** | MCP server hosted on your infrastructure | Teams, multi-project support |
---
## Features
### 🕷️ Multi-Source Scraping
- **GitHub** - Clone and parse repos, extract README, docs, code structure
- **OpenAPI** - Parse Swagger specs into structured endpoints
- **Web Docs** - Crawl documentation sites with Colly
- **Local Files** - Index your project's docs folder
### 🧠 Intelligent Indexing
- Vector embeddings via OpenAI (text-embedding-3-small/large)
- Semantic similarity search for context retrieval
- Metadata tracking (source, timestamp, file type)
### 🔄 Automatic Updates
- Configurable scheduler (default: every 3 days)
- Content hash comparison for change detection
- Automatic re-indexing on updates
### 🔌 MCP Integration
- Exposes context via Model Context Protocol
- **Local mode**: stdio transport (OpenCode skill)
- **Remote mode**: HTTP/SSE transport (MCP server)
### 💾 Flexible Storage
```
devour_data/
├── docs/ # Raw scraped documents
├── index/ # Vector embeddings
└── metadata/ # Source tracking & timestamps
```
### 📊 Quality Scorecard
![Quality Scorecard](scorecard.png)
Devour includes a built-in code quality analysis system that generates a comprehensive scorecard for your project.
```bash
# Run quality analysis
devour quality scan
# Generate a visual scorecard badge
devour quality scan --badge-path scorecard.png
```
**Features:**
- Multi-language support (Go, Python, JavaScript, etc.)
- Severity-based scoring (T1-T4 tiers)
- Technical debt tracking
- Automated code review integration
---
## Installation
### Prerequisites
- Go 1.22+
- OpenAI API key (for embeddings)
### From Source
```bash
# Clone the repository
git clone https://github.com/yourorg/devour.git
cd devour
# Install dependencies
go mod download
# Build
go build -o devour ./cmd/devour
# Install globally
go install ./cmd/devour
```
### Quick Install
```bash
go install github.com/yourorg/devour/cmd/devour@latest
```
---
## Quick Start
### 1. Initialize a Project
```bash
# Create devour config in current directory
devour init
# Or specify a path
devour init ./my-project
```
### 2. Get Documentation (NEW!)
```bash
# Quick access to popular language/framework docs
devour get go http # Go HTTP package
devour get python asyncio # Python asyncio module
devour get react hooks # React Hooks documentation
devour get docker compose # Docker Compose docs
devour get rust tokio # Rust Tokio crate
devour get spring boot # Spring Boot framework
# Enhanced markdown output
devour get go http --format markdown
```
**Supported Languages:**
- `go`, `golang` - Go packages (pkg.go.dev)
- `rust` - Rust crates (docs.rs)
- `python`, `py` - Python modules (docs.python.org)
- `java` - Java packages (docs.oracle.com)
- `spring` - Spring Boot (docs.spring.io)
- `typescript`, `ts` - TypeScript (typescriptlang.org)
- `react` - React (react.dev)
- `vue` - Vue.js (vuejs.org)
- `nuxt` - Nuxt (nuxt.com)
- `docker` - Docker (docs.docker.com)
- `cloudflare`, `cf` - Cloudflare (developers.cloudflare.com)
- `astro` - Astro (docs.astro.build)
### 3. Scrape Documentation
```bash
# Scrape from a URL
devour scrape https://docs.example.com
# Scrape a GitHub repo
devour scrape https://github.com/org/repo
# Scrape local docs
devour scrape ./docs
# Multiple sources
devour scrape --sources sources.yaml
```
### 4. Query Context
```bash
# Search indexed docs
devour query "How do I authenticate with the API?"
# With options
devour query "authentication" --limit 5 --format json
```
### 5. Start the Server
```bash
# Local MCP server (stdio transport)
devour serve
# Remote MCP server (HTTP)
devour serve --remote --port 8080
```
### 6. Check Status
```bash
devour status
```
---
## Enhanced Features
### 🎯 Simplified Language Interface
The new `devour get` command provides instant access to documentation for popular languages and frameworks without needing to remember full URLs:
```bash
# Instead of: devour scrape https://pkg.go.dev/net/http
devour get go http
# Instead of: devour scrape https://react.dev/reference/react/hooks
devour get react hooks
# Instead of: devour scrape https://docs.docker.com/compose
devour get docker compose
```
### 📝 Rich Markdown Output
Enable enhanced markdown formatting for beautiful, structured documentation:
```bash
devour get go http --format markdown
```
**Features:**
- 📋 Document metadata tables
- 📑 Auto-generated table of contents
- 🎨 Enhanced typography with emoji indicators
- 🔗 Automatic link conversion
- 📚 Structured content sections
- 🏷️ Source attribution and timestamps
### 🧠 Smart Content Enhancement
The markdown formatter automatically:
- Converts plain URLs to clickable links
- Adds visual indicators for examples, notes, and warnings
- Fixes code block formatting
- Generates proper heading structure
- Creates document metadata tables
---
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Devour System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ Scraper │───▶│ Indexer │───▶│ Storage │───▶│ Server │ │
│ └─────────┘ └──────────┘ └───────────┘ └──────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ GitHub │ │ OpenAI │ │ Vector DB │ │ MCP │ │
│ │ Web │ │ Embeds │ │ (chromem) │ │ Protocol │ │
│ │ Local │ │ │ │ │ │ │ │
│ └─────────┘ └──────────┘ └───────────┘ └──────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Scheduler │ │
│ │ (Auto-update every 3 days, configurable) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
### Data Flow
```
User Query → Devour Server → Embedding Generation → Vector Search
AI Response ← Context Chunks ← Top-K Relevant Docs ←───┘
```
---
## CLI Reference
### Commands
| Command | Description |
|---------|-------------|
| `devour init [path]` | Initialize Devour for a project |
| `devour get <language> <keyword>` | **NEW** Quick docs fetch for popular languages |
| `devour scrape <source>` | Scrape docs from URL, repo, or path |
| `devour serve` | Start MCP server (local or remote) |
| `devour query <text>` | Search indexed documentation |
| `devour status` | Show index stats and last update |
| `devour sync` | Fetch updates from all sources |
| `devour push <path>` | Push docs to remote MCP server |
### Flags
```bash
# Global flags
--config, -c Config file path (default: ./devour.yaml)
--verbose, -v Enable verbose logging
--quiet, -q Suppress non-error output
# scrape flags
--sources, -s YAML file with source definitions
--format, -f Output format: json, markdown (default: json)
--concurrency Parallel scraping workers (default: 10)
# serve flags
--remote Run as remote HTTP server
--port, -p HTTP port (default: 8080)
--host HTTP host (default: localhost)
# query flags
--limit, -l Max results (default: 5)
--format, -f Output: json, text, markdown
--threshold Similarity threshold (default: 0.7)
```
---
## Configuration
### devour.yaml
```yaml
# Devour Configuration
# Storage paths
storage:
docs_dir: ./devour_data/docs
index_dir: ./devour_data/index
metadata_dir: ./devour_data/metadata
# Embedding settings
embeddings:
provider: openai # openai, local
model: text-embedding-3-small
api_key: ${OPENAI_API_KEY} # Env var reference
# Vector database
vector_db:
type: chromem # chromem, weaviate, faiss
persist: true
# Scraping settings
scraper:
user_agent: "Devour/1.0"
timeout: 30s
retry_count: 3
concurrency: 10
rate_limit: 500ms
# Scheduler
scheduler:
enabled: true
interval: 72h # Every 3 days
check_method: hash # hash, timestamp
# Server settings
server:
mode: local # local, remote
port: 8080
host: localhost
# Sources (for sync)
sources:
- name: project-docs
type: url
url: https://docs.example.com
include: ["**/*.md", "**/*.html"]
exclude: ["**/api/**"]
- name: api-spec
type: openapi
url: https://api.example.com/openapi.json
- name: github-repo
type: github
repo: org/repo
branch: main
paths: ["docs/", "README.md"]
```
---
## API Reference
### MCP Tools (when running as server)
#### `devour_query`
Search indexed documentation for relevant context.
```json
{
"query": "How do I authenticate?",
"limit": 5,
"threshold": 0.7
}
```
#### `devour_add`
Add documents to the index.
```json
{
"documents": [
{
"content": "Document text...",
"metadata": {
"source": "https://...",
"type": "markdown"
}
}
]
}
```
#### `devour_status`
Get indexing status and statistics.
### REST API (remote mode)
```
GET /health # Health check
GET /status # Index statistics
POST /query # Search documents
POST /documents # Add documents
GET /documents # List documents
DELETE /documents/:id # Delete document
POST /sync # Trigger sync
```
---
## Integration Examples
### With OpenCode (Local Mode)
Add to your OpenCode skills:
```yaml
# ~/.opencode/skills.yaml
skills:
- name: devour
path: /path/to/devour
commands:
- devour serve
```
Then in OpenCode:
```
/devour query "authentication flow"
```
### With AI Applications
```go
import "github.com/yourorg/devour/pkg/client"
func main() {
client := client.New("http://localhost:8080")
results, err := client.Query(ctx, "How do I use the API?", 5)
if err != nil {
log.Fatal(err)
}
for _, r := range results {
fmt.Printf("Score: %.2f - %s\n", r.Score, r.Content[:100])
}
}
```
---
## Development
### Project Structure
```
devour/
├── cmd/devour/ # CLI entrypoint
│ └── main.go
├── internal/
│ ├── scraper/ # Scraping logic
│ ├── indexer/ # Embedding generation
│ ├── server/ # MCP server
│ ├── scheduler/ # Background updates
│ └── ai/ # AI integrations
├── pkg/
│ ├── client/ # Go client library
│ └── types/ # Shared types
├── devour_data/ # Default data directory
├── go.mod
├── Makefile
└── README.md
```
### Building
```bash
# Development build
go build -o devour ./cmd/devour
# Production build
CGO_ENABLED=0 go build -ldflags="-s -w" -o devour ./cmd/devour
# Run tests
go test ./...
# Run with coverage
go test -cover ./...
```
### Makefile Targets
```bash
make build # Build binary
make test # Run tests
make lint # Run linter
make docker # Build Docker image
make install # Install locally
```
---
## Roadmap
- [ ] Local LLM support (Ollama, LocalAI)
- [ ] Multi-tenant support for remote mode
- [ ] Web UI for document management
- [ ] Git-based versioning for docs
- [ ] Plugin system for custom scrapers
- [ ] Reranking with cross-encoders
---
## Contributing
Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) for details.
---
## License
MIT License - see [LICENSE](LICENSE) for details.
---
<p align="center">
<sub>Built with ❤️ for better AI context</sub>
</p>