Devour/README.md

<p align="center">
  <img src="devour_logo.svg" alt="Devour Logo" width="300">
</p>

<h1 align="center">Devour</h1>

<p align="center">
  <strong>Context Ingestion & Management for AI</strong>
</p>

<p align="center">
  <a href="#features">Features</a> •
  <a href="#installation">Installation</a> •
  <a href="#quick-start">Quick Start</a> •
  <a href="#architecture">Architecture</a> •
  <a href="#cli-reference">CLI Reference</a> •
  <a href="#configuration">Configuration</a>
</p>

---

## What is Devour?

Devour is a **context ingestion and management system** designed to feed structured, relevant context to AI models for generating accurate, fully working code.

It scrapes, indexes, and serves documentation from multiple sources:
- GitHub repositories
- OpenAPI/Swagger specifications
- Markdown/HTML documentation sites
- JSON/YAML schemas
- Local project files

### Two Modes of Operation

| Mode | Description | Use Case |
|------|-------------|----------|
| **Local** | Runs as an OpenCode skill on your machine | Single developer, offline work |
| **Remote** | MCP server hosted on your infrastructure | Teams, multi-project support |

---

## Features

### 🕷️ Multi-Source Scraping
- **GitHub** - Clone and parse repos, extract README, docs, code structure
- **OpenAPI** - Parse Swagger specs into structured endpoints
- **Web Docs** - Crawl documentation sites with Colly
- **Local Files** - Index your project's docs folder

### 🧠 Intelligent Indexing
- Vector embeddings via OpenAI (text-embedding-3-small/large)
- Semantic similarity search for context retrieval
- Metadata tracking (source, timestamp, file type)

### 🔄 Automatic Updates
- Configurable scheduler (default: every 3 days)
- Content hash comparison for change detection
- Automatic re-indexing on updates

### 🔌 MCP Integration
- Exposes context via Model Context Protocol
- **Local mode**: stdio transport (OpenCode skill)
- **Remote mode**: HTTP/SSE transport (MCP server)

### 💾 Flexible Storage
```
devour_data/
├── docs/           # Raw scraped documents
├── index/          # Vector embeddings
└── metadata/       # Source tracking & timestamps
```

### 📊 Quality Scorecard

![Quality Scorecard](scorecard.png)

Devour includes a built-in code quality analysis system that generates a comprehensive scorecard for your project.

```bash
# Run quality analysis
devour quality scan

# Generate a visual scorecard badge
devour quality scan --badge-path scorecard.png
```

**Features:**
- Multi-language support (Go, Python, JavaScript, etc.)
- Severity-based scoring (T1-T4 tiers)
- Technical debt tracking
- Automated code review integration

---

## Installation

### Prerequisites
- Go 1.22+
- OpenAI API key (for embeddings)

### From Source

```bash
# Clone the repository
git clone https://github.com/yourorg/devour.git
cd devour

# Install dependencies
go mod download

# Build
go build -o devour ./cmd/devour

# Install globally
go install ./cmd/devour
```

### Quick Install

```bash
go install github.com/yourorg/devour/cmd/devour@latest
```

---

## Quick Start

### 1. Initialize a Project

```bash
# Create devour config in current directory
devour init

# Or specify a path
devour init ./my-project
```

### 2. Get Documentation (NEW!)

```bash
# Quick access to popular language/framework docs
devour get go http              # Go HTTP package
devour get python asyncio      # Python asyncio module
devour get react hooks         # React Hooks documentation
devour get docker compose      # Docker Compose docs
devour get rust tokio          # Rust Tokio crate
devour get spring boot         # Spring Boot framework

# Enhanced markdown output
devour get go http --format markdown
```

**Supported Languages:**
- `go`, `golang` - Go packages (pkg.go.dev)
- `rust` - Rust crates (docs.rs)
- `python`, `py` - Python modules (docs.python.org)
- `java` - Java packages (docs.oracle.com)
- `spring` - Spring Boot (docs.spring.io)
- `typescript`, `ts` - TypeScript (typescriptlang.org)
- `react` - React (react.dev)
- `vue` - Vue.js (vuejs.org)
- `nuxt` - Nuxt (nuxt.com)
- `docker` - Docker (docs.docker.com)
- `cloudflare`, `cf` - Cloudflare (developers.cloudflare.com)
- `astro` - Astro (docs.astro.build)

### 3. Scrape Documentation

```bash
# Scrape from a URL
devour scrape https://docs.example.com

# Scrape a GitHub repo
devour scrape https://github.com/org/repo

# Scrape local docs
devour scrape ./docs

# Multiple sources
devour scrape --sources sources.yaml
```

### 4. Query Context

```bash
# Search indexed docs
devour query "How do I authenticate with the API?"

# With options
devour query "authentication" --limit 5 --format json
```

### 5. Start the Server

```bash
# Local MCP server (stdio transport)
devour serve

# Remote MCP server (HTTP)
devour serve --remote --port 8080
```

### 6. Check Status

```bash
devour status
```

---

## Enhanced Features

### 🎯 Simplified Language Interface

The new `devour get` command provides instant access to documentation for popular languages and frameworks without needing to remember full URLs:

```bash
# Instead of: devour scrape https://pkg.go.dev/net/http
devour get go http

# Instead of: devour scrape https://react.dev/reference/react/hooks
devour get react hooks

# Instead of: devour scrape https://docs.docker.com/compose
devour get docker compose
```

### 📝 Rich Markdown Output

Enable enhanced markdown formatting for beautiful, structured documentation:

```bash
devour get go http --format markdown
```

**Features:**
- 📋 Document metadata tables
- 📑 Auto-generated table of contents
- 🎨 Enhanced typography with emoji indicators
- 🔗 Automatic link conversion
- 📚 Structured content sections
- 🏷️ Source attribution and timestamps

### 🧠 Smart Content Enhancement

The markdown formatter automatically:
- Converts plain URLs to clickable links
- Adds visual indicators for examples, notes, and warnings
- Fixes code block formatting
- Generates proper heading structure
- Creates document metadata tables

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         Devour System                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────┐    ┌──────────┐    ┌───────────┐    ┌──────────┐  │
│  │ Scraper │───▶│ Indexer  │───▶│  Storage  │───▶│  Server  │  │
│  └─────────┘    └──────────┘    └───────────┘    └──────────┘  │
│       │              │               │                │        │
│       ▼              ▼               ▼                ▼        │
│  ┌─────────┐    ┌──────────┐    ┌───────────┐    ┌──────────┐  │
│  │ GitHub  │    │ OpenAI   │    │ Vector DB │    │   MCP    │  │
│  │ Web     │    │ Embeds   │    │ (chromem) │    │ Protocol │  │
│  │ Local   │    │          │    │           │    │          │  │
│  └─────────┘    └──────────┘    └───────────┘    └──────────┘  │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                     Scheduler                            │   │
│  │         (Auto-update every 3 days, configurable)         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Data Flow

```
User Query → Devour Server → Embedding Generation → Vector Search
                                                           │
                                                           ▼
    AI Response ← Context Chunks ← Top-K Relevant Docs ←───┘
```

---

## CLI Reference

### Commands

| Command | Description |
|---------|-------------|
| `devour init [path]` | Initialize Devour for a project |
| `devour get <language> <keyword>` | **NEW** Quick docs fetch for popular languages |
| `devour scrape <source>` | Scrape docs from URL, repo, or path |
| `devour serve` | Start MCP server (local or remote) |
| `devour query <text>` | Search indexed documentation |
| `devour status` | Show index stats and last update |
| `devour sync` | Fetch updates from all sources |
| `devour push <path>` | Push docs to remote MCP server |

### Flags

```bash
# Global flags
--config, -c     Config file path (default: ./devour.yaml)
--verbose, -v    Enable verbose logging
--quiet, -q      Suppress non-error output

# scrape flags
--sources, -s    YAML file with source definitions
--format, -f     Output format: json, markdown (default: json)
--concurrency    Parallel scraping workers (default: 10)

# serve flags
--remote         Run as remote HTTP server
--port, -p       HTTP port (default: 8080)
--host           HTTP host (default: localhost)

# query flags
--limit, -l      Max results (default: 5)
--format, -f     Output: json, text, markdown
--threshold      Similarity threshold (default: 0.7)
```

---

## Configuration

### devour.yaml

```yaml
# Devour Configuration

# Storage paths
storage:
  docs_dir: ./devour_data/docs
  index_dir: ./devour_data/index
  metadata_dir: ./devour_data/metadata

# Embedding settings
embeddings:
  provider: openai           # openai, local
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY} # Env var reference

# Vector database
vector_db:
  type: chromem              # chromem, weaviate, faiss
  persist: true

# Scraping settings
scraper:
  user_agent: "Devour/1.0"
  timeout: 30s
  retry_count: 3
  concurrency: 10
  rate_limit: 500ms

# Scheduler
scheduler:
  enabled: true
  interval: 72h              # Every 3 days
  check_method: hash         # hash, timestamp

# Server settings
server:
  mode: local                # local, remote
  port: 8080
  host: localhost

# Sources (for sync)
sources:
  - name: project-docs
    type: url
    url: https://docs.example.com
    include: ["**/*.md", "**/*.html"]
    exclude: ["**/api/**"]

  - name: api-spec
    type: openapi
    url: https://api.example.com/openapi.json

  - name: github-repo
    type: github
    repo: org/repo
    branch: main
    paths: ["docs/", "README.md"]
```

---

## API Reference

### MCP Tools (when running as server)

#### `devour_query`
Search indexed documentation for relevant context.

```json
{
  "query": "How do I authenticate?",
  "limit": 5,
  "threshold": 0.7
}
```

#### `devour_add`
Add documents to the index.

```json
{
  "documents": [
    {
      "content": "Document text...",
      "metadata": {
        "source": "https://...",
        "type": "markdown"
      }
    }
  ]
}
```

#### `devour_status`
Get indexing status and statistics.

### REST API (remote mode)

```
GET  /health              # Health check
GET  /status              # Index statistics
POST /query               # Search documents
POST /documents           # Add documents
GET  /documents           # List documents
DELETE /documents/:id     # Delete document
POST /sync                # Trigger sync
```

---

## Integration Examples

### With OpenCode (Local Mode)

Add to your OpenCode skills:

```yaml
# ~/.opencode/skills.yaml
skills:
  - name: devour
    path: /path/to/devour
    commands:
      - devour serve
```

Then in OpenCode:
```
/devour query "authentication flow"
```

### With AI Applications

```go
import "github.com/yourorg/devour/pkg/client"

func main() {
    client := client.New("http://localhost:8080")

    results, err := client.Query(ctx, "How do I use the API?", 5)
    if err != nil {
        log.Fatal(err)
    }

    for _, r := range results {
        fmt.Printf("Score: %.2f - %s\n", r.Score, r.Content[:100])
    }
}
```

---

## Development

### Project Structure

```
devour/
├── cmd/devour/           # CLI entrypoint
│   └── main.go
├── internal/
│   ├── scraper/          # Scraping logic
│   ├── indexer/          # Embedding generation
│   ├── server/           # MCP server
│   ├── scheduler/        # Background updates
│   └── ai/               # AI integrations
├── pkg/
│   ├── client/           # Go client library
│   └── types/            # Shared types
├── devour_data/          # Default data directory
├── go.mod
├── Makefile
└── README.md
```

### Building

```bash
# Development build
go build -o devour ./cmd/devour

# Production build
CGO_ENABLED=0 go build -ldflags="-s -w" -o devour ./cmd/devour

# Run tests
go test ./...

# Run with coverage
go test -cover ./...
```

### Makefile Targets

```bash
make build        # Build binary
make test         # Run tests
make lint         # Run linter
make docker       # Build Docker image
make install      # Install locally
```

---

## Roadmap

- [ ] Local LLM support (Ollama, LocalAI)
- [ ] Multi-tenant support for remote mode
- [ ] Web UI for document management
- [ ] Git-based versioning for docs
- [ ] Plugin system for custom scrapers
- [ ] Reranking with cross-encoders

---

## Contributing

Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) for details.

---

## License

MIT License - see [LICENSE](LICENSE) for details.

---

<p align="center">
  <sub>Built with ❤️ for better AI context</sub>
</p>