Files
Devour/SKILL.md
T
Tomas Dvorak 409acd2e08 updage
2026-02-22 15:41:27 +01:00

656 lines
17 KiB
Markdown

---
name: devour
description: >
Context ingestion and management system for AI. Scrapes, indexes, and serves
documentation from GitHub repos, OpenAPI specs, web docs, and local files.
Provides semantic search via vector embeddings to feed relevant context to
AI models. Runs in local mode (stdio) or remote mode (HTTP MCP server).
Supports automatic updates via configurable scheduler. Integrates with
OpenAI for embeddings and LLM context injection. Triggers on: "devour",
"scrape docs", "index documentation", "context for AI", "vector search docs",
"semantic search", "ingest documentation", "documentation to AI".
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
- WebFetch
---
# Devour — Context Ingestion Skill
Comprehensive documentation scraping, indexing, and retrieval system for
feeding structured context to AI models. Orchestrates 5 specialized modules
and supports both local (stdio) and remote (HTTP) MCP modes.
## Quick Reference
| Command | What it does |
|---------|-------------|
| `/devour init [path]` | Initialize Devour for a project |
| `/devour get <language> <keyword>` | **NEW** Quick docs fetch for popular languages |
| `/devour scrape <source>` | Scrape docs from URL, GitHub, or local path |
| `/devour serve` | Start MCP server (local or remote) |
| `/devour query <text>` | Search indexed documentation |
| `/devour status` | Show index stats and health |
| `/devour sync` | Fetch updates from all configured sources |
| `/devour push <path>` | Push docs to remote MCP server |
| `/devour sources` | Manage documentation sources |
| `/devour quality scan [path]` | **NEW** Run code quality analysis |
| `/devour quality status` | **NEW** Show quality metrics and trends |
| `/devour quality next` | **NEW** Show next priority issue to fix |
## Orchestration Logic
When the user invokes `/devour get <language> <keyword>`:
1. **Map language to base URL**:
- `go http``https://pkg.go.dev/http`
- `python asyncio``https://docs.python.org/3/library/asyncio.html`
- `react hooks``https://react.dev/reference/react/hooks`
- `docker compose``https://docs.docker.com/compose`
2. **Auto-detect source type** based on language:
- Go → `godocs` parser
- Python → `pythondocs` parser
- React → `reactdocs` parser
- Docker → `dockerdocs` parser
3. **Execute enhanced scrape** with pre-configured parameters:
- Automatic language-specific parsing
- Enhanced markdown formatting (if requested)
- Metadata extraction and enrichment
4. **Return structured documentation**:
- Rich markdown with TOC (if `--format markdown`)
- JSON with full metadata (default)
- Ready for AI context injection
When the user invokes `/devour scrape`:
1. **Detect source type** from URL/path:
- GitHub: `github.com/org/repo` → Clone, extract docs
- OpenAPI: Ends in `.json`/`.yaml` with OpenAPI spec → Parse endpoints
- Web: HTTP/HTTPS URL → Crawl with Colly
- Local: File path → Scan directory
2. **Scrape with appropriate parser**:
- Extract content (markdown, HTML, code structure)
- Clean and normalize text
- Extract metadata (title, headings, code blocks)
3. **Generate embeddings**:
- Chunk content appropriately (512-1024 tokens)
- Call OpenAI embedding API
- Store in vector database
4. **Update metadata**:
- Track source, timestamp, content hash
- Enable future update detection
When the user invokes `/devour query`:
1. Generate embedding for query text
2. Perform vector similarity search
3. Return top-K results with metadata
4. Optionally inject into AI context
## Enhanced Features
### 🎯 Language-Aware Documentation Access
The `devour get` command provides intelligent, language-specific documentation retrieval:
**Supported Languages & Mappings:**
- `go`, `golang` → Go packages (pkg.go.dev)
- `rust` → Rust crates (docs.rs)
- `python`, `py` → Python modules (docs.python.org)
- `java` → Java packages (docs.oracle.com)
- `spring` → Spring Boot (docs.spring.io)
- `typescript`, `ts` → TypeScript (typescriptlang.org)
- `react` → React (react.dev)
- `vue` → Vue.js (vuejs.org)
- `nuxt` → Nuxt (nuxt.com)
- `docker` → Docker (docs.docker.com)
- `cloudflare`, `cf` → Cloudflare (developers.cloudflare.com)
- `astro` → Astro (docs.astro.build)
**Usage Examples:**
```bash
/devour get go http # Go HTTP package docs
/devour get python asyncio # Python asyncio module
/devour get react hooks # React Hooks reference
/devour get docker compose # Docker Compose guide
/devour get rust tokio # Rust Tokio crate docs
```
### 📝 Rich Markdown Enhancement
When using `--format markdown`, Devour automatically enhances documentation:
**Auto-Generated Structure:**
- 📋 Document metadata tables (source, type, timestamp)
- 📑 Table of contents from headings
- 🎨 Visual indicators for important content
- 🔗 Automatic URL-to-link conversion
- 📚 Proper heading hierarchy
**Content Enhancement:**
- `Example:` → 💡 **Example:**
- `Note:` → 📝 **Note:**
- `Warning:` → ⚠️ **Warning:**
- `Important:` → ❗ **Important:**
- `TODO:` → 📋 **TODO:**
**Example Output Structure:**
```markdown
# Package Name
## 📋 Document Information
| Property | Value |
|----------|-------|
| **Source** | https://pkg.go.dev/http |
| **Type** | `godocs` |
| **Scraped** | 2026-02-19 12:30:00 |
## 📑 Table of Contents
- [Functions](#functions)
- [Types](#types)
- [Examples](#examples)
## 📚 Content
# Functions
💡 **Example:** Usage example here...
```
## Source Type Detection
| Pattern | Type | Parser |
|---------|------|--------|
| `github.com/*/*` | GitHub | Git clone + markdown parser |
| `*.json` + OpenAPI keys | OpenAPI | Swagger parser |
| `http://*`, `https://*` | Web | Colly crawler |
| `./path`, `/path` | Local | Directory scanner |
| `*.md`, `*.rst`, `*.txt` | File | Direct parse |
## Module Reference
### 1. Scraper Module (`internal/scraper`)
Responsible for fetching and parsing content from various sources.
**Supported sources:**
- GitHub repositories (clone, extract docs/, README.md)
- OpenAPI/Swagger specs (parse endpoints, schemas)
- HTML documentation sites (crawl, extract content)
- Markdown files (parse structure, code blocks)
- JSON/YAML configuration files
**Output format:**
```json
{
"id": "doc-uuid",
"source": "https://...",
"type": "markdown",
"title": "Document Title",
"content": "Extracted text...",
"metadata": {
"headings": ["H1", "H2"],
"code_blocks": ["go", "bash"],
"links": ["url1", "url2"]
},
"timestamp": "2025-01-15T10:00:00Z"
}
```
### 2. Indexer Module (`internal/indexer`)
Converts documents into vector embeddings for semantic search.
**Features:**
- OpenAI embedding integration (text-embedding-3-small/large)
- Intelligent chunking (512-1024 tokens, respect boundaries)
- Metadata preservation
- Batch processing for efficiency
**Chunking strategy:**
```go
type Chunk struct {
ID string
DocID string
Content string
Vector []float32
Metadata map[string]any
Position int // Position in original doc
}
```
### 3. Server Module (`internal/server`)
Exposes context via MCP protocol.
**Local mode (stdio):**
```
STDIN → JSON-RPC → Handler → Response → STDOUT
```
**Remote mode (HTTP):**
```
HTTP Request → Handler → Response → HTTP Response
```
**MCP Tools exposed:**
- `devour_query` - Semantic search
- `devour_add` - Add documents
- `devour_status` - Get stats
- `devour_sync` - Trigger update
**MCP Resources:**
- `devour://documents` - All indexed docs
- `devour://sources` - Configured sources
- `devour://stats` - Index statistics
### 4. Scheduler Module (`internal/scheduler`)
Manages automatic updates from configured sources.
**Default schedule:** Every 72 hours (3 days)
**Change detection methods:**
- Content hash comparison (default)
- Last-Modified timestamp
- ETag header
- Git commit hash (for repos)
**Configuration:**
```yaml
scheduler:
enabled: true
interval: 72h
check_method: hash
retry_count: 3
retry_delay: 1h
```
### 5. AI Module (`internal/ai`)
Handles AI integrations for embeddings and context injection.
**Supported providers:**
- OpenAI (primary)
- Ollama (local, planned)
- Custom endpoints
**Context injection format:**
```go
type Context struct {
Query string
Results []SearchResult
SystemPrompt string
}
func (c *Context) ToPrompt() string {
// Format for LLM consumption
}
```
## Configuration Schema
### devour.yaml
```yaml
# Core configuration
version: 1
# Storage paths
storage:
docs_dir: ./devour_data/docs
index_dir: ./devour_data/index
metadata_dir: ./devour_data/metadata
# Embedding configuration
embeddings:
provider: openai
model: text-embedding-3-small
dimensions: 1536
api_key: ${OPENAI_API_KEY}
batch_size: 100
# Vector database
vector_db:
type: chromem # chromem, weaviate, faiss
persist: true
similarity_metric: cosine
# Scraping configuration
scraper:
user_agent: "Devour/1.0 (+https://github.com/yourorg/devour)"
timeout: 30s
retry_count: 3
retry_delay: 5s
concurrency: 10
rate_limit: 500ms
max_depth: 3
cache_dir: ./devour_data/cache
# Scheduler configuration
scheduler:
enabled: true
interval: 72h
check_method: hash
on_startup: false
# Server configuration
server:
mode: local # local, remote
transport: stdio # stdio, http
host: localhost
port: 8080
cors:
enabled: false
origins: []
# Source definitions
sources:
- name: example-docs
type: url
url: https://docs.example.com
include:
- "**/*.md"
- "**/*.html"
exclude:
- "**/api/**"
- "**/legacy/**"
schedule: 24h # Override global schedule
- name: api-spec
type: openapi
url: https://api.example.com/openapi.json
schedule: 168h # Weekly
- name: internal-repo
type: github
repo: myorg/myrepo
branch: main
paths:
- docs/
- README.md
auth_token: ${GITHUB_TOKEN}
```
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key | Required |
| `DEVOUR_CONFIG` | Config file path | `./devour.yaml` |
| `DEVOUR_DATA_DIR` | Data directory | `./devour_data` |
| `GITHUB_TOKEN` | GitHub auth token | Optional |
| `DEVOUR_LOG_LEVEL` | Log level (debug, info, warn, error) | `info` |
| `DEVOUR_PORT` | Server port | `8080` |
## Code Quality Analysis
Devour includes comprehensive code quality analysis with three scorecard formats:
### Scorecard Types
**Compact Scorecard** - Quick overview with 3 circular metrics:
- Overall score (0-100%)
- Strict score (conservative metric)
- Letter grade (A-F)
**Detailed Scorecard** - Comprehensive breakdown featuring:
- Score breakdown by dimension with progress bars
- Findings grouped by type with visual charts
- Severity distribution with percentage circles
- Project metadata and timestamps
**Original Scorecard** - Balanced view with:
- Left panel: Project info and main scores
- Right panel: Dimension metrics in two-column layout
### Quality Commands
```bash
# Basic quality scan (generates original scorecard)
devour quality scan
# Generate specific scorecard formats
devour quality scan --format compact --badge-path compact.png
devour quality scan --format detailed --badge-path detailed.png
# Dark theme support
devour quality scan --theme dark --badge-path dark_scorecard.png
# Quality status and trends
devour quality status
# Show next priority issue
devour quality next
# Export findings as JSON
devour quality scan --format json > findings.json
```
### Quality Metrics
**Dimensions Analyzed:**
- Complexity - Nested loops, excessive function calls
- Duplication - Code clones and near-duplicates
- Security - Vulnerabilities and anti-patterns
- Test Coverage - Unit test coverage analysis
- Dead Code - Unused functions, variables, imports
- Coupling - High coupling between modules
- Naming - Inconsistent naming conventions
**Severity Levels:**
- **T1** - Auto-fixable (unused imports, debug logs)
- **T2** - Quick manual fixes (unused vars, dead exports)
- **T3** - Requires judgment (near-dupes, single-use abstractions)
- **T4** - Major refactor needed (god components, mixed concerns)
**Scoring:**
- **Overall Score** - General code health (0-100%)
- **Strict Score** - Conservative scoring ignoring quick wins
- **Grade** - Letter grade based on score ranges (A: 90-100%, B: 70-89%, etc.)
### Multi-Language Support
- **Go** - Full AST analysis with go/parser
- **Python** - AST analysis with ast module
- **JavaScript/TypeScript** - ESLint integration
- **Java** - JavaParser integration
- **Rust** - Synth integration (planned)
### Integration Examples
```bash
# CI/CD Pipeline Integration
devour quality scan --format json --threshold 70
if [ $? -ne 0 ]; then
echo "Quality gate failed - score below threshold"
exit 1
fi
# Generate all scorecard versions for documentation
devour quality scan --format original --badge-path docs/scorecard.png
devour quality scan --format compact --badge-path docs/scorecard_compact.png --theme light
devour quality scan --format detailed --badge-path docs/scorecard_detailed.png --theme dark
# Weekly quality tracking
devour quality scan --format json > weekly_$(date +%Y%m%d).json
```
## Quality Gates
Built-in validation rules:
- ⚠️ **WARNING** if document count < 10 (may be incomplete scrape)
- ⚠️ **WARNING** if average chunk size < 100 tokens (over-fragmented)
- 🛑 **HARD STOP** if embedding API fails (cannot index without vectors)
- 🛑 **HARD STOP** if storage is not writable (cannot persist)
## Output Formats
### Query Results (JSON)
```json
{
"query": "authentication",
"results": [
{
"id": "chunk-uuid",
"document_id": "doc-uuid",
"content": "Relevant text excerpt...",
"score": 0.89,
"source": "https://docs.example.com/auth",
"metadata": {
"title": "Authentication Guide",
"section": "Getting Started"
}
}
],
"total": 15,
"took_ms": 45
}
```
### Status Output
```
Devour Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Index Health: ✅ Healthy
Documents: 1,247 indexed
Chunks: 8,392 total
Vector Dimension: 1536
Last Updated: 2025-01-15 10:30:00
Storage Used: 124 MB
Sources (3):
✅ example-docs (234 docs, synced 2h ago)
✅ api-spec (12 docs, synced 1d ago)
⚠️ internal-repo (pending first sync)
Next Scheduled Sync: 2025-01-18 10:30:00
```
## Integration Patterns
### With OpenCode
```yaml
# In OpenCode session
> /devour init
> /devour scrape https://docs.myframework.com
> /devour serve
# In another terminal or session
> /devour query "how to handle authentication"
# Returns relevant context for AI
```
### With AI Assistant
```go
// AI assistant queries Devour automatically
func getRelevantContext(query string) string {
resp, _ := http.Post("http://localhost:8080/query",
"application/json",
bytes.NewReader([]byte(`{"query":"`+query+`"}`)))
var result QueryResponse
json.NewDecoder(resp.Body).Decode(&result)
// Inject into prompt
return formatContextForAI(result.Results)
}
```
### As MCP Tool
```json
// AI calls via MCP
{
"method": "tools/call",
"params": {
"name": "devour_query",
"arguments": {
"query": "API rate limiting",
"limit": 5
}
}
}
```
## Sub-Skills
This skill can delegate to specialized modules:
1. **devour-scrape** — Scraping operations
2. **devour-index** — Indexing and embeddings
3. **devour-query** — Search and retrieval
4. **devour-sync** — Synchronization tasks
5. **devour-serve** — Server management
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| `E001` | OpenAI API error | Check API key, rate limits |
| `E002` | Source unreachable | Verify URL, check network |
| `E003` | Storage write failure | Check permissions, disk space |
| `E004` | Invalid source type | Use supported: url, github, openapi, local |
| `E005` | Index corruption | Rebuild index with `devour sync --rebuild` |
## Performance Tuning
### Scraping
```yaml
scraper:
concurrency: 20 # Parallel workers
rate_limit: 200ms # Between requests
timeout: 60s # Per request
```
### Indexing
```yaml
embeddings:
batch_size: 200 # API batch size
vector_db:
index_type: hnsw # Fast similarity search
m: 16 # HNSW connectivity
```
### Querying
```yaml
query:
ef_search: 64 # HNSW search depth
limit: 10 # Default result count
```
## Troubleshooting
### Common Issues
**Slow queries:**
- Increase `ef_search` for better recall
- Use smaller `limit` values
- Consider index type (HNSW vs Flat)
**API rate limits:**
- Reduce `batch_size`
- Add delays between batches
- Use caching
**Memory usage:**
- Reduce `concurrency`
- Process in smaller batches
- Use disk-backed storage
---
*Devour: Feed your AI the context it craves.*