Tomas Dvorak
2026-02-24 10:33:59 +01:00
parent 409acd2e08
commit 898a3c303f
1374 changed files with 290409 additions and 29187 deletions
File diff suppressed because it is too large
File diff suppressed because one or more lines are too long
@@ -0,0 +1,433 @@
{
"assessments": {
"abstraction_fitness": {
"score": 40.7,
"components": [
"Abstraction Leverage",
"Indirection Cost",
"Interface Honesty"
],
"component_scores": {
"Abstraction Leverage": 60.6,
"Indirection Cost": 71.4,
"Interface Honesty": 63.6
}
},
"cross_module_architecture": 68.3,
"design_coherence": 40.9,
"error_consistency": 45.2,
"test_strategy": 46.4
},
"dimension_notes": {
"cross_module_architecture": {
"evidence": [
"All assigned `internal/quality/*.go` files stay within a single package boundary (`package quality`) and only import standard library packages (`time`, `testing`, `strings`), with no cross-package dependency fan-out from this slice.",
"`pkg/rustdocs/parser_test.go` is isolated to `package rustdocs` and imports only stdlib plus `github.com/PuerkitoBio/goquery`; it does not couple into `internal/quality` types or helpers.",
"No `init()` functions, package-level mutable singleton wiring, or import-time execution patterns were found in the reviewed files; behavior is test-function scoped and constructor-invoked (`NewParser`, `NewScorer`, `NewNarrativeGenerator`).",
"Type declarations in `internal/quality/types.go` and `internal/quality/enhanced_types.go` are cohesive data-model definitions within one module boundary rather than cross-module shims or compatibility layers."
],
"impact_scope": "local",
"fix_scope": "single_edit",
"confidence": "high",
"unreported_risk": "This batch covers only five files; architectural hotspots could still exist in non-assigned packages (e.g., runtime wiring or broader dependency graph) outside this evidence window."
},
"abstraction_fitness": {
"evidence": [
"Language-to-doc behavior is spread across multiple large switches: URL construction in cmd/get.go:78-173, type mapping in cmd/get.go:175-205, and term derivation in cmd/ask.go:205-260+.",
"External scraper implementations repeat the same transport/change-detection scaffold (config+parser+http client fields, URL check, fetchPage, generateHash, DetectChanges) across multiple files, e.g. internal/scraper/external/godocs.go:17-121, internal/scraper/external/javadocs.go:16-115, internal/scraper/external/nuxtdocs.go:16-120, internal/scraper/external/cloudflaredocs.go:16-105.",
"Vector store abstraction exposes implementations that are selected by default config but intentionally unimplemented: internal/config/config.go:121-125 defaults to chromem, while internal/vector/store.go:221-243 returns \"chromem store not implemented\" for all operations.",
"Configuration defaults are duplicated in two representations (typed defaults and hand-written YAML template), increasing drift risk: cmd/init.go:92-149 and internal/config/config.go:104-160."
],
"impact_scope": "subsystem",
"fix_scope": "architectural_change",
"confidence": "high",
"unreported_risk": "",
"sub_axes": {
"abstraction_leverage": 62.0,
"indirection_cost": 71.0,
"interface_honesty": 60.0
}
},
"test_strategy": {
"evidence": [
"`internal/quality/narrative_test.go` validates exact headline/action prose and directly tests internal helper behavior (e.g., `determinePhase`, `generateHeadline`, `classifyDimension`), coupling the tests tightly to the implementation.",
"`internal/quality/scoring_test.go` similarly focuses on exact internal scoring details and string key literals, which makes refactors noisy and discourages safe design changes.",
"`pkg/rustdocs/parser_test.go` is heavily happy-path: it checks successful parses and minimal field presence but has no malformed-input/error-path cases for parser resilience.",
"`README.md` marks parts of the CLI as unstable/stubbed, but assigned tests do not provide cross-module contract/integration safety nets for those runtime boundaries."
],
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor",
"confidence": "high",
"unreported_risk": ""
},
"design_coherence": {
"evidence": [
"Parallel implementations of the same scorecard pipeline exist in `cmd/devour_scorecard.py` and `cmd/scorecard_generator.py` with near-identical function layouts (`ScorecardData`, `score_color`, `draw_left_panel`, `draw_right_panel`, `generate_scorecard`, `main`) and only minor line-level differences.",
"Three variants of enhanced generator (`cmd/devour_enhanced.py`, `cmd/devour_enhanced_fixed.py`, `cmd/devour_enhanced_v2.py`) repeat almost the full rendering stack (`draw_header_section`, `draw_enhanced_left_panel`, `draw_enhanced_right_panel`, `draw_trends_section`, `load_enhanced_devour_data`), creating branch-by-copy evolution.",
"Scraper adapters across providers (`internal/scraper/external/astrodocs.go`, `internal/scraper/external/cloudflaredocs.go`, `internal/scraper/external/reactdocs.go`) duplicate fetch/hash/change-detection and document assembly patterns with provider-specific data glued inline, indicating repeated structural pattern without shared orchestration abstraction.",
"Within `cmd/devour_lighthouse.py`, `load_font` is defined twice (once near top and again later), showing local design drift and utility ownership ambiguity."
],
"impact_scope": "codebase",
"fix_scope": "architectural_change",
"confidence": "high",
"unreported_risk": ""
},
"error_consistency": {
"evidence": [
"Raw error passthrough is common in core flows (e.g., `return nil, err` in `internal/search/engine.go:114`, `internal/search/engine.go:122`, `internal/scraper/openapi.go:45`, `internal/scraper/openapi.go:50`) while nearby code wraps with operation context (e.g., `internal/search/engine.go:111`, `internal/scraper/openapi.go:153`).",
"Failure handling style diverges between aborting, propagating, and suppressing in similar backend paths: `panic(...)` in `internal/quality/plugins/go/plugin.go:363`, warning print-and-continue in `internal/indexer/indexer.go:239`, and plain returns in `cmd/scrape.go:90`/`cmd/get.go:59`.",
"Some call paths lose caller context at command boundaries (`cmd/scrape.go:90`, `cmd/scrape.go:125`, `cmd/get.go:59`) despite contextual wrapping being used in other command-layer branches (`cmd/scrape.go:131`, `cmd/scrape.go:145`)."
],
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor",
"confidence": "high",
"unreported_risk": ""
}
},
"findings": [
{
"dimension": "abstraction_fitness",
"identifier": "language_catalog_scattered_switches",
"summary": "Language routing logic is duplicated across CLI flows instead of one catalog abstraction",
"related_files": [
"cmd/get.go",
"cmd/ask.go"
],
"evidence": [
"cmd/get.go:78-173 defines a large language switch for URL building; cmd/get.go:175-205 defines a second switch for source type mapping.",
"cmd/ask.go:205-260+ adds a third language switch for term heuristics, creating three independent sources of truth for one domain model."
],
"suggestion": "Introduce a single `LanguageSpec` registry (aliases, source type, URL builder, optional query-term strategy) in one package and have both `get` and `ask` consume it; keep per-language behavior as data/functions attached to that registry.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "abstraction_fitness",
"identifier": "external_scraper_scaffold_duplication",
"summary": "External scraper adapters reimplement the same transport/hash lifecycle repeatedly",
"related_files": [
"internal/scraper/external/godocs.go",
"internal/scraper/external/javadocs.go",
"internal/scraper/external/nuxtdocs.go",
"internal/scraper/external/cloudflaredocs.go"
],
"evidence": [
"Each file defines near-identical struct fields (`config`, `parser`, `client`), constructor wiring, URL-required guard, `fetchPage`, `generateHash`, and `DetectChanges` flow (e.g., godocs.go:17-121 and javadocs.go:16-115).",
"Duplication scales linearly with each new source adapter, increasing edit surface for cross-cutting behavior (timeouts, headers, error mapping)."
],
"suggestion": "Extract a shared `HTTPDocScraperBase` (or composable helper functions) for request execution, status handling, hashing, and change detection; keep each adapter focused on parser invocation and domain-specific document mapping.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "abstraction_fitness",
"identifier": "default_selects_unimplemented_store",
"summary": "Store interface contract is dishonest because default backend is not operational",
"related_files": [
"internal/vector/store.go",
"internal/config/config.go"
],
"evidence": [
"internal/config/config.go:121-125 sets the default vector DB type to `chromem`.",
"internal/vector/store.go:221-243 returns `chromem store not implemented` for all `Store` operations, even though `NewStore` can select that backend (store.go:63-72)."
],
"suggestion": "Either implement `ChromemStore` before exposing it as default, or switch default to a working backend and gate chromem behind explicit opt-in plus capability check at startup.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "architectural_change"
},
{
"dimension": "abstraction_fitness",
"identifier": "config_defaults_double_encoded",
"summary": "Initialization defaults are encoded twice with different abstractions",
"related_files": [
"cmd/init.go",
"internal/config/config.go"
],
"evidence": [
"cmd/init.go:92-149 hardcodes YAML defaults as a template string.",
"internal/config/config.go:104-160 hardcodes defaults again in typed structs, requiring synchronized updates across two representations."
],
"suggestion": "Generate init YAML from `config.Default()` via marshal + small post-processing/comments, or maintain a single canonical defaults schema consumed by both loader and init command.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "cross_module_architecture",
"identifier": "status_contract_string_map_boundary",
"summary": "Scorecard state uses string keys instead of shared Status type, weakening module contracts.",
"related_files": [
"internal/quality/types.go",
"internal/quality/scoring_test.go",
"README.md"
],
"evidence": [
"`internal/quality/types.go` defines `Status` constants but `Scorecard.StatusByType` is `map[string]int`.",
"`internal/quality/scoring_test.go` asserts `card.StatusByType[\"open\"]` and `card.StatusByType[\"fixed\"]` directly.",
"README promises resolution-state tracking, but this boundary is not type-safe."
],
"suggestion": "Change `Scorecard.StatusByType` to `map[Status]int` (or a dedicated typed struct), update serialization adapters if needed, and update tests to assert using `StatusOpen`/`StatusFixed` constants.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "test_strategy",
"identifier": "brittle_private_and_copy_assertions_in_quality_tests",
"summary": "Quality tests are tightly coupled to private helpers and exact copy text, reducing refactor safety.",
"related_files": [
"internal/quality/narrative_test.go",
"internal/quality/scoring_test.go"
],
"evidence": [
"`narrative_test.go` directly asserts exact strings for generated headlines/actions and tests helper internals rather than stable external behavior.",
"`scoring_test.go` anchors on specific internal weighting outputs and literal status strings, which can fail on benign internal redesigns."
],
"suggestion": "Shift to contract-level tests against exported APIs with invariant assertions (phase category, presence of required fields, monotonic score behavior), and keep only a small set of snapshot/copy tests for user-facing text.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "test_strategy",
"identifier": "rust_parser_missing_negative_and_boundary_cases",
"summary": "Rust parser tests miss malformed-input and degradation-path coverage.",
"related_files": [
"pkg/rustdocs/parser_test.go",
"README.md"
],
"evidence": [
"`parser_test.go` cases are successful parses with valid fixture HTML and only basic assertions.",
"No tests verify behavior for malformed HTML, missing selectors, empty documents, or unsupported result rows.",
"README positions docs ingestion as core functionality, so parser failure behavior is a critical path."
],
"suggestion": "Add table-driven negative tests for malformed/partial HTML, empty search blocks, and missing headings; assert stable fallback behavior (explicit error or safe zero-value output) for each parser entrypoint.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "single_edit"
},
{
"dimension": "design_coherence",
"identifier": "scorecard_variant_sprawl",
"summary": "Scorecard generation is maintained as multiple copy-variants instead of one composable pipeline.",
"related_files": [
"cmd/devour_scorecard.py",
"cmd/scorecard_generator.py",
"cmd/devour_enhanced.py",
"cmd/devour_enhanced_fixed.py",
"cmd/devour_enhanced_v2.py"
],
"evidence": [
"Both `cmd/devour_scorecard.py` and `cmd/scorecard_generator.py` declare the same major functions and data model in the same order with only minor stylistic deltas.",
"Enhanced variants repeat the same section render functions and data loading flow, then diverge by ad-hoc edits, increasing change fan-out for any layout or scoring rule update."
],
"suggestion": "Extract a shared rendering core module (palette/fonts/layout primitives + data normalization), keep one canonical CLI entrypoint, and convert variant behavior into explicit theme/feature flags rather than duplicated files.",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "architectural_change"
},
{
"dimension": "design_coherence",
"identifier": "external_scraper_template_duplication",
"summary": "Provider scrapers repeat the same orchestration flow with per-provider copy/paste adapters.",
"related_files": [
"internal/scraper/external/astrodocs.go",
"internal/scraper/external/cloudflaredocs.go",
"internal/scraper/external/reactdocs.go",
"internal/scraper/external/godocs.go",
"internal/scraper/external/vuedocs.go"
],
"evidence": [
"Each scraper reimplements nearly identical `Scrape`, `DetectChanges`, `fetchPage`, and `generateHash` scaffolding, then inlines provider-specific conversion methods.",
"The repeated constructor/client/parser wiring pattern appears across multiple files, indicating systemic pattern duplication rather than isolated differences."
],
"suggestion": "Introduce a shared `DocAdapter` contract and a generic `HTTPDocScraper` that owns fetch/hash/change-detect; keep provider files focused on mapping parsed domain objects to `Document`.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "design_coherence",
"identifier": "utility_ownership_drift_in_lighthouse_script",
"summary": "Duplicate utility definition in one file shows mixed responsibility boundaries.",
"related_files": [
"cmd/devour_lighthouse.py",
"cmd/devour_enhanced.py"
],
"evidence": [
"`cmd/devour_lighthouse.py` defines `load_font` twice with effectively the same fallback behavior, creating hidden override risk and unclear source of truth.",
"Comparable font utility exists in other renderer scripts, reinforcing that shared utility concerns are spread instead of centralized."
],
"suggestion": "Remove the duplicate in `cmd/devour_lighthouse.py` and move font-loading helpers into a shared module imported by all renderer scripts.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "error_consistency",
"identifier": "mixed_error_wrapping_in_scrape_and_search_paths",
"summary": "Related scrape/search paths mix raw passthrough and contextual wrapping.",
"related_files": [
"internal/search/engine.go",
"internal/scraper/openapi.go",
"internal/scraper/localsearch.go",
"cmd/scrape.go"
],
"evidence": [
"`internal/search/engine.go` frequently returns raw errors (`:114`, `:117`, `:122`, `:170`) but also uses contextual errors (`:111`, `:230`).",
"`internal/scraper/openapi.go` propagates raw errors from `readSpec`/`parseOpenAPISpec` (`:45`, `:50`, `:123`, `:141`, `:149`, `:157`, `:164`) while also defining wrapped errors (`:135`, `:153`, `:217`).",
"`internal/scraper/localsearch.go` returns raw errors from helper boundaries (`:79`, `:164`, `:191`, `:222`) mixed with rich wrapped messages in the same workflow (`:196`, `:203`, `:209`, `:217`)."
],
"suggestion": "Define a package-level rule: public methods must wrap downstream errors with operation context (using `%w`), and helper internals may return raw errors. Apply this consistently to `Rebuild/EnsureIndexed`, `OpenAPIScraper.Scrape/DetectChanges/readSpec`, and `LocalSearchScraper` methods.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "error_consistency",
"identifier": "inconsistent_failure_channel_panic_vs_error_vs_warning",
"summary": "Failure signaling varies between panic, error return, and warning-only logging.",
"related_files": [
"internal/quality/plugins/go/plugin.go",
"internal/indexer/indexer.go",
"cmd/scrape.go"
],
"evidence": [
"`internal/quality/plugins/go/plugin.go:363` panics on plugin registration failure.",
"`internal/indexer/indexer.go:239` prints a warning and suppresses deletion errors instead of returning them.",
"`cmd/scrape.go` is structured around returned errors (`:131`, `:145`, `:207`) and has no panic-based handling, creating inconsistent contracts across subsystems."
],
"suggestion": "Standardize on explicit error returns for recoverable startup/runtime failures; replace plugin `panic` with registration error propagation or controlled process-exit at the command entrypoint, and make indexer deletion behavior explicit (either aggregate and return partial-failure errors or document/encode best-effort mode).",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "architectural_change"
},
{
"dimension": "error_consistency",
"identifier": "command_boundary_context_loss",
"summary": "CLI command boundaries sometimes return raw errors without command context.",
"related_files": [
"cmd/get.go",
"cmd/scrape.go",
"internal/config/config.go"
],
"evidence": [
"`cmd/get.go:59` and `cmd/scrape.go:90`/`:125` return raw errors directly from downstream calls.",
"Other branches in the same command wrap with explicit context (`cmd/scrape.go:131`, `cmd/scrape.go:145`, `cmd/scrape.go:154`).",
"Config layer already emits contextual wrapped errors (`internal/config/config.go:177`, `internal/config/config.go:181`), so command-layer inconsistency creates uneven user-facing diagnostics."
],
"suggestion": "At CLI entrypoints, wrap all returned downstream errors with command/action context (e.g., `run get`, `load config`, `scrape source`) and preserve root cause with `%w`; keep user-readable validation errors as direct messages.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "cross_module_architecture",
"identifier": "init_side_effect_registration_coupling",
"summary": "Scraper registration depends on import-time side effects and global mutable registry state.",
"related_files": [
"cmd/root.go",
"internal/scraper/external/register.go",
"internal/scraper/registry_simple.go"
],
"evidence": [
"A blank import in the root command triggers registration implicitly instead of through explicit bootstrap wiring.",
"Registration happens in `init()` and mutates a shared global registry."
],
"suggestion": "Replace import-time registration with explicit bootstrap registration (e.g., `RegisterExternalScrapers()` called from startup), and pass registry instances through constructors to remove hidden global coupling.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "error_consistency",
"identifier": "mixed_process_termination_and_error_propagation",
"summary": "Error handling mixes panic/log.Fatal/os.Exit with returned errors across adjacent layers.",
"related_files": [
"cmd/root.go",
"cmd/scorecard.go",
"internal/quality/plugins/go/plugin.go",
"cleanup_unused.go"
],
"evidence": [
"`Execute()` exits the process directly; the scorecard helper exits inside a utility flow; plugin registration panics on failure.",
"Most other command paths return wrapped errors, creating inconsistent failure semantics."
],
"suggestion": "Standardize on returning errors from library/command internals and only perform process exit in one top-level entrypoint; replace panic/log.Fatal in shared code with typed/wrapped errors.",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "abstraction_fitness",
"identifier": "external_scraper_boilerplate_without_shared_base",
"summary": "External scraper implementations duplicate fetch/hash/error/document plumbing instead of sharing a base abstraction.",
"related_files": [
"internal/scraper/external/godocs.go",
"internal/scraper/external/rustdocs.go",
"internal/scraper/external/reactdocs.go",
"internal/scraper/external/nuxtdocs.go",
"internal/scraper/external/cloudflaredocs.go",
"internal/scraper/external/types.go"
],
"evidence": [
"Each scraper repeats `fetchPage`, status code checks, hash generation, and near-identical scrape control flow.",
"The alias-only `internal/scraper/external/types.go` file adds indirection without behavior."
],
"suggestion": "Introduce a shared external-scraper base/helper for HTTP fetch, retries, hashing, and common error mapping; keep only parser-specific extraction and document shaping in each language scraper.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "test_strategy",
"identifier": "untested_unimplemented_runtime_paths",
"summary": "Core runtime paths are both under-tested and partially stubbed, leaving high-risk behavior unvalidated.",
"related_files": [
"internal/server/server.go",
"pkg/client/client.go",
"internal/vector/store.go",
"internal/search/engine.go",
"internal/indexer/indexer.go"
],
"evidence": [
"Server/client/store contain TODO or not-implemented branches without direct tests.",
"No direct test files exist for several core modules that govern querying, indexing, and serving."
],
"suggestion": "Add table-driven tests for client/server/store/indexer contracts first (error behavior and non-nil results), then implement missing paths behind those tests; prioritize integration tests that exercise scrape->index->query flow.",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "design_coherence",
"identifier": "command_files_mix_multiple_responsibilities",
"summary": "Large CLI command files blend orchestration, domain logic, persistence, and formatting concerns.",
"related_files": [
"cmd/quality.go",
"cmd/scrape.go",
"cmd/ask.go"
],
"evidence": [
"`cmd/quality.go` combines scan setup, scoring/status persistence, resolve/fix/review workflows.",
"`cmd/scrape.go` combines config parsing, source detection/profiling, scrape execution, indexing, and source-state updates.",
"`cmd/ask.go` includes query derivation, source URL heuristics, ranking, summarization, and output formatting in one command module."
],
"suggestion": "Split command files into focused packages (transport/CLI binding vs service layer vs persistence helpers) and keep Cobra handlers as thin adapters invoking composable use-case functions.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
}
],
"review_quality": {
"batch_count": 6,
"dimension_coverage": 0.367,
"evidence_density": 2.167,
"high_score_without_risk": 0,
"finding_pressure": 49.296,
"dimensions_with_findings": 5
}
}
@@ -0,0 +1,50 @@
You are a focused subagent reviewer for a single holistic investigation batch.
Repository root: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour
Immutable packet: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour/.desloppify/review_packets/holistic_packet_20260223_100953.json
Batch index: 1
Batch name: Architecture & Coupling
Batch dimensions: cross_module_architecture
Batch rationale: god modules, import-time side effects
Files assigned:
- internal/quality/enhanced_types.go
- internal/quality/narrative_test.go
- internal/quality/scoring_test.go
- internal/quality/types.go
- pkg/rustdocs/parser_test.go
Task requirements:
1. Read the immutable packet and follow `system_prompt` constraints exactly.
2. Evaluate ONLY listed files and ONLY listed dimensions for this batch.
3. Return 0-10 high-quality findings for this batch (empty array allowed).
4. Score/finding consistency is required: broader or more severe findings MUST lower dimension scores.
5. Every finding must include `related_files` with at least 2 files when possible.
6. Every finding must include `impact_scope` and `fix_scope`.
7. Every scored dimension MUST include dimension_notes with concrete evidence.
8. If a dimension score is >85, include `unreported_risk` in dimension_notes.
9. Use exactly one decimal place for every assessment and abstraction sub-axis score.
10. Do not edit repository files.
11. Return ONLY valid JSON, no markdown fences.
Scope enums:
- impact_scope: "local" | "module" | "subsystem" | "codebase"
- fix_scope: "single_edit" | "multi_file_refactor" | "architectural_change"
Output schema:
{
"batch": "Architecture & Coupling",
"batch_index": 1,
"assessments": {"<dimension>": <0-100 with one decimal place>},
"dimension_notes": {
"<dimension>": {
"evidence": ["specific code observations"],
"impact_scope": "local|module|subsystem|codebase",
"fix_scope": "single_edit|multi_file_refactor|architectural_change",
"confidence": "high|medium|low",
"unreported_risk": "required when score >85",
"sub_axes": {"abstraction_leverage": 0-100 with one decimal place, "indirection_cost": 0-100 with one decimal place, "interface_honesty": 0-100 with one decimal place} // required for abstraction_fitness when evidence supports it
}
},
"findings": []
}
@@ -0,0 +1,77 @@
You are a focused subagent reviewer for a single holistic investigation batch.
Repository root: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour
Immutable packet: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour/.desloppify/review_packets/holistic_packet_20260223_100953.json
Batch index: 2
Batch name: Abstractions & Dependencies
Batch dimensions: abstraction_fitness
Batch rationale: abstraction hotspots (wrappers/interfaces/param bags), dep cycles
Files assigned:
- cmd/scrape.go
- internal/quality/plugins/go/analyzers/detectors.go
- internal/quality/plugins/go/analyzers/advanced.go
- internal/scraper/web.go
- internal/quality/plugins/go/plugin.go
- internal/scheduler/scheduler.go
- cmd/init.go
- internal/scraper/localsearch_test.go
- internal/config/config.go
- internal/ai/openai.go
- cmd/get.go
- cmd/get_test.go
- internal/quality/analyzers/controlflow.go
- internal/vector/store.go
- cmd/ask.go
- examples/demo_scrapers.go
- internal/indexer/indexer.go
- internal/scraper/openapi.go
- pkg/pythondocs/parser.go
- internal/quality/analyzers/dataflow.go
- internal/quality/scanner_test.go
- internal/server/server.go
- internal/scraper/localsearch.go
- internal/scraper/external/nuxtdocs.go
- internal/quality/plugins/go/analyzers/test_coverage.go
- internal/search/engine.go
- internal/scraper/external/astrodocs.go
- internal/scraper/external/cloudflaredocs.go
- internal/scraper/external/dockerdocs.go
- internal/scraper/external/godocs.go
- internal/scraper/external/javadocs.go
- internal/scraper/external/mcpdocs.go
Task requirements:
1. Read the immutable packet and follow `system_prompt` constraints exactly.
2. Evaluate ONLY listed files and ONLY listed dimensions for this batch.
3. Return 0-10 high-quality findings for this batch (empty array allowed).
4. Score/finding consistency is required: broader or more severe findings MUST lower dimension scores.
5. Every finding must include `related_files` with at least 2 files when possible.
6. Every finding must include `impact_scope` and `fix_scope`.
7. Every scored dimension MUST include dimension_notes with concrete evidence.
8. If a dimension score is >85, include `unreported_risk` in dimension_notes.
9. Use exactly one decimal place for every assessment and abstraction sub-axis score.
10. Do not edit repository files.
11. Return ONLY valid JSON, no markdown fences.
Scope enums:
- impact_scope: "local" | "module" | "subsystem" | "codebase"
- fix_scope: "single_edit" | "multi_file_refactor" | "architectural_change"
Output schema:
{
"batch": "Abstractions & Dependencies",
"batch_index": 2,
"assessments": {"<dimension>": <0-100 with one decimal place>},
"dimension_notes": {
"<dimension>": {
"evidence": ["specific code observations"],
"impact_scope": "local|module|subsystem|codebase",
"fix_scope": "single_edit|multi_file_refactor|architectural_change",
"confidence": "high|medium|low",
"unreported_risk": "required when score >85",
"sub_axes": {"abstraction_leverage": 0-100 with one decimal place, "indirection_cost": 0-100 with one decimal place, "interface_honesty": 0-100 with one decimal place} // required for abstraction_fitness when evidence supports it
}
},
"findings": []
}
@@ -0,0 +1,51 @@
You are a focused subagent reviewer for a single holistic investigation batch.
Repository root: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour
Immutable packet: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour/.desloppify/review_packets/holistic_packet_20260223_100953.json
Batch index: 3
Batch name: Governance & Contracts
Batch dimensions: cross_module_architecture, test_strategy
Batch rationale: architecture contracts, compatibility policy, docs-vs-runtime scope, and quality-gate coverage
Files assigned:
- README.md
- internal/quality/enhanced_types.go
- internal/quality/narrative_test.go
- internal/quality/scoring_test.go
- internal/quality/types.go
- pkg/rustdocs/parser_test.go
Task requirements:
1. Read the immutable packet and follow `system_prompt` constraints exactly.
2. Evaluate ONLY listed files and ONLY listed dimensions for this batch.
3. Return 0-10 high-quality findings for this batch (empty array allowed).
4. Score/finding consistency is required: broader or more severe findings MUST lower dimension scores.
5. Every finding must include `related_files` with at least 2 files when possible.
6. Every finding must include `impact_scope` and `fix_scope`.
7. Every scored dimension MUST include dimension_notes with concrete evidence.
8. If a dimension score is >85, include `unreported_risk` in dimension_notes.
9. Use exactly one decimal place for every assessment and abstraction sub-axis score.
10. Do not edit repository files.
11. Return ONLY valid JSON, no markdown fences.
Scope enums:
- impact_scope: "local" | "module" | "subsystem" | "codebase"
- fix_scope: "single_edit" | "multi_file_refactor" | "architectural_change"
Output schema:
{
"batch": "Governance & Contracts",
"batch_index": 3,
"assessments": {"<dimension>": <0-100 with one decimal place>},
"dimension_notes": {
"<dimension>": {
"evidence": ["specific code observations"],
"impact_scope": "local|module|subsystem|codebase",
"fix_scope": "single_edit|multi_file_refactor|architectural_change",
"confidence": "high|medium|low",
"unreported_risk": "required when score >85",
"sub_axes": {"abstraction_leverage": 0-100 with one decimal place, "indirection_cost": 0-100 with one decimal place, "interface_honesty": 0-100 with one decimal place} // required for abstraction_fitness when evidence supports it
}
},
"findings": []
}
@@ -0,0 +1,196 @@
You are a focused subagent reviewer for a single holistic investigation batch.
Repository root: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour
Immutable packet: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour/.desloppify/review_packets/holistic_packet_20260223_100953.json
Batch index: 4
Batch name: Design Coherence — Mechanical Concern Signals
Batch dimensions: design_coherence
Batch rationale: mechanical detectors identified structural patterns needing judgment; concern types: duplication_design, mixed_responsibilities, systemic_pattern
Files assigned:
- .desloppify/query.json
- .github/workflows/ci.yml
- AGENTS.md
- cmd/devour_enhanced.py
- cmd/devour_enhanced_fixed.py
- cmd/devour_enhanced_v2.py
- cmd/devour_lighthouse.py
- cmd/devour_scorecard.py
- cmd/quality.go
- cmd/scorecard_generator.py
- desloppify/desloppify/desloppify/app/commands/_show_terminal.py
- desloppify/desloppify/desloppify/app/commands/fix/apply_flow.py
- desloppify/desloppify/desloppify/app/commands/issues_cmd.py
- desloppify/desloppify/desloppify/app/commands/next.py
- desloppify/desloppify/desloppify/app/commands/resolve/selection.py
- desloppify/desloppify/desloppify/app/commands/scan/scan_reporting_llm.py
- desloppify/desloppify/desloppify/app/commands/status_parts/render.py
- desloppify/desloppify/desloppify/app/output/scorecard_parts/projection.py
- desloppify/desloppify/desloppify/engine/detectors/security/rules.py
- desloppify/desloppify/desloppify/engine/scoring_internal/subjective/core.py
- desloppify/desloppify/desloppify/engine/state_internal/resolution.py
- desloppify/desloppify/desloppify/intelligence/review/__init__.py
- desloppify/desloppify/desloppify/intelligence/review/context_internal/structure.py
- desloppify/desloppify/desloppify/intelligence/review/dimensions/data.py
- desloppify/desloppify/desloppify/intelligence/review/importing/holistic.py
- desloppify/desloppify/desloppify/languages/_shared/phases_common.py
- desloppify/desloppify/desloppify/languages/_shared/review_data/dimensions.json
- desloppify/desloppify/desloppify/languages/_shared/scaffold_detect_commands.py
- desloppify/desloppify/desloppify/languages/csharp/_parse_helpers.py
- desloppify/desloppify/desloppify/languages/csharp/commands.py
- desloppify/desloppify/desloppify/languages/csharp/deps/cli.py
- desloppify/desloppify/desloppify/languages/csharp/deps/fallback.py
- desloppify/desloppify/desloppify/languages/csharp/detectors/deps.py
- desloppify/desloppify/desloppify/languages/csharp/phases.py
- desloppify/desloppify/desloppify/languages/csharp/test_coverage.py
- desloppify/desloppify/desloppify/languages/dart/__init__.py
- desloppify/desloppify/desloppify/languages/dart/commands.py
- desloppify/desloppify/desloppify/languages/dart/detectors/deps.py
- desloppify/desloppify/desloppify/languages/dart/extractors.py
- desloppify/desloppify/desloppify/languages/dart/move.py
- desloppify/desloppify/desloppify/languages/framework/commands_base.py
- desloppify/desloppify/desloppify/languages/gdscript/__init__.py
- desloppify/desloppify/desloppify/languages/gdscript/detectors/deps.py
- desloppify/desloppify/desloppify/languages/python/__init__.py
- desloppify/desloppify/desloppify/languages/python/commands.py
- desloppify/desloppify/desloppify/languages/python/detectors/security.py
- desloppify/desloppify/desloppify/languages/python/detectors/smells.py
- desloppify/desloppify/desloppify/languages/python/move.py
- desloppify/desloppify/desloppify/languages/python/phases.py
- desloppify/desloppify/desloppify/languages/python/test_coverage.py
- desloppify/desloppify/desloppify/languages/python/tests/test_py_facade.py
- desloppify/desloppify/desloppify/languages/typescript/detectors/_smell_detectors.py
- desloppify/desloppify/desloppify/languages/typescript/detectors/_smell_effects.py
- desloppify/desloppify/desloppify/languages/typescript/detectors/deps.py
- desloppify/desloppify/desloppify/languages/typescript/detectors/exports.py
- desloppify/desloppify/desloppify/languages/typescript/detectors/react.py
- desloppify/desloppify/desloppify/languages/typescript/detectors/unused.py
- desloppify/desloppify/desloppify/languages/typescript/fixers/common.py
- desloppify/desloppify/desloppify/languages/typescript/fixers/if_chain.py
- desloppify/desloppify/desloppify/languages/typescript/fixers/logs.py
- desloppify/desloppify/desloppify/languages/typescript/tests/test_ts_concerns.py
- desloppify/desloppify/desloppify/languages/typescript/tests/test_ts_deprecated.py
- desloppify/desloppify/desloppify/languages/typescript/tests/test_ts_deps.py
- desloppify/desloppify/desloppify/languages/typescript/tests/test_ts_exports.py
- desloppify/desloppify/desloppify/languages/typescript/tests/test_ts_fixers.py
- desloppify/desloppify/desloppify/languages/typescript/tests/test_ts_logs.py
- desloppify/desloppify/desloppify/languages/typescript/tests/test_ts_react.py
- desloppify/desloppify/desloppify/tests/commands/fix/test_cmd_fix_review.py
- desloppify/desloppify/desloppify/tests/commands/test_cmd_detect.py
- desloppify/desloppify/desloppify/tests/commands/test_cmd_fix.py
- desloppify/desloppify/desloppify/tests/commands/test_cmd_next.py
- desloppify/desloppify/desloppify/tests/commands/test_cmd_scan.py
- desloppify/desloppify/desloppify/tests/commands/test_cmd_show.py
- desloppify/desloppify/desloppify/tests/commands/test_config_cmd.py
- desloppify/desloppify/desloppify/tests/detectors/test_architecture_boundaries.py
- desloppify/desloppify/desloppify/tests/detectors/test_complexity.py
- desloppify/desloppify/desloppify/tests/detectors/test_coupling.py
- desloppify/desloppify/desloppify/tests/detectors/test_gods.py
- desloppify/desloppify/desloppify/tests/detectors/test_naming.py
- desloppify/desloppify/desloppify/tests/detectors/test_orphaned.py
- desloppify/desloppify/desloppify/tests/lang/common/test_lang_contract_validation.py
- desloppify/desloppify/desloppify/tests/lang/csharp/test_csharp_deps.py
- desloppify/desloppify/desloppify/tests/lang/csharp/test_csharp_scan.py
- desloppify/desloppify/desloppify/tests/lang/dart/test_dart_deps.py
- desloppify/desloppify/desloppify/tests/review/test_review_coverage.py
- desloppify/desloppify/desloppify/tests/review/test_review_dimensions_direct.py
- desloppify/desloppify/desloppify/tests/review/test_work_queue.py
- desloppify/desloppify/desloppify/tests/scan/test_flat_dirs.py
- desloppify/desloppify/desloppify/tests/scan/test_scan_reporting_direct.py
- desloppify/desloppify/desloppify/tests/scan/test_scan_workflow_wontfix_direct.py
- desloppify/desloppify/desloppify/tests/scoring/test_scorecard.py
- desloppify/desloppify/desloppify/tests/scoring/test_scorecard_draw_direct.py
- desloppify/desloppify/desloppify/tests/snapshots/cli_smoke/state-python.json
- desloppify/desloppify/desloppify/tests/state/test_state.py
- desloppify/desloppify/desloppify/tests/state/test_state_internal_direct.py
- devour_data/docs/docker_compose_-_ask_me_about_docker_1.md
- devour_data/docs/docker_compose_-_browse_common_faqs_10.md
- devour_data/docs/docker_compose_-_docker_compose_2.md
- devour_data/docs/docker_compose_-_explore_the_compose_file_referenc_8.md
- devour_data/docs/docker_compose_-_how_compose_works_4.md
- devour_data/docs/docker_compose_-_install_compose_5.md
- devour_data/docs/docker_compose_-_use_compose_bridge_9.md
- internal/ai/openai.go
- internal/quality/analyzers/dataflow.go
- internal/quality/detector_test.go
- internal/quality/detectors/complexity.go
- internal/quality/languages.go
- internal/quality/languages_test.go
- internal/quality/narrative_test.go
- internal/quality/plugins/go/analyzers/deadcode.go
- internal/quality/plugins/go/analyzers/detectors.go
- internal/quality/plugins/go/analyzers/security.go
- internal/quality/plugins/go/analyzers/test_coverage.go
- internal/quality/plugins/go/fixers/advanced_fixers.go
- internal/quality/plugins/go/fixers/fixers.go
- internal/quality/scoring_test.go
- internal/quality/state_test.go
- internal/scraper/external/astrodocs.go
- internal/scraper/external/cloudflaredocs.go
- internal/scraper/external/godocs.go
- internal/scraper/external/javadocs.go
- internal/scraper/external/nuxtdocs.go
- internal/scraper/external/pythondocs.go
- internal/scraper/external/reactdocs.go
- internal/scraper/external/rustdocs.go
- internal/scraper/external/springdocs.go
- internal/scraper/external/vuedocs.go
- internal/scraper/localsearch_test.go
- internal/scraper/web.go
- internal/scraper/web_integration_test.go
- landing/dist/index.html
- landing/src/components/sections/Footer.tsx
- landing/src/index.css
- pkg/astrodocs/parser.go
- pkg/astrodocs/parser_test.go
- pkg/cloudflaredocs/parser.go
- pkg/cloudflaredocs/parser_test.go
- pkg/dockerdocs/parser.go
- pkg/godocs/parser.go
- pkg/godocs/parser_test.go
- pkg/javadocs/parser.go
- pkg/javadocs/parser_test.go
- pkg/nuxtdocs/parser.go
- pkg/nuxtdocs/parser_test.go
- pkg/nuxtdocs/types.go
- pkg/pythondocs/parser.go
- pkg/pythondocs/parser_test.go
- pkg/reactdocs/parser.go
- pkg/rustdocs/parser.go
- pkg/springdocs/parser.go
- pkg/vuedocs/parser.go
Task requirements:
1. Read the immutable packet and follow `system_prompt` constraints exactly.
2. Evaluate ONLY listed files and ONLY listed dimensions for this batch.
3. Return 0-10 high-quality findings for this batch (empty array allowed).
4. Score/finding consistency is required: broader or more severe findings MUST lower dimension scores.
5. Every finding must include `related_files` with at least 2 files when possible.
6. Every finding must include `impact_scope` and `fix_scope`.
7. Every scored dimension MUST include dimension_notes with concrete evidence.
8. If a dimension score is >85, include `unreported_risk` in dimension_notes.
9. Use exactly one decimal place for every assessment and abstraction sub-axis score.
10. Do not edit repository files.
11. Return ONLY valid JSON, no markdown fences.
Scope enums:
- impact_scope: "local" | "module" | "subsystem" | "codebase"
- fix_scope: "single_edit" | "multi_file_refactor" | "architectural_change"
Output schema:
{
"batch": "Design Coherence — Mechanical Concern Signals",
"batch_index": 4,
"assessments": {"<dimension>": <0-100 with one decimal place>},
"dimension_notes": {
"<dimension>": {
"evidence": ["specific code observations"],
"impact_scope": "local|module|subsystem|codebase",
"fix_scope": "single_edit|multi_file_refactor|architectural_change",
"confidence": "high|medium|low",
"unreported_risk": "required when score >85",
"sub_axes": {"abstraction_leverage": 0-100 with one decimal place, "indirection_cost": 0-100 with one decimal place, "interface_honesty": 0-100 with one decimal place} // required for abstraction_fitness when evidence supports it
}
},
"findings": []
}
@@ -0,0 +1,125 @@
You are a focused subagent reviewer for a single holistic investigation batch.
Repository root: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour
Immutable packet: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour/.desloppify/review_packets/holistic_packet_20260223_100953.json
Batch index: 5
Batch name: Cross-cutting Sweep
Batch dimensions: error_consistency
Batch rationale: selected dimensions had no direct batch mapping; review representative cross-cutting files
Files assigned:
- internal/quality/enhanced_types.go
- internal/quality/narrative_test.go
- internal/quality/scoring_test.go
- internal/quality/types.go
- pkg/rustdocs/parser_test.go
- cmd/scrape.go
- internal/quality/plugins/go/analyzers/detectors.go
- internal/quality/plugins/go/analyzers/advanced.go
- internal/scraper/web.go
- internal/quality/plugins/go/plugin.go
- internal/scheduler/scheduler.go
- cmd/init.go
- internal/scraper/localsearch_test.go
- internal/config/config.go
- internal/ai/openai.go
- cmd/get.go
- cmd/get_test.go
- internal/quality/analyzers/controlflow.go
- internal/vector/store.go
- cmd/ask.go
- examples/demo_scrapers.go
- internal/indexer/indexer.go
- internal/scraper/openapi.go
- pkg/pythondocs/parser.go
- internal/quality/analyzers/dataflow.go
- internal/quality/scanner_test.go
- internal/server/server.go
- internal/scraper/localsearch.go
- internal/scraper/external/nuxtdocs.go
- internal/quality/plugins/go/analyzers/test_coverage.go
- internal/search/engine.go
- internal/scraper/external/astrodocs.go
- internal/scraper/external/cloudflaredocs.go
- internal/scraper/external/dockerdocs.go
- internal/scraper/external/godocs.go
- internal/scraper/external/javadocs.go
- internal/scraper/external/mcpdocs.go
- README.md
- .desloppify/query.json
- .github/workflows/ci.yml
- AGENTS.md
- cmd/devour_enhanced.py
- cmd/devour_enhanced_fixed.py
- cmd/devour_enhanced_v2.py
- cmd/devour_lighthouse.py
- cmd/devour_scorecard.py
- cmd/quality.go
- cmd/scorecard_generator.py
- desloppify/desloppify/desloppify/app/commands/_show_terminal.py
- desloppify/desloppify/desloppify/app/commands/fix/apply_flow.py
- desloppify/desloppify/desloppify/app/commands/issues_cmd.py
- desloppify/desloppify/desloppify/app/commands/next.py
- desloppify/desloppify/desloppify/app/commands/resolve/selection.py
- desloppify/desloppify/desloppify/app/commands/scan/scan_reporting_llm.py
- desloppify/desloppify/desloppify/app/commands/status_parts/render.py
- desloppify/desloppify/desloppify/app/output/scorecard_parts/projection.py
- desloppify/desloppify/desloppify/engine/detectors/security/rules.py
- desloppify/desloppify/desloppify/engine/scoring_internal/subjective/core.py
- desloppify/desloppify/desloppify/engine/state_internal/resolution.py
- desloppify/desloppify/desloppify/intelligence/review/__init__.py
- desloppify/desloppify/desloppify/intelligence/review/context_internal/structure.py
- desloppify/desloppify/desloppify/intelligence/review/dimensions/data.py
- desloppify/desloppify/desloppify/intelligence/review/importing/holistic.py
- desloppify/desloppify/desloppify/languages/_shared/phases_common.py
- desloppify/desloppify/desloppify/languages/_shared/review_data/dimensions.json
- desloppify/desloppify/desloppify/languages/_shared/scaffold_detect_commands.py
- desloppify/desloppify/desloppify/languages/csharp/_parse_helpers.py
- desloppify/desloppify/desloppify/languages/csharp/commands.py
- desloppify/desloppify/desloppify/languages/csharp/deps/cli.py
- desloppify/desloppify/desloppify/languages/csharp/deps/fallback.py
- desloppify/desloppify/desloppify/languages/csharp/detectors/deps.py
- desloppify/desloppify/desloppify/languages/csharp/phases.py
- desloppify/desloppify/desloppify/languages/csharp/test_coverage.py
- desloppify/desloppify/desloppify/languages/dart/__init__.py
- desloppify/desloppify/desloppify/languages/dart/commands.py
- desloppify/desloppify/desloppify/languages/dart/detectors/deps.py
- desloppify/desloppify/desloppify/languages/dart/extractors.py
- desloppify/desloppify/desloppify/languages/dart/move.py
- desloppify/desloppify/desloppify/languages/framework/commands_base.py
- desloppify/desloppify/desloppify/languages/gdscript/__init__.py
Task requirements:
1. Read the immutable packet and follow `system_prompt` constraints exactly.
2. Evaluate ONLY listed files and ONLY listed dimensions for this batch.
3. Return 0-10 high-quality findings for this batch (empty array allowed).
4. Score/finding consistency is required: broader or more severe findings MUST lower dimension scores.
5. Every finding must include `related_files` with at least 2 files when possible.
6. Every finding must include `impact_scope` and `fix_scope`.
7. Every scored dimension MUST include dimension_notes with concrete evidence.
8. If a dimension score is >85, include `unreported_risk` in dimension_notes.
9. Use exactly one decimal place for every assessment and abstraction sub-axis score.
10. Do not edit repository files.
11. Return ONLY valid JSON, no markdown fences.
Scope enums:
- impact_scope: "local" | "module" | "subsystem" | "codebase"
- fix_scope: "single_edit" | "multi_file_refactor" | "architectural_change"
Output schema:
{
"batch": "Cross-cutting Sweep",
"batch_index": 5,
"assessments": {"<dimension>": <0-100 with one decimal place>},
"dimension_notes": {
"<dimension>": {
"evidence": ["specific code observations"],
"impact_scope": "local|module|subsystem|codebase",
"fix_scope": "single_edit|multi_file_refactor|architectural_change",
"confidence": "high|medium|low",
"unreported_risk": "required when score >85",
"sub_axes": {"abstraction_leverage": 0-100 with one decimal place, "indirection_cost": 0-100 with one decimal place, "interface_honesty": 0-100 with one decimal place} // required for abstraction_fitness when evidence supports it
}
},
"findings": []
}
@@ -0,0 +1,158 @@
You are a focused subagent reviewer for a single holistic investigation batch.
Repository root: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour
Immutable packet: /home/tdvorak/Desktop/PROG_projekty/GOLANG/Devour/.desloppify/review_packets/holistic_packet_20260223_100953.json
Batch index: 6
Batch name: Full Codebase Sweep
Batch dimensions: cross_module_architecture, error_consistency, abstraction_fitness, test_strategy, design_coherence
Batch rationale: thorough default: evaluate cross-cutting quality across all production files
Files assigned:
- cleanup_unused.go
- cmd/ask.go
- cmd/demo.go
- cmd/devour/main.go
- cmd/generate_scorecards/main.go
- cmd/get.go
- cmd/init.go
- cmd/languages.go
- cmd/push.go
- cmd/quality.go
- cmd/query.go
- cmd/realtest/main.go
- cmd/root.go
- cmd/runtime_helpers.go
- cmd/scorecard.go
- cmd/scrape.go
- cmd/serve.go
- cmd/status.go
- cmd/sync.go
- examples/demo_scrapers.go
- internal/ai/ai.go
- internal/ai/openai.go
- internal/config/config.go
- internal/indexer/indexer.go
- internal/markdown/formatter.go
- internal/projectstate/state.go
- internal/quality/analyzers/controlflow.go
- internal/quality/analyzers/dataflow.go
- internal/quality/analyzers/practices.go
- internal/quality/detector.go
- internal/quality/detectors/complexity.go
- internal/quality/detectors/duplication.go
- internal/quality/detectors/naming.go
- internal/quality/enhanced_types.go
- internal/quality/languages.go
- internal/quality/narrative.go
- internal/quality/plugins/go/analyzers/advanced.go
- internal/quality/plugins/go/analyzers/deadcode.go
- internal/quality/plugins/go/analyzers/detectors.go
- internal/quality/plugins/go/analyzers/security.go
- internal/quality/plugins/go/analyzers/test_coverage.go
- internal/quality/plugins/go/fixers/advanced_fixers.go
- internal/quality/plugins/go/fixers/fixers.go
- internal/quality/plugins/go/plugin.go
- internal/quality/plugins/plugin.go
- internal/quality/plugins/registry.go
- internal/quality/review/packet.go
- internal/quality/scanner.go
- internal/quality/scoring.go
- internal/quality/state.go
- internal/quality/types.go
- internal/scheduler/scheduler.go
- internal/scraper/external/astrodocs.go
- internal/scraper/external/cloudflaredocs.go
- internal/scraper/external/dockerdocs.go
- internal/scraper/external/godocs.go
- internal/scraper/external/javadocs.go
- internal/scraper/external/mcpdocs.go
- internal/scraper/external/nuxtdocs.go
- internal/scraper/external/pythondocs.go
- internal/scraper/external/reactdocs.go
- internal/scraper/external/register.go
- internal/scraper/external/rustdocs.go
- internal/scraper/external/springdocs.go
- internal/scraper/external/tsdocs.go
- internal/scraper/external/types.go
- internal/scraper/external/vuedocs.go
- internal/scraper/github.go
- internal/scraper/local.go
- internal/scraper/localsearch.go
- internal/scraper/normalize.go
- internal/scraper/openapi.go
- internal/scraper/register_core.go
- internal/scraper/registry_simple.go
- internal/scraper/scraper.go
- internal/scraper/web.go
- internal/scraper/wrapper.go
- internal/search/engine.go
- internal/server/server.go
- internal/storage/writer.go
- internal/ui/banner.go
- internal/ui/character.go
- internal/vector/store.go
- main.go
- pkg/astrodocs/parser.go
- pkg/astrodocs/types.go
- pkg/client/client.go
- pkg/cloudflaredocs/parser.go
- pkg/cloudflaredocs/types.go
- pkg/dockerdocs/parser.go
- pkg/dockerdocs/types.go
- pkg/godocs/parser.go
- pkg/godocs/types.go
- pkg/javadocs/parser.go
- pkg/javadocs/types.go
- pkg/mcpdocs/parser.go
- pkg/mcpdocs/types.go
- pkg/nuxtdocs/parser.go
- pkg/nuxtdocs/types.go
- pkg/parserutil/url.go
- pkg/pythondocs/parser.go
- pkg/pythondocs/types.go
- pkg/reactdocs/parser.go
- pkg/reactdocs/types.go
- pkg/rustdocs/parser.go
- pkg/rustdocs/types.go
- pkg/springdocs/parser.go
- pkg/springdocs/types.go
- pkg/tsdocs/parser.go
- pkg/tsdocs/types.go
- pkg/types/types.go
- pkg/vuedocs/parser.go
- pkg/vuedocs/types.go
Task requirements:
1. Read the immutable packet and follow `system_prompt` constraints exactly.
2. Evaluate ONLY listed files and ONLY listed dimensions for this batch.
3. Return 0-10 high-quality findings for this batch (empty array allowed).
4. Score/finding consistency is required: broader or more severe findings MUST lower dimension scores.
5. Every finding must include `related_files` with at least 2 files when possible.
6. Every finding must include `impact_scope` and `fix_scope`.
7. Every scored dimension MUST include dimension_notes with concrete evidence.
8. If a dimension score is >85, include `unreported_risk` in dimension_notes.
9. Use exactly one decimal place for every assessment and abstraction sub-axis score.
10. Do not edit repository files.
11. Return ONLY valid JSON, no markdown fences.
Scope enums:
- impact_scope: "local" | "module" | "subsystem" | "codebase"
- fix_scope: "single_edit" | "multi_file_refactor" | "architectural_change"
Output schema:
{
"batch": "Full Codebase Sweep",
"batch_index": 6,
"assessments": {"<dimension>": <0-100 with one decimal place>},
"dimension_notes": {
"<dimension>": {
"evidence": ["specific code observations"],
"impact_scope": "local|module|subsystem|codebase",
"fix_scope": "single_edit|multi_file_refactor|architectural_change",
"confidence": "high|medium|low",
"unreported_risk": "required when score >85",
"sub_axes": {"abstraction_leverage": 0-100 with one decimal place, "indirection_cost": 0-100 with one decimal place, "interface_honesty": 0-100 with one decimal place} // required for abstraction_fitness when evidence supports it
}
},
"findings": []
}
@@ -0,0 +1 @@
{"batch":"Architecture & Coupling","batch_index":1,"assessments":{"cross_module_architecture":100.0},"dimension_notes":{"cross_module_architecture":{"evidence":["All assigned `internal/quality/*.go` files stay within a single package boundary (`package quality`) and only import standard library packages (`time`, `testing`, `strings`), with no cross-package dependency fan-out from this slice.","`pkg/rustdocs/parser_test.go` is isolated to `package rustdocs` and imports only stdlib plus `github.com/PuerkitoBio/goquery`; it does not couple into `internal/quality` types or helpers.","No `init()` functions, package-level mutable singleton wiring, or import-time execution patterns were found in the reviewed files; behavior is test-function scoped and constructor-invoked (`NewParser`, `NewScorer`, `NewNarrativeGenerator`).","Type declarations in `internal/quality/types.go` and `internal/quality/enhanced_types.go` are cohesive data-model definitions within one module boundary rather than cross-module shims or compatibility layers."],"impact_scope":"local","fix_scope":"single_edit","confidence":"high","unreported_risk":"This batch covers only five files; architectural hotspots could still exist in non-assigned packages (e.g., runtime wiring or broader dependency graph) outside this evidence window."}},"findings":[]}
@@ -0,0 +1,97 @@
{
"batch": "Abstractions & Dependencies",
"batch_index": 2,
"assessments": {
"abstraction_fitness": 68.0
},
"dimension_notes": {
"abstraction_fitness": {
"evidence": [
"Language-to-doc behavior is spread across multiple large switches: URL construction in cmd/get.go:78-173, type mapping in cmd/get.go:175-205, and term derivation in cmd/ask.go:205-260+.",
"External scraper implementations repeat the same transport/change-detection scaffold (config+parser+http client fields, URL check, fetchPage, generateHash, DetectChanges) across multiple files, e.g. internal/scraper/external/godocs.go:17-121, internal/scraper/external/javadocs.go:16-115, internal/scraper/external/nuxtdocs.go:16-120, internal/scraper/external/cloudflaredocs.go:16-105.",
"Vector store abstraction exposes implementations that are selected by default config but intentionally unimplemented: internal/config/config.go:121-125 defaults to chromem, while internal/vector/store.go:221-243 returns \"chromem store not implemented\" for all operations.",
"Configuration defaults are duplicated in two representations (typed defaults and hand-written YAML template), increasing drift risk: cmd/init.go:92-149 and internal/config/config.go:104-160."
],
"impact_scope": "subsystem",
"fix_scope": "architectural_change",
"confidence": "high",
"sub_axes": {
"abstraction_leverage": 62.0,
"indirection_cost": 71.0,
"interface_honesty": 60.0
}
}
},
"findings": [
{
"dimension": "abstraction_fitness",
"identifier": "language_catalog_scattered_switches",
"summary": "Language routing logic is duplicated across CLI flows instead of one catalog abstraction",
"related_files": [
"cmd/get.go",
"cmd/ask.go"
],
"evidence": [
"cmd/get.go:78-173 defines a large language switch for URL building; cmd/get.go:175-205 defines a second switch for source type mapping.",
"cmd/ask.go:205-260+ adds a third language switch for term heuristics, creating three independent sources of truth for one domain model."
],
"suggestion": "Introduce a single `LanguageSpec` registry (aliases, source type, URL builder, optional query-term strategy) in one package and have both `get` and `ask` consume it; keep per-language behavior as data/functions attached to that registry.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "abstraction_fitness",
"identifier": "external_scraper_scaffold_duplication",
"summary": "External scraper adapters reimplement the same transport/hash lifecycle repeatedly",
"related_files": [
"internal/scraper/external/godocs.go",
"internal/scraper/external/javadocs.go",
"internal/scraper/external/nuxtdocs.go",
"internal/scraper/external/cloudflaredocs.go"
],
"evidence": [
"Each file defines near-identical struct fields (`config`, `parser`, `client`), constructor wiring, URL-required guard, `fetchPage`, `generateHash`, and `DetectChanges` flow (e.g., godocs.go:17-121 and javadocs.go:16-115).",
"Duplication scales linearly with each new source adapter, increasing edit surface for cross-cutting behavior (timeouts, headers, error mapping)."
],
"suggestion": "Extract a shared `HTTPDocScraperBase` (or composable helper functions) for request execution, status handling, hashing, and change detection; keep each adapter focused on parser invocation and domain-specific document mapping.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "abstraction_fitness",
"identifier": "default_selects_unimplemented_store",
"summary": "Store interface contract is dishonest because default backend is not operational",
"related_files": [
"internal/vector/store.go",
"internal/config/config.go"
],
"evidence": [
"internal/config/config.go:121-125 sets default vector DB type to `chromem`.",
"internal/vector/store.go:221-243 returns `chromem store not implemented` for all `Store` operations after `NewStore` can select that backend (store.go:63-72)."
],
"suggestion": "Either implement `ChromemStore` before exposing it as default, or switch default to a working backend and gate chromem behind explicit opt-in plus capability check at startup.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "architectural_change"
},
{
"dimension": "abstraction_fitness",
"identifier": "config_defaults_double_encoded",
"summary": "Initialization defaults are encoded twice with different abstractions",
"related_files": [
"cmd/init.go",
"internal/config/config.go"
],
"evidence": [
"cmd/init.go:92-149 hardcodes YAML defaults as a template string.",
"internal/config/config.go:104-160 hardcodes defaults again in typed structs, requiring synchronized updates across two representations."
],
"suggestion": "Generate init YAML from `config.Default()` via marshal + small post-processing/comments, or maintain a single canonical defaults schema consumed by both loader and init command.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
}
]
}
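The `language_catalog_scattered_switches` suggestion above can be sketched as follows. This is only an illustration under stated assumptions: `LanguageSpec` is the name the finding proposes, but `Registry`, `NewRegistry`, `Register`, `Lookup`, the field set, and the `godocs` source-type value are hypothetical and not taken from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// LanguageSpec holds everything the CLI needs to know about one language.
// The field set is an assumption based on the finding's suggestion
// (aliases, source type, URL builder).
type LanguageSpec struct {
	Name       string
	Aliases    []string
	SourceType string
	BuildURL   func(symbol string) string
}

// Registry is a hypothetical single source of truth that commands such as
// `get` and `ask` could both consult instead of keeping their own switches.
type Registry struct {
	byName map[string]LanguageSpec
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{byName: make(map[string]LanguageSpec)}
}

// Register indexes a spec under its name and every alias, case-insensitively.
func (r *Registry) Register(spec LanguageSpec) {
	r.byName[strings.ToLower(spec.Name)] = spec
	for _, alias := range spec.Aliases {
		r.byName[strings.ToLower(alias)] = spec
	}
}

// Lookup resolves a user-supplied language name or alias.
func (r *Registry) Lookup(name string) (LanguageSpec, bool) {
	spec, ok := r.byName[strings.ToLower(strings.TrimSpace(name))]
	return spec, ok
}

func main() {
	reg := NewRegistry()
	reg.Register(LanguageSpec{
		Name:       "go",
		Aliases:    []string{"golang"},
		SourceType: "godocs", // assumed value, for illustration only
		BuildURL: func(symbol string) string {
			return "https://pkg.go.dev/" + symbol
		},
	})

	if spec, ok := reg.Lookup("Golang"); ok {
		fmt.Println(spec.SourceType, spec.BuildURL("net/http"))
	}
}
```

With a registry of this shape, adding a language would mean one `Register` call rather than edits to three separate switch statements.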
@@ -0,0 +1,88 @@
{
"batch": "Governance & Contracts",
"batch_index": 3,
"assessments": {
"cross_module_architecture": 82.0,
"test_strategy": 74.0
},
"dimension_notes": {
"cross_module_architecture": {
"evidence": [
"`internal/quality/types.go` defines a typed status contract (`type Status string` with constants like `StatusOpen`, `StatusFixed`, `StatusWontfix`).",
"`internal/quality/types.go` also defines `Scorecard.StatusByType map[string]int`, which bypasses the typed status contract at the module boundary.",
"`internal/quality/scoring_test.go` asserts raw string keys (`\"open\"`, `\"fixed\"`) instead of using `Status` constants, reinforcing stringly-typed cross-component coupling.",
"`README.md` claims quality features include tracking resolution states, but the in-code state transport for scorecards is weakly typed."
],
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor",
"confidence": "high"
},
"test_strategy": {
"evidence": [
"`internal/quality/narrative_test.go` validates exact headline/action prose and directly tests internal helper behavior (e.g., `determinePhase`, `generateHeadline`, `classifyDimension`), creating high implementation-coupling.",
"`internal/quality/scoring_test.go` similarly focuses on exact internal scoring details and string key literals, which makes refactors noisy and discourages safe design changes.",
"`pkg/rustdocs/parser_test.go` is heavily happy-path: it checks successful parses and minimal field presence but has no malformed-input/error-path cases for parser resilience.",
"`README.md` marks parts of the CLI as unstable/stubbed, but assigned tests do not provide cross-module contract/integration safety nets for those runtime boundaries."
],
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor",
"confidence": "high"
}
},
"findings": [
{
"dimension": "cross_module_architecture",
"identifier": "status_contract_string_map_boundary",
"summary": "Scorecard state uses string keys instead of shared Status type, weakening module contracts.",
"related_files": [
"internal/quality/types.go",
"internal/quality/scoring_test.go",
"README.md"
],
"evidence": [
"`internal/quality/types.go` defines `Status` constants but `Scorecard.StatusByType` is `map[string]int`.",
"`internal/quality/scoring_test.go` asserts `card.StatusByType[\"open\"]` and `card.StatusByType[\"fixed\"]` directly.",
"README promises resolution-state tracking, but this boundary is not type-safe."
],
"suggestion": "Change `Scorecard.StatusByType` to `map[Status]int` (or a dedicated typed struct), update serialization adapters if needed, and update tests to assert using `StatusOpen`/`StatusFixed` constants.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "test_strategy",
"identifier": "brittle_private_and_copy_assertions_in_quality_tests",
"summary": "Quality tests are tightly coupled to private helpers and exact copy text, reducing refactor safety.",
"related_files": [
"internal/quality/narrative_test.go",
"internal/quality/scoring_test.go"
],
"evidence": [
"`narrative_test.go` directly asserts exact strings for generated headlines/actions and tests helper internals rather than stable external behavior.",
"`scoring_test.go` anchors on specific internal weighting outputs and literal status strings, which can fail on benign internal redesigns."
],
"suggestion": "Shift to contract-level tests against exported APIs with invariant assertions (phase category, presence of required fields, monotonic score behavior), and keep only a small set of snapshot/copy tests for user-facing text.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "test_strategy",
"identifier": "rust_parser_missing_negative_and_boundary_cases",
"summary": "Rust parser tests miss malformed-input and degradation-path coverage.",
"related_files": [
"pkg/rustdocs/parser_test.go",
"README.md"
],
"evidence": [
"`parser_test.go` cases are successful parses with valid fixture HTML and only basic assertions.",
"No tests verify behavior for malformed HTML, missing selectors, empty documents, or unsupported result rows.",
"README positions docs ingestion as core functionality, so parser failure behavior is a critical path."
],
"suggestion": "Add table-driven negative tests for malformed/partial HTML, empty search blocks, and missing headings; assert stable fallback behavior (explicit error or safe zero-value output) for each parser entrypoint.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "single_edit"
}
]
}
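A minimal Go sketch of the `status_contract_string_map_boundary` suggestion, assuming `Status` is a string-based type whose values match the literals the tests currently assert on. `Scorecard` here is a trimmed, hypothetical stand-in for the real struct in `internal/quality/types.go`, and `Tally` is an illustrative helper that is not taken from the repo.

```go
package main

import "fmt"

// Status is a typed resolution state. The string values are assumed to
// match the literals asserted in scoring_test.go ("open", "fixed").
type Status string

const (
	StatusOpen  Status = "open"
	StatusFixed Status = "fixed"
)

// Scorecard is a hypothetical, trimmed-down stand-in for the real struct.
type Scorecard struct {
	StatusByType map[Status]int
}

// Tally (illustrative helper) counts findings per status.
func Tally(statuses []Status) Scorecard {
	card := Scorecard{StatusByType: make(map[Status]int)}
	for _, s := range statuses {
		card.StatusByType[s]++
	}
	return card
}

func main() {
	card := Tally([]Status{StatusOpen, StatusFixed, StatusOpen})
	// Lookups go through the shared constants instead of raw strings.
	fmt.Println(card.StatusByType[StatusOpen], card.StatusByType[StatusFixed])
}
```

One caveat on the design: with a string-based `Status`, an untyped literal such as `card.StatusByType["open"]` still compiles, so the typed map mainly blocks plain `string` variables. A dedicated struct would be stricter if that matters.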
@@ -0,0 +1,79 @@
{
"batch": "Design Coherence — Mechanical Concern Signals",
"batch_index": 4,
"assessments": {
"design_coherence": 66.0
},
"dimension_notes": {
"design_coherence": {
"evidence": [
"Parallel implementations of the same scorecard pipeline exist in `cmd/devour_scorecard.py` and `cmd/scorecard_generator.py` with near-identical function layouts (`ScorecardData`, `score_color`, `draw_left_panel`, `draw_right_panel`, `generate_scorecard`, `main`) and only minor line-level differences.",
"Three variants of enhanced generator (`cmd/devour_enhanced.py`, `cmd/devour_enhanced_fixed.py`, `cmd/devour_enhanced_v2.py`) repeat almost the full rendering stack (`draw_header_section`, `draw_enhanced_left_panel`, `draw_enhanced_right_panel`, `draw_trends_section`, `load_enhanced_devour_data`), creating branch-by-copy evolution.",
"Scraper adapters across providers (`internal/scraper/external/astrodocs.go`, `internal/scraper/external/cloudflaredocs.go`, `internal/scraper/external/reactdocs.go`) duplicate fetch/hash/change-detection and document assembly patterns with provider-specific data glued inline, indicating repeated structural pattern without shared orchestration abstraction.",
"Within `cmd/devour_lighthouse.py`, `load_font` is defined twice (once near top and again later), showing local design drift and utility ownership ambiguity."
],
"impact_scope": "codebase",
"fix_scope": "architectural_change",
"confidence": "high"
}
},
"findings": [
{
"dimension": "design_coherence",
"identifier": "scorecard_variant_sprawl",
"summary": "Scorecard generation is maintained as multiple copy-variants instead of one composable pipeline.",
"related_files": [
"cmd/devour_scorecard.py",
"cmd/scorecard_generator.py",
"cmd/devour_enhanced.py",
"cmd/devour_enhanced_fixed.py",
"cmd/devour_enhanced_v2.py"
],
"evidence": [
"Both `cmd/devour_scorecard.py` and `cmd/scorecard_generator.py` declare the same major functions and data model in the same order with only minor stylistic deltas.",
"Enhanced variants repeat the same section render functions and data loading flow, then diverge by ad-hoc edits, increasing change fan-out for any layout or scoring rule update."
],
"suggestion": "Extract a shared rendering core module (palette/fonts/layout primitives + data normalization), keep one canonical CLI entrypoint, and convert variant behavior into explicit theme/feature flags rather than duplicated files.",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "architectural_change"
},
{
"dimension": "design_coherence",
"identifier": "external_scraper_template_duplication",
"summary": "Provider scrapers repeat the same orchestration flow with per-provider copy/paste adapters.",
"related_files": [
"internal/scraper/external/astrodocs.go",
"internal/scraper/external/cloudflaredocs.go",
"internal/scraper/external/reactdocs.go",
"internal/scraper/external/godocs.go",
"internal/scraper/external/vuedocs.go"
],
"evidence": [
"Each scraper reimplements nearly identical `Scrape`, `DetectChanges`, `fetchPage`, and `generateHash` scaffolding, then inlines provider-specific conversion methods.",
"The repeated constructor/client/parser wiring pattern appears across multiple files, indicating systemic pattern duplication rather than isolated differences."
],
"suggestion": "Introduce a shared `DocAdapter` contract and a generic `HTTPDocScraper` that owns fetch/hash/change-detect; keep provider files focused on mapping parsed domain objects to `Document`.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "design_coherence",
"identifier": "utility_ownership_drift_in_lighthouse_script",
"summary": "Duplicate utility definition in one file shows mixed responsibility boundaries.",
"related_files": [
"cmd/devour_lighthouse.py",
"cmd/devour_enhanced.py"
],
"evidence": [
"`cmd/devour_lighthouse.py` defines `load_font` twice with effectively the same fallback behavior, creating hidden override risk and unclear source of truth.",
"Comparable font utility exists in other renderer scripts, reinforcing that shared utility concerns are spread instead of centralized."
],
"suggestion": "Remove the duplicate in `cmd/devour_lighthouse.py` and move font-loading helpers into a shared module imported by all renderer scripts.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
}
]
}
@@ -0,0 +1,79 @@
{
"batch": "Cross-cutting Sweep",
"batch_index": 5,
"assessments": {
"error_consistency": 71.0
},
"dimension_notes": {
"error_consistency": {
"evidence": [
"Raw error passthrough is common in core flows (e.g., `return nil, err` in `internal/search/engine.go:114`, `internal/search/engine.go:122`, `internal/scraper/openapi.go:45`, `internal/scraper/openapi.go:50`) while nearby code wraps with operation context (e.g., `internal/search/engine.go:111`, `internal/scraper/openapi.go:153`).",
"Failure handling style diverges between aborting, propagating, and suppressing in similar backend paths: `panic(...)` in `internal/quality/plugins/go/plugin.go:363`, warning print-and-continue in `internal/indexer/indexer.go:239`, and plain returns in `cmd/scrape.go:90`/`cmd/get.go:59`.",
"Some call paths lose caller context at command boundaries (`cmd/scrape.go:90`, `cmd/scrape.go:125`, `cmd/get.go:59`) despite contextual wrapping being used in other command-layer branches (`cmd/scrape.go:131`, `cmd/scrape.go:145`)."
],
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor",
"confidence": "high"
}
},
"findings": [
{
"dimension": "error_consistency",
"identifier": "mixed_error_wrapping_in_scrape_and_search_paths",
"summary": "Related scrape/search paths mix raw passthrough and contextual wrapping.",
"related_files": [
"internal/search/engine.go",
"internal/scraper/openapi.go",
"internal/scraper/localsearch.go",
"cmd/scrape.go"
],
"evidence": [
"`internal/search/engine.go` frequently returns raw errors (`:114`, `:117`, `:122`, `:170`) but also uses contextual errors (`:111`, `:230`).",
"`internal/scraper/openapi.go` propagates raw errors from `readSpec`/`parseOpenAPISpec` (`:45`, `:50`, `:123`, `:141`, `:149`, `:157`, `:164`) while also defining wrapped errors (`:135`, `:153`, `:217`).",
"`internal/scraper/localsearch.go` returns raw errors from helper boundaries (`:79`, `:164`, `:191`, `:222`) mixed with rich wrapped messages in the same workflow (`:196`, `:203`, `:209`, `:217`)."
],
"suggestion": "Define a package-level rule: public methods must wrap downstream errors with operation context (using `%w`), and helper internals may return raw errors. Apply this consistently to `Rebuild/EnsureIndexed`, `OpenAPIScraper.Scrape/DetectChanges/readSpec`, and `LocalSearchScraper` methods.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "error_consistency",
"identifier": "inconsistent_failure_channel_panic_vs_error_vs_warning",
"summary": "Failure signaling varies between panic, error return, and warning-only logging.",
"related_files": [
"internal/quality/plugins/go/plugin.go",
"internal/indexer/indexer.go",
"cmd/scrape.go"
],
"evidence": [
"`internal/quality/plugins/go/plugin.go:363` panics on plugin registration failure.",
"`internal/indexer/indexer.go:239` prints a warning and suppresses deletion errors instead of returning them.",
"`cmd/scrape.go` is structured around returned errors (`:131`, `:145`, `:207`) and has no panic-based handling, creating inconsistent contracts across subsystems."
],
"suggestion": "Standardize on explicit error returns for recoverable startup/runtime failures; replace plugin `panic` with registration error propagation or controlled process-exit at the command entrypoint, and make indexer deletion behavior explicit (either aggregate and return partial-failure errors or document/encode best-effort mode).",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "architectural_change"
},
{
"dimension": "error_consistency",
"identifier": "command_boundary_context_loss",
"summary": "CLI command boundaries sometimes return raw errors without command context.",
"related_files": [
"cmd/get.go",
"cmd/scrape.go",
"internal/config/config.go"
],
"evidence": [
"`cmd/get.go:59` and `cmd/scrape.go:90`/`:125` return raw errors directly from downstream calls.",
"Other branches in the same command wrap with explicit context (`cmd/scrape.go:131`, `cmd/scrape.go:145`, `cmd/scrape.go:154`).",
"Config layer already emits contextual wrapped errors (`internal/config/config.go:177`, `internal/config/config.go:181`), so command-layer inconsistency creates uneven user-facing diagnostics."
],
"suggestion": "At CLI entrypoints, wrap all returned downstream errors with command/action context (e.g., `run get`, `load config`, `scrape source`) and preserve root cause with `%w`; keep user-readable validation errors as direct messages.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
}
]
}
@@ -0,0 +1,167 @@
{
"batch": "Full Codebase Sweep",
"batch_index": 6,
"assessments": {
"cross_module_architecture": 74.0,
"error_consistency": 68.0,
"abstraction_fitness": 62.0,
"test_strategy": 55.0,
"design_coherence": 64.0
},
"dimension_notes": {
"cross_module_architecture": {
"evidence": [
"`cmd/root.go` relies on blank import `_ \"github.com/yourorg/devour/internal/scraper/external\"` to activate runtime registration side effects.",
"`internal/scraper/external/register.go` mutates global scraper registry in `init()` for all language scrapers.",
"`internal/scraper/registry_simple.go` uses global mutable registry (`globalRegistry`) shared process-wide, increasing hidden coupling and order sensitivity."
],
"impact_scope": "subsystem",
"fix_scope": "architectural_change",
"confidence": "high"
},
"error_consistency": {
"evidence": [
"`cmd/root.go` and `cmd/scorecard.go` terminate with `os.Exit`, while most command flows return wrapped errors.",
"`internal/quality/plugins/go/plugin.go` panics during plugin registration (`panic(fmt.Sprintf(...))`) instead of surfacing an error contract.",
"`cleanup_unused.go` uses `log.Fatal` and mixed logging/exit style unlike the rest of the codebase's `fmt.Errorf(...%w...)` propagation."
],
"impact_scope": "codebase",
"fix_scope": "multi_file_refactor",
"confidence": "high"
},
"abstraction_fitness": {
"evidence": [
"Language-specific scraper implementations (`internal/scraper/external/godocs.go`, `internal/scraper/external/rustdocs.go`, `internal/scraper/external/reactdocs.go`, and peers) repeat near-identical HTTP fetch/hash/error scaffolding with only parser/document mapping differences.",
"`internal/scraper/external/types.go` is a thin alias layer over `internal/scraper` types and does not enforce additional policy or invariants.",
"High repeated constructor/scrape boilerplate across external scrapers indicates abstraction cost is paid repeatedly without shared leverage."
],
"impact_scope": "subsystem",
"fix_scope": "architectural_change",
"confidence": "high",
"sub_axes": {
"abstraction_leverage": 58.0,
"indirection_cost": 72.0,
"interface_honesty": 70.0
}
},
"test_strategy": {
"evidence": [
"Critical runtime surfaces have no direct tests: `internal/server/server.go`, `pkg/client/client.go`, `internal/vector/store.go`, `internal/search/engine.go`, `internal/indexer/indexer.go`.",
"`pkg/client/client.go` has TODO stubs returning `nil, nil` for `Query` and `Status`, but there are no tests asserting failure behavior or contract correctness.",
"`internal/server/server.go` Start methods are TODO/no-op `nil` returns and remain unvalidated by tests, creating false-green behavior for unimplemented paths."
],
"impact_scope": "codebase",
"fix_scope": "multi_file_refactor",
"confidence": "high"
},
"design_coherence": {
"evidence": [
"`cmd/quality.go` (695 LOC) mixes CLI wiring, scan orchestration, status persistence, scoring output, resolution updates, fixer execution, and review import/export in one file.",
"`cmd/scrape.go` (444 LOC) combines source parsing, source-type inference, profile heuristics, scrape orchestration, persistence, and source-state hashing.",
"These large command files show recurring multi-responsibility patterns rather than cohesive command/use-case units."
],
"impact_scope": "module",
"fix_scope": "multi_file_refactor",
"confidence": "high"
}
},
"findings": [
{
"dimension": "cross_module_architecture",
"identifier": "init_side_effect_registration_coupling",
"summary": "Scraper registration depends on import-time side effects and global mutable registry state.",
"related_files": [
"cmd/root.go",
"internal/scraper/external/register.go",
"internal/scraper/registry_simple.go"
],
"evidence": [
"Blank import in root command triggers registration implicitly rather than explicit bootstrap wiring.",
"Registration happens in `init()` and mutates shared global registry."
],
"suggestion": "Replace import-time registration with explicit bootstrap registration (e.g., `RegisterExternalScrapers()` called from startup), and pass registry instances through constructors to remove hidden global coupling.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "error_consistency",
"identifier": "mixed_process_termination_and_error_propagation",
"summary": "Error handling mixes panic/log.Fatal/os.Exit with returned errors across adjacent layers.",
"related_files": [
"cmd/root.go",
"cmd/scorecard.go",
"internal/quality/plugins/go/plugin.go",
"cleanup_unused.go"
],
"evidence": [
"`Execute()` exits process directly; scorecard helper exits inside utility flow; plugin registration panics on failure.",
"Most other command paths return wrapped errors, creating inconsistent failure semantics."
],
"suggestion": "Standardize on returning errors from library/command internals and only perform process exit in one top-level entrypoint; replace panic/log.Fatal in shared code with typed/wrapped errors.",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "abstraction_fitness",
"identifier": "external_scraper_boilerplate_without_shared_base",
"summary": "External scraper implementations duplicate fetch/hash/error/document plumbing instead of sharing a base abstraction.",
"related_files": [
"internal/scraper/external/godocs.go",
"internal/scraper/external/rustdocs.go",
"internal/scraper/external/reactdocs.go",
"internal/scraper/external/nuxtdocs.go",
"internal/scraper/external/cloudflaredocs.go",
"internal/scraper/external/types.go"
],
"evidence": [
"Each scraper repeats `fetchPage`, status code checks, hash generation, and near-identical scrape control flow.",
"Alias-only types file adds indirection without behavior."
],
"suggestion": "Introduce a shared external-scraper base/helper for HTTP fetch, retries, hashing, and common error mapping; keep only parser-specific extraction and document shaping in each language scraper.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "test_strategy",
"identifier": "untested_unimplemented_runtime_paths",
"summary": "Core runtime paths are both under-tested and partially stubbed, leaving high-risk behavior unvalidated.",
"related_files": [
"internal/server/server.go",
"pkg/client/client.go",
"internal/vector/store.go",
"internal/search/engine.go",
"internal/indexer/indexer.go"
],
"evidence": [
"Server/client/store contain TODO or not-implemented branches without direct tests.",
"No direct test files exist for several core modules that govern querying, indexing, and serving."
],
"suggestion": "Add table-driven tests for client/server/store/indexer contracts first (error behavior and non-nil results), then implement missing paths behind those tests; prioritize integration tests that exercise scrape->index->query flow.",
"confidence": "high",
"impact_scope": "codebase",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "design_coherence",
"identifier": "command_files_mix_multiple_responsibilities",
"summary": "Large CLI command files blend orchestration, domain logic, persistence, and formatting concerns.",
"related_files": [
"cmd/quality.go",
"cmd/scrape.go",
"cmd/ask.go"
],
"evidence": [
"`cmd/quality.go` combines scan setup, scoring/status persistence, resolve/fix/review workflows.",
"`cmd/scrape.go` combines config parsing, source detection/profiling, scrape execution, indexing, and source-state updates.",
"`cmd/ask.go` includes query derivation, source URL heuristics, ranking, summarization, and output formatting in one command module."
],
"suggestion": "Split command files into focused packages (transport/CLI binding vs service layer vs persistence helpers) and keep Cobra handlers as thin adapters invoking composable use-case functions.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
}
]
}
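As an illustration of the `init_side_effect_registration_coupling` suggestion, here is a minimal Go sketch of explicit bootstrap registration. All types are hypothetical stand-ins: `Scraper`, `Registry`, and `staticScraper` are invented for this example, and only the name `RegisterExternalScrapers` comes from the suggestion text. The registry is an ordinary value passed in by the caller, and duplicate registration returns an error instead of panicking.

```go
package main

import (
	"fmt"
	"sort"
)

// Scraper is a hypothetical minimal interface for this sketch.
type Scraper interface {
	Name() string
}

// Registry holds scrapers by name. It is a plain value, not a package global.
type Registry struct {
	scrapers map[string]Scraper
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{scrapers: make(map[string]Scraper)}
}

// Register returns an error on duplicates instead of panicking.
func (r *Registry) Register(s Scraper) error {
	if _, exists := r.scrapers[s.Name()]; exists {
		return fmt.Errorf("register scraper %q: already registered", s.Name())
	}
	r.scrapers[s.Name()] = s
	return nil
}

// Names lists registered scraper names in sorted order.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.scrapers))
	for n := range r.scrapers {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// staticScraper is a placeholder implementation used only in this sketch.
type staticScraper string

func (s staticScraper) Name() string { return string(s) }

// RegisterExternalScrapers is the explicit bootstrap step called at startup,
// replacing registration inside init().
func RegisterExternalScrapers(r *Registry) error {
	for _, s := range []Scraper{staticScraper("godocs"), staticScraper("rustdocs")} {
		if err := r.Register(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	reg := NewRegistry()
	if err := RegisterExternalScrapers(reg); err != nil {
		fmt.Println("bootstrap failed:", err)
		return
	}
	fmt.Println(reg.Names())
}
```

With this shape, tests can build a fresh `Registry` per case, so registration order and shared global state stop mattering.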
@@ -0,0 +1,158 @@
<!-- desloppify-begin -->
<!-- desloppify-skill-version: 1 -->
---
name: desloppify
description: >
Codebase health scanner and technical debt tracker. Use when the user asks
about code quality, technical debt, dead code, large files, god classes,
duplicate functions, code smells, naming issues, import cycles, or coupling
problems. Also use when asked for a health score, what to fix next, or to
create a cleanup plan. Supports 28 languages.
allowed-tools: Bash(desloppify *)
---
# Desloppify
## 1. Your Job
**Improve code quality by fixing findings and maximizing strict score honestly.**
Never hide debt with suppression patterns just to improve lenient score. After
every scan, show the user ALL scores:
| What | How |
|------|-----|
| Overall health | lenient + strict |
| 5 mechanical dimensions | File health, Code quality, Duplication, Test health, Security |
| 7 subjective dimensions | Naming Quality, Error Consistency, Abstraction Fit, Logic Clarity, AI Generated Debt, Type Safety, Contract Coherence |
Never skip scores. The user tracks progress through them.
## 2. Core Loop
```
scan → follow the tool's strategy → fix or wontfix → rescan
```
1. `desloppify scan --path .` — the scan output ends with **INSTRUCTIONS FOR AGENTS**. Follow them. Don't substitute your own analysis.
2. Fix the issue the tool recommends.
3. `desloppify resolve fixed "<id>"` — or if it's intentional/acceptable:
`desloppify resolve wontfix "<id>" --note "reason why"`
4. Rescan to verify.
**Wontfix is not free.** It lowers the strict score. The gap between lenient and strict IS wontfix debt. Call it out when:
- Wontfix count is growing — challenge whether past decisions still hold
- A dimension is stuck 3+ scans — suggest a different approach
- Auto-fixers exist for open findings — ask why they haven't been run
## 3. Commands
```bash
desloppify scan --path src/ # full scan
desloppify scan --path src/ --reset-subjective # reset subjective baseline to 0, then scan
desloppify next --count 5 # top priorities
desloppify show <pattern> # filter by file/detector/ID
desloppify plan # prioritized plan
desloppify fix <fixer> --dry-run # auto-fix (dry-run first!)
desloppify move <src> <dst> --dry-run # move + update imports
desloppify resolve fixed|wontfix|false_positive "<pat>" # classify finding outcome
desloppify review --prepare # generate subjective review data
desloppify review --import file.json # import review results
```
## 4. Subjective Reviews (biggest score lever)
Score = 40% mechanical + 60% subjective. Subjective starts at 0% until reviewed.
1. `desloppify review --prepare` — writes dimension definitions and codebase context
to `query.json`.
2. **Review each dimension independently.** For best results, review dimensions in
isolation so scores don't bleed across concerns. If your agent supports parallel
execution, use it — your agent-specific overlay (appended below, if installed)
has the optimal approach. Each reviewer needs:
- The codebase path and the dimensions to score
- What each dimension means (from `query.json`'s `dimension_prompts`)
- The output format (below)
- Nothing else — let them decide what to read and how
3. Merge assessments (average scores if multiple reviewers cover the same dimension)
and findings, then import:
```bash
desloppify review --import findings.json
```
Required output format per reviewer:
```json
{
"assessments": { "naming_quality": 75.0, "logic_clarity": 82.0 },
"findings": [{
"dimension": "naming_quality",
"identifier": "short_id",
"summary": "one line",
"related_files": ["path/to/file.py"],
"evidence": ["specific observation"],
"suggestion": "concrete action"
}]
}
```
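The merge step (average scores when several reviewers cover the same dimension) can be sketched in Go as follows. This is an illustration only: `Review` and `MergeAssessments` are hypothetical names, the sketch handles just the `assessments` map, and it is not how `desloppify` itself merges data.

```go
package main

import "fmt"

// Review mirrors only the "assessments" part of one reviewer's output.
type Review struct {
	Assessments map[string]float64
}

// MergeAssessments averages the scores per dimension across all reviewers
// that scored that dimension.
func MergeAssessments(reviews []Review) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, r := range reviews {
		for dim, score := range r.Assessments {
			sums[dim] += score
			counts[dim]++
		}
	}
	merged := make(map[string]float64, len(sums))
	for dim, sum := range sums {
		merged[dim] = sum / float64(counts[dim])
	}
	return merged
}

func main() {
	merged := MergeAssessments([]Review{
		{Assessments: map[string]float64{"naming_quality": 75, "logic_clarity": 82}},
		{Assessments: map[string]float64{"naming_quality": 65}},
	})
	fmt.Println(merged["naming_quality"], merged["logic_clarity"]) // 70 82
}
```

Findings are not averaged; under the same assumptions they would simply be concatenated into one `findings` array before import.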
Need a clean subjective rerun from zero? Run `desloppify scan --path src/ --reset-subjective` before preparing/importing fresh review data.
Even moderate scores (60-80) dramatically improve overall health.
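To see why, here is a small worked example in Go that takes the stated weighting (40% mechanical, 60% subjective) literally. `OverallScore` is a hypothetical helper and the real scanner may apply additional rules, so treat the numbers as an approximation.

```go
package main

import "fmt"

// Weights taken from the text: 40% mechanical, 60% subjective.
const (
	mechanicalWeight = 0.4
	subjectiveWeight = 0.6
)

// OverallScore combines both halves on a 0-100 scale (illustrative only).
func OverallScore(mechanical, subjective float64) float64 {
	return mechanicalWeight*mechanical + subjectiveWeight*subjective
}

func main() {
	// Before any review, subjective is 0, so a strong mechanical score is capped.
	fmt.Printf("%.1f\n", OverallScore(90, 0)) // 36.0
	// A moderate subjective score of 70 more than doubles the result.
	fmt.Printf("%.1f\n", OverallScore(90, 70)) // 78.0
}
```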
Integrity safeguard:
- If one subjective dimension lands exactly on the strict target, the scanner warns and asks for re-review.
- If two or more subjective dimensions land on the strict target in the same scan, those dimensions are auto-reset to 0 for that scan and must be re-reviewed/imported.
- Reviewers should score from evidence only (not from target-seeking).
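The safeguard rules above can be sketched in Go like this. It is a hedged illustration, not the scanner's code: `ApplyIntegrityGuard` is a hypothetical name, the strict target is passed in as a parameter because its real value is not stated here, and exact equality is used to mirror "lands exactly on".

```go
package main

import (
	"fmt"
	"sort"
)

// ApplyIntegrityGuard returns a copy of scores plus the dimensions that sit
// exactly on strictTarget. One hit is only flagged (warn and re-review);
// two or more hits are reset to 0 in the returned copy.
func ApplyIntegrityGuard(scores map[string]float64, strictTarget float64) (map[string]float64, []string) {
	var onTarget []string
	out := make(map[string]float64, len(scores))
	for dim, s := range scores {
		out[dim] = s
		if s == strictTarget {
			onTarget = append(onTarget, dim)
		}
	}
	sort.Strings(onTarget)
	if len(onTarget) >= 2 {
		for _, dim := range onTarget {
			out[dim] = 0
		}
	}
	return out, onTarget
}

func main() {
	// 95 is a made-up strict target for the example.
	scores := map[string]float64{"naming_quality": 95, "logic_clarity": 95, "type_safety": 70}
	adjusted, flagged := ApplyIntegrityGuard(scores, 95)
	fmt.Println(flagged)
	fmt.Println(adjusted["naming_quality"], adjusted["logic_clarity"], adjusted["type_safety"])
}
```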
## 5. Quick Reference
- **Tiers**: T1 auto-fix, T2 quick manual, T3 judgment call, T4 major refactor
- **Zones**: production/script (scored), test/config/generated/vendor (not scored). Fix with `zone set`.
- **Auto-fixers** (TS only): `unused-imports`, `unused-vars`, `debug-logs`, `dead-exports`, etc.
- **query.json**: After any command, has `narrative.actions` with prioritized next steps.
- `--skip-slow` skips duplicate detection for faster iteration.
- `--lang python`, `--lang typescript`, or `--lang csharp` to force language.
- C# defaults to `--profile objective`; use `--profile full` to include subjective review.
- Score can temporarily drop after fixes (cascade effects are normal).
## 6. Escalate Tool Issues Upstream
When desloppify itself appears wrong or inconsistent:
1. Capture a minimal repro (`command`, `path`, `expected`, `actual`).
2. Open a GitHub issue in `peteromallet/desloppify`.
3. If you can fix it safely, open a PR linked to that issue.
4. If you are unsure whether it is a tool bug or a user-workflow problem, open an issue first and a PR second.
## Prerequisite
`command -v desloppify >/dev/null 2>&1 && echo "desloppify: installed" || echo "NOT INSTALLED — run: pip install --upgrade git+https://github.com/peteromallet/desloppify.git"`
<!-- desloppify-end -->
## Codex Overlay
This is the canonical Codex overlay used by the README install command.
1. Prefer first-class batch runs: `desloppify review --run-batches --runner codex --parallel`.
2. The command writes immutable packet snapshots under `.desloppify/review_packets/holistic_packet_*.json`; use those for reproducible retries.
3. Keep reviewer input scoped to the immutable packet and the source files named in each batch.
4. Do not use prior chat context, score history, narrative summaries, issue labels, or target-threshold anchoring while scoring.
5. Assess every dimension listed in `query.dimensions`; never drop a requested dimension. If evidence is weak/mixed, score lower and explain uncertainty in findings.
6. Return machine-readable JSON only for review imports:
```json
{
"assessments": {
"<dimension_from_query>": 0
},
"findings": []
}
```
7. Keep `findings` schema compatible with `query.system_prompt`.
8. If a batch fails, retry only that slice with `desloppify review --run-batches --packet <packet.json> --only-batches <idxs>`.
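Before importing, a reviewer payload can be checked against rule 5 (no requested dimension dropped). The Go sketch below is an assumption-laden illustration: `ReviewResult` and `MissingDimensions` are hypothetical, and the requested dimension list is passed in by hand rather than read from `query.dimensions`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// ReviewResult covers the two top-level keys of the review import format.
type ReviewResult struct {
	Assessments map[string]float64 `json:"assessments"`
	Findings    []json.RawMessage  `json:"findings"`
}

// MissingDimensions reports which requested dimensions have no assessment.
func MissingDimensions(payload []byte, requested []string) ([]string, error) {
	var res ReviewResult
	if err := json.Unmarshal(payload, &res); err != nil {
		return nil, fmt.Errorf("parse review result: %w", err)
	}
	var missing []string
	for _, dim := range requested {
		if _, ok := res.Assessments[dim]; !ok {
			missing = append(missing, dim)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	payload := []byte(`{"assessments": {"error_consistency": 71.0}, "findings": []}`)
	missing, err := MissingDimensions(payload, []string{"error_consistency", "test_strategy"})
	if err != nil {
		fmt.Println("invalid payload:", err)
		return
	}
	fmt.Println(missing) // [test_strategy]
}
```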
<!-- desloppify-overlay: codex -->
<!-- desloppify-end -->
@@ -1,33 +1,111 @@
-# Devour
+# Contributing to Devour
-Devour is a context ingestion and management system for AI that scrapes, indexes, and serves documentation from multiple sources.
+Thanks for considering a contribution.
-## Installation
+Devour is moving fast, and practical contributions (bug fixes, docs cleanup, tests, stability work) are especially valuable right now.
+## Before You Start
+- Check existing issues/PRs to avoid duplicate work.
+- If the change is non-trivial, open an issue first to align on direction.
+- Keep PRs focused. Small, reviewable changes get merged faster.
+## Local Setup
 ```bash
-go install github.com/yourorg/devour/cmd/devour@latest
+git clone <your-fork-or-repo-url>
+cd devour
+go mod download
+go build -o devour ./cmd/devour
 ```
-## Quick Start
+Optional sanity check:
 ```bash
-# Initialize
+./devour --help
-devour init
-# Scrape documentation
-devour scrape https://docs.example.com
-# Query indexed docs
-devour query "authentication flow"
-# Start MCP server
-devour serve
 ```
-## Documentation
+## Branch and Commit Workflow
-See [README.md](README.md) for full documentation.
+1. Fork the repo.
+2. Create a branch from `main`:
+`git checkout -b feat/short-description`
+3. Make your changes.
+4. Run tests/checks.
+5. Commit with a clear message.
+6. Open a PR.
-## License
+Commit style suggestion:
-MIT License
+- `feat: add xyz`
+- `fix: resolve panic in abc`
+- `docs: clarify quick start`
+- `test: add coverage for module`
+## What To Test
+At minimum, run:
+```bash
+go test ./...
+```
+If your changes affect CLI behavior, also run relevant commands directly (for example `devour get`, `devour ask`, `devour quality status`).
+If you touch docs, verify commands and paths are real.
+## Current Known Limitations
+Please keep these in mind when proposing fixes or documentation:
+- Remote workflows are still experimental:
+- `devour serve --remote` is available, but local stdio JSON-RPC is the primary stable mode.
+- remote `push` flows are intentionally disabled in stable behavior.
+- Live scraping quality can vary with upstream site changes; use `devour verify smoke` before releases.
+- Language support is broad, but extraction quality differs by source type and site structure.
+- Quality analyzers are strongest for Go code; cross-language analysis remains limited.
+PRs that improve these areas are highly appreciated.
+## Pull Request Checklist
+Before opening a PR, confirm:
+- [ ] Change is scoped and clearly described.
+- [ ] Tests pass locally (`go test ./...`).
+- [ ] New behavior includes tests when feasible.
+- [ ] Docs updated for user-facing changes.
+- [ ] No unrelated refactors mixed in.
+## Coding Guidelines
+- Follow existing Go style in the touched package.
+- Prefer clear, simple control flow over clever abstractions.
+- Add comments only when they explain non-obvious intent.
+- Preserve current CLI UX unless there is a strong reason to change it.
+## Reporting Bugs
+When filing a bug, include:
+- Devour command used
+- exact error output
+- OS and Go version
+- minimal reproduction steps
+Bug reports with reproducible steps are fixed much faster.
+## Documentation Contributions
+Docs improvements are first-class contributions.
+Useful docs PRs include:
+- fixing incorrect command examples
+- clarifying status of in-progress features
+- adding troubleshooting notes
+- improving onboarding for first-time contributors
+## Questions
+If something is unclear, open an issue and ask directly. It is better to align early than rework a large PR later.
@@ -1,593 +1,155 @@
 <p align="center">
-<img src="devour_logo.svg" alt="Devour Logo" width="300">
+<img src="devour_logo.svg" alt="Devour Logo" width="220">
 </p>
 <h1 align="center">Devour</h1>
+<p align="center">Feed your AI real docs so it stops repeating the same mistakes.</p>
-<p align="center">
+## Why Devour exists
-<strong>Context Ingestion & Management for AI</strong>
+I built Devour because AI tools kept drifting away from official docs, then repeating wrong patterns in later prompts.
-</p>
-<p align="center">
+Devour fixes that loop with a local-first workflow:
-<a href="#features">Features</a> •
+1. scrape official docs,
-<a href="#installation">Installation</a> •
+2. keep them in your project,
-<a href="#quick-start">Quick Start</a> •
+3. search and ask against that local corpus,
-<a href="#architecture">Architecture</a> •
+4. sync updates when sources change.
-<a href="#cli-reference">CLI Reference</a> •
-<a href="#configuration">Configuration</a>
-</p>
----
+## What works today
+- `devour init` creates a complete local workspace
+- `devour scrape` supports URL, language-specific docs, local files, GitHub repos, OpenAPI specs, and `--sources`
+- `devour get` supports language/framework shortcuts (now including Next.js, Svelte, Angular, Remix, SolidJS, Express)
+- `devour query` is now functional (local lexical index)
+- `devour ask` is hybrid local-first with live fallback when local relevance is weak
+- `devour sync` updates configured sources and reindexes
+- `devour status` reports real local health/statistics
+- `devour push <path>` imports local docs into the workspace and reindexes
+- `devour serve` local mode runs JSON-RPC over stdio (`devour_query`, `devour_status`, `devour_scrape`, `devour_ask`, `devour_sync`)
+- `devour auto` routes natural-language intent to the right command
+- `devour verify smoke` runs opt-in live checks and writes reports
-## What is Devour?
+## Experimental
+- `devour serve --remote` is available as an experimental HTTP RPC mode
-Devour is a **context ingestion and management system** designed to feed structured, relevant context to AI models for generating accurate, fully working code.
+- remote push workflows are intentionally not enabled as stable behavior yet
-It scrapes, indexes, and serves documentation from multiple sources:
-- GitHub repositories
-- OpenAPI/Swagger specifications
-- Markdown/HTML documentation sites
-- JSON/YAML schemas
-- Local project files
-### Two Modes of Operation
-| Mode | Description | Use Case |
-|------|-------------|----------|
-| **Local** | Runs as an OpenCode skill on your machine | Single developer, offline work |
-| **Remote** | MCP server hosted on your infrastructure | Teams, multi-project support |
----
-## Features
-### 🕷️ Multi-Source Scraping
-- **GitHub** - Clone and parse repos, extract README, docs, code structure
-- **OpenAPI** - Parse Swagger specs into structured endpoints
-- **Web Docs** - Crawl documentation sites with Colly
-- **Local Files** - Index your project's docs folder
-### 🧠 Intelligent Indexing
-- Vector embeddings via OpenAI (text-embedding-3-small/large)
-- Semantic similarity search for context retrieval
-- Metadata tracking (source, timestamp, file type)
-### 🔄 Automatic Updates
-- Configurable scheduler (default: every 3 days)
-- Content hash comparison for change detection
-- Automatic re-indexing on updates
-### 🔌 MCP Integration
-- Exposes context via Model Context Protocol
-- **Local mode**: stdio transport (OpenCode skill)
-- **Remote mode**: HTTP/SSE transport (MCP server)
-### 💾 Flexible Storage
-```
-devour_data/
-├── docs/ # Raw scraped documents
-├── index/ # Vector embeddings
-└── metadata/ # Source tracking & timestamps
-```
-### 📊 Quality Scorecard
-Devour includes a built-in code quality analysis system that generates comprehensive scorecards for your project.
-#### Three Scorecard Versions
-**1. Compact Scorecard** - Quick overview with 3 key metrics
-![Compact Scorecard](examples/scorecard_compact_light.png)
-**2. Detailed Scorecard** - Comprehensive breakdown with charts and analytics
-![Detailed Scorecard](examples/scorecard_detailed_light.png)
-**3. Original Scorecard** - Classic balanced view
-![Original Scorecard](examples/scorecard_original_light.png)
-#### Usage
+## Quick start
+### 1) Build
 ```bash
-# Run quality analysis with default (original) scorecard
-devour quality scan
-# Generate compact scorecard
-devour quality scan --badge-path scorecard_compact.png --format compact
-# Generate detailed scorecard
-devour quality scan --badge-path scorecard_detailed.png --format detailed
-# Generate dark theme versions
-devour quality scan --badge-path scorecard_dark.png --theme dark
-```
-#### Features
-- **Multi-theme support** - Light and dark themes
-- **Three formats** - Compact, detailed, and original layouts
-- **Real scan data** - Analyzes actual code quality issues
-- **Multi-language support** - Go, Python, JavaScript, TypeScript, Java, Rust
-- **Severity-based scoring** - T1 (auto-fixable) to T4 (major refactor)
-- **Technical debt tracking** - Track improvements over time
-- **Comprehensive metrics** - Complexity, duplication, security, coverage, and more
-#### Score Metrics
-- **Overall Score** - General code health (0-100%)
-- **Strict Score** - Conservative scoring ignoring quick wins
-- **Grade** - Letter grade (A-F) based on overall score
-- **Findings by Type** - Issues grouped by category
-- **Findings by Severity** - Issues grouped by impact level
-- **Dimension Breakdown** - Detailed analysis per quality dimension
----
-## Installation
-### Prerequisites
-- Go 1.22+
-- OpenAI API key (for embeddings)
-### From Source
-```bash
-# Clone the repository
-git clone https://github.com/yourorg/devour.git
-cd devour
-# Install dependencies
 go mod download
-# Build
 go build -o devour ./cmd/devour
-# Install globally
-go install ./cmd/devour
 ```
-### Quick Install
+### 2) Initialize
 ```bash
-go install github.com/yourorg/devour/cmd/devour@latest
+./devour init
 ```
----
+### 3) Get docs fast
-## Quick Start
-### 1. Initialize a Project
 ```bash
-# Create devour config in current directory
+./devour get go net/http --format markdown
-devour init
+./devour get nextjs routing
+./devour get express middleware
-# Or specify a path
-devour init ./my-project
 ```
-### 2. Get Documentation (NEW!)
+### 4) Search locally
```bash ```bash
# Quick access to popular language/framework docs ./devour query "json unmarshal"
devour get go http # Go HTTP package
devour get python asyncio # Python asyncio module
devour get react hooks # React Hooks documentation
devour get docker compose # Docker Compose docs
devour get rust tokio # Rust Tokio crate
devour get spring boot # Spring Boot framework
# Enhanced markdown output
devour get go http --format markdown
``` ```
**Supported Languages:** ### 5) Ask a docs-grounded question
- `go`, `golang` - Go packages (pkg.go.dev)
- `rust` - Rust crates (docs.rs)
- `python`, `py` - Python modules (docs.python.org)
- `java` - Java packages (docs.oracle.com)
- `spring` - Spring Boot (docs.spring.io)
- `typescript`, `ts` - TypeScript (typescriptlang.org)
- `react` - React (react.dev)
- `vue` - Vue.js (vuejs.org)
- `nuxt` - Nuxt (nuxt.com)
- `docker` - Docker (docs.docker.com)
- `cloudflare`, `cf` - Cloudflare (developers.cloudflare.com)
- `astro` - Astro (docs.astro.build)
### 3. Scrape Documentation
```bash ```bash
# Scrape from a URL ./devour ask --lang go "how to parse json" --format text
devour scrape https://docs.example.com
# Scrape a GitHub repo
devour scrape https://github.com/org/repo
# Scrape local docs
devour scrape ./docs
# Multiple sources
devour scrape --sources sources.yaml
``` ```
### 4. Query Context ### 6) Let Devour route intent automatically
```bash ```bash
# Search indexed docs ./devour auto "how to use useEffect in react"
devour query "How do I authenticate with the API?" ./devour auto "https://pkg.go.dev/net/http"
# With options
devour query "authentication" --limit 5 --format json
``` ```
### 5. Start the Server ## Core commands
| Command | Purpose |
| --- | --- |
| `devour init [path]` | Create config + local storage directories |
| `devour get <language> <keyword>` | Shortcut fetch from curated official docs |
| `devour scrape <source>` | Scrape one source directly |
| `devour scrape --sources sources.yaml` | Scrape multiple configured sources |
| `devour query <text>` | Search local indexed docs |
| `devour ask --lang <lang> <question>` | Structured answer using local-first + live fallback |
| `devour sync` | Sync configured sources and rebuild index |
| `devour status` | Show docs/index/source health |
| `devour push <path>` | Import local docs into workspace |
| `devour serve` | Start local stdio JSON-RPC server |
| `devour auto "<intent>"` | Auto-route intent to command |
| `devour verify smoke` | Live smoke verification report |
| `devour quality ...` | Code quality scan/triage/fixes |
## Supported `get` / `ask` languages and frameworks
- Go (`go`, `golang`)
- Rust (`rust`)
- Python (`python`, `py`)
- Java (`java`)
- Spring (`spring`)
- TypeScript (`typescript`, `ts`)
- React (`react`)
- Vue (`vue`)
- Nuxt (`nuxt`)
- Docker (`docker`)
- Cloudflare (`cloudflare`, `cf`)
- Astro (`astro`)
- C# (`csharp`, `cs`)
- Kotlin (`kotlin`, `kt`)
- PHP (`php`)
- Ruby (`ruby`, `rb`)
- Elixir (`elixir`, `ex`)
- Next.js (`next`, `nextjs`)
- Svelte (`svelte`)
- Angular (`angular`, `ng`)
- Remix (`remix`)
- Solid (`solid`, `solidjs`) via `github.com/solidjs/solid-docs`
- Express (`express`, `expressjs`)
Run `devour languages` for examples, or `devour languages --format json` for automation.
## Config
Devour reads `devour.yaml` (or `--config`).
New additive sections:
- `indexing`: local lexical index defaults
- `verification`: live smoke timeout defaults
Starter config: `devour.example.yaml`.
## Real-world verification
Run live smoke checks (opt-in):
```bash ```bash
# Local MCP server (stdio transport) ./devour verify smoke
devour serve
# Remote MCP server (HTTP)
devour serve --remote --port 8080
``` ```
### 6. Check Status Reports are saved to `devour_data/verify/smoke-<timestamp>.json`.
```bash ## JSON-RPC local server
devour status `devour serve` (local mode) accepts JSON-RPC 2.0 methods:
``` - `devour_query`
- `devour_status`
--- - `devour_scrape`
- `devour_ask`
## Enhanced Features - `devour_sync`
### 🎯 Simplified Language Interface
The new `devour get` command provides instant access to documentation for popular languages and frameworks without needing to remember full URLs:
```bash
# Instead of: devour scrape https://pkg.go.dev/net/http
devour get go http
# Instead of: devour scrape https://react.dev/reference/react/hooks
devour get react hooks
# Instead of: devour scrape https://docs.docker.com/compose
devour get docker compose
```
### 📝 Rich Markdown Output
Enable enhanced markdown formatting for beautiful, structured documentation:
```bash
devour get go http --format markdown
```
**Features:**
- 📋 Document metadata tables
- 📑 Auto-generated table of contents
- 🎨 Enhanced typography with emoji indicators
- 🔗 Automatic link conversion
- 📚 Structured content sections
- 🏷️ Source attribution and timestamps
### 🧠 Smart Content Enhancement
The markdown formatter automatically:
- Converts plain URLs to clickable links
- Adds visual indicators for examples, notes, and warnings
- Fixes code block formatting
- Generates proper heading structure
- Creates document metadata tables
---
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Devour System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ Scraper │───▶│ Indexer │───▶│ Storage │───▶│ Server │ │
│ └─────────┘ └──────────┘ └───────────┘ └──────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ GitHub │ │ OpenAI │ │ Vector DB │ │ MCP │ │
│ │ Web │ │ Embeds │ │ (chromem) │ │ Protocol │ │
│ │ Local │ │ │ │ │ │ │ │
│ └─────────┘ └──────────┘ └───────────┘ └──────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Scheduler │ │
│ │ (Auto-update every 3 days, configurable) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
### Data Flow
```
User Query → Devour Server → Embedding Generation → Vector Search
AI Response ← Context Chunks ← Top-K Relevant Docs ←───┘
```
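As a rough illustration of the vector search step in this flow, the sketch below ranks stored chunk vectors against a query vector by cosine similarity and keeps the top K above a threshold. The in-memory map, the `scored` type, and the `topK` function are assumptions made for the example; the real index is described as chromem-backed, and its API is not shown here.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored pairs a chunk ID with its similarity to the query (hypothetical type).
type scored struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 for mismatched
// lengths and zero vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns at most k chunks whose similarity is >= threshold,
// best match first; ties are broken by ID so the order is stable.
func topK(query []float64, chunks map[string][]float64, k int, threshold float64) []scored {
	if k < 0 {
		k = 0
	}
	var out []scored
	for id, vec := range chunks {
		if s := cosine(query, vec); s >= threshold {
			out = append(out, scored{ID: id, Score: s})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	chunks := map[string][]float64{
		"auth":    {1, 0},
		"install": {0, 1},
		"tokens":  {0.8, 0.6},
	}
	// 5 and 0.7 mirror the documented --limit and --threshold defaults.
	for _, r := range topK([]float64{1, 0}, chunks, 5, 0.7) {
		fmt.Println(r.ID)
	}
}
```

A linear scan like this is only practical for small indexes; it is meant to show the ranking logic, not the storage layer.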
---
## CLI Reference
### Commands
| Command | Description |
|---------|-------------|
| `devour init [path]` | Initialize Devour for a project |
| `devour get <language> <keyword>` | **NEW** Quick docs fetch for popular languages |
| `devour scrape <source>` | Scrape docs from URL, repo, or path |
| `devour serve` | Start MCP server (local or remote) |
| `devour query <text>` | Search indexed documentation |
| `devour status` | Show index stats and last update |
| `devour sync` | Fetch updates from all sources |
| `devour push <path>` | Push docs to remote MCP server |
### Flags
```bash
# Global flags
--config, -c Config file path (default: ./devour.yaml)
--verbose, -v Enable verbose logging
--quiet, -q Suppress non-error output
# scrape flags
--sources, -s YAML file with source definitions
--format, -f Output format: json, markdown (default: json)
--concurrency Parallel scraping workers (default: 10)
# serve flags
--remote Run as remote HTTP server
--port, -p HTTP port (default: 8080)
--host HTTP host (default: localhost)
# query flags
--limit, -l Max results (default: 5)
--format, -f Output: json, text, markdown
--threshold Similarity threshold (default: 0.7)
```
---
## Configuration
### devour.yaml
```yaml
# Devour Configuration
# Storage paths
storage:
  docs_dir: ./devour_data/docs
  index_dir: ./devour_data/index
  metadata_dir: ./devour_data/metadata
# Embedding settings
embeddings:
  provider: openai # openai, local
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY} # Env var reference
# Vector database
vector_db:
  type: chromem # chromem, weaviate, faiss
  persist: true
# Scraping settings
scraper:
  user_agent: "Devour/1.0"
  timeout: 30s
  retry_count: 3
  concurrency: 10
  rate_limit: 500ms
# Scheduler
scheduler:
  enabled: true
  interval: 72h # Every 3 days
  check_method: hash # hash, timestamp
# Server settings
server:
  mode: local # local, remote
  port: 8080
  host: localhost
# Sources (for sync)
sources:
  - name: project-docs
    type: url
    url: https://docs.example.com
    include: ["**/*.md", "**/*.html"]
    exclude: ["**/api/**"]
  - name: api-spec
    type: openapi
    url: https://api.example.com/openapi.json
  - name: github-repo
    type: github
    repo: org/repo
    branch: main
    paths: ["docs/", "README.md"]
```
---
## API Reference
### MCP Tools (when running as server)
#### `devour_query`
Search indexed documentation for relevant context.
```json
{
"query": "How do I authenticate?",
"limit": 5,
"threshold": 0.7
}
```
#### `devour_add`
Add documents to the index.
```json
{
"documents": [
{
"content": "Document text...",
"metadata": {
"source": "https://...",
"type": "markdown"
}
}
]
}
```
#### `devour_status`
Get indexing status and statistics.
### REST API (remote mode)
```
GET /health # Health check
GET /status # Index statistics
POST /query # Search documents
POST /documents # Add documents
GET /documents # List documents
DELETE /documents/:id # Delete document
POST /sync # Trigger sync
```
---
## Integration Examples
### With OpenCode (Local Mode)
Add to your OpenCode skills:
```yaml
# ~/.opencode/skills.yaml
skills:
  - name: devour
    path: /path/to/devour
    commands:
      - devour serve
```
Then in OpenCode:
```
/devour query "authentication flow"
```
### With AI Applications
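```go
package main
import (
	"context"
	"fmt"
	"log"
	"github.com/yourorg/devour/pkg/client"
)
func main() {
	ctx := context.Background()
	// Named c so the variable does not shadow the client package.
	c := client.New("http://localhost:8080")
	results, err := c.Query(ctx, "How do I use the API?", 5)
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range results {
		preview := r.Content
		if len(preview) > 100 { // avoid slicing past the end of short content
			preview = preview[:100]
		}
		fmt.Printf("Score: %.2f - %s\n", r.Score, preview)
	}
}
```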
---
## Development ## Development
### Test
### Project Structure
```
devour/
├── cmd/devour/ # CLI entrypoint
│ └── main.go
├── internal/
│ ├── scraper/ # Scraping logic
│ ├── indexer/ # Embedding generation
│ ├── server/ # MCP server
│ ├── scheduler/ # Background updates
│ └── ai/ # AI integrations
├── pkg/
│ ├── client/ # Go client library
│ └── types/ # Shared types
├── devour_data/ # Default data directory
├── go.mod
├── Makefile
└── README.md
```
### Building
```bash ```bash
# Development build
go build -o devour ./cmd/devour
# Production build
CGO_ENABLED=0 go build -ldflags="-s -w" -o devour ./cmd/devour
# Run tests
go test ./... go test ./...
# Run with coverage
go test -cover ./...
``` ```
### Makefile Targets ### Typical integration loop
```bash ```bash
make build # Build binary ./devour init
make test # Run tests ./devour scrape https://pkg.go.dev/net/http --type godocs
make lint # Run linter ./devour query "http client"
make docker # Build Docker image ./devour ask --lang go "timeout example"
make install # Install locally ./devour sync
./devour status
``` ```
---
## Roadmap
- [ ] Local LLM support (Ollama, LocalAI)
- [ ] Multi-tenant support for remote mode
- [ ] Web UI for document management
- [ ] Git-based versioning for docs
- [ ] Plugin system for custom scrapers
- [ ] Reranking with cross-encoders
---
## Contributing
Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) for details.
---
## License ## License
MIT (`LICENSE`).
MIT License - see [LICENSE](LICENSE) for details.
---
<p align="center">
<sub>Built with ❤️ for better AI context</sub>
</p>
+79 -632
@@ -1,14 +1,10 @@
--- ---
name: devour name: devour
description: > description: >
Context ingestion and management system for AI. Scrapes, indexes, and serves Use this skill for Devour CLI workflows: scrape docs, get language docs,
documentation from GitHub repos, OpenAPI specs, web docs, and local files. query local index, ask docs-grounded questions, sync sources, run quality
Provides semantic search via vector embeddings to feed relevant context to triage, and verify live smoke checks. Trigger on: "devour", "docs to ai",
AI models. Runs in local mode (stdio) or remote mode (HTTP MCP server). "scrape docs", "ask docs", "query docs", "sync docs", "quality scan".
Supports automatic updates via configurable scheduler. Integrates with
OpenAI for embeddings and LLM context injection. Triggers on: "devour",
"scrape docs", "index documentation", "context for AI", "vector search docs",
"semantic search", "ingest documentation", "documentation to AI".
allowed-tools: allowed-tools:
- Read - Read
- Write - Write
@@ -19,637 +15,88 @@ allowed-tools:
- WebFetch - WebFetch
--- ---
# Devour — Context Ingestion Skill # Devour Skill
Comprehensive documentation scraping, indexing, and retrieval system for Use this skill when a task is explicitly about Devour CLI operations or troubleshooting Devour workflows.
feeding structured context to AI models. Orchestrates 5 specialized modules
and supports both local (stdio) and remote (HTTP) MCP modes. ## What Devour now supports
## Quick Reference - `devour init`
- `devour get`
| Command | What it does | - `devour scrape`
|---------|-------------| - `devour scrape --sources ...`
| `/devour init [path]` | Initialize Devour for a project | - `devour query`
| `/devour get <language> <keyword>` | **NEW** Quick docs fetch for popular languages | - `devour ask`
| `/devour scrape <source>` | Scrape docs from URL, GitHub, or local path | - `devour sync`
| `/devour serve` | Start MCP server (local or remote) | - `devour status`
| `/devour query <text>` | Search indexed documentation | - `devour push <path>` (local ingest)
| `/devour status` | Show index stats and health | - `devour serve` (local stdio JSON-RPC)
| `/devour sync` | Fetch updates from all configured sources | - `devour auto`
| `/devour push <path>` | Push docs to remote MCP server | - `devour verify smoke`
| `/devour sources` | Manage documentation sources | - `devour quality ...`
| `/devour quality scan [path]` | **NEW** Run code quality analysis |
| `/devour quality status` | **NEW** Show quality metrics and trends | Remote server/push workflows are experimental.
| `/devour quality next` | **NEW** Show next priority issue to fix |
## Fast routing
## Orchestration Logic
1. User gives URL/source: use `devour scrape`.
When the user invokes `/devour get <language> <keyword>`: 2. User gives language+topic: use `devour get`.
3. User asks a question: use `devour ask --lang ...`.
1. **Map language to base URL**: 4. User wants local search: use `devour query`.
- `go http` → `https://pkg.go.dev/http` 5. User wants updates from config: use `devour sync`.
- `python asyncio` → `https://docs.python.org/3/library/asyncio.html` 6. User wants automatic intent routing: use `devour auto`.
- `react hooks` → `https://react.dev/reference/react/hooks` 7. User wants confidence check: use `devour verify smoke`.
- `docker compose` → `https://docs.docker.com/compose`
## Reliable workflow
2. **Auto-detect source type** based on language:
- Go → `godocs` parser
- Python → `pythondocs` parser
- React → `reactdocs` parser
- Docker → `dockerdocs` parser
3. **Execute enhanced scrape** with pre-configured parameters:
- Automatic language-specific parsing
- Enhanced markdown formatting (if requested)
- Metadata extraction and enrichment
4. **Return structured documentation**:
- Rich markdown with TOC (if `--format markdown`)
- JSON with full metadata (default)
- Ready for AI context injection
When the user invokes `/devour scrape`:
1. **Detect source type** from URL/path:
- GitHub: `github.com/org/repo` → Clone, extract docs
- OpenAPI: Ends in `.json`/`.yaml` with OpenAPI spec → Parse endpoints
- Web: HTTP/HTTPS URL → Crawl with Colly
- Local: File path → Scan directory
2. **Scrape with appropriate parser**:
- Extract content (markdown, HTML, code structure)
- Clean and normalize text
- Extract metadata (title, headings, code blocks)
3. **Generate embeddings**:
- Chunk content appropriately (512-1024 tokens)
- Call OpenAI embedding API
- Store in vector database
4. **Update metadata**:
- Track source, timestamp, content hash
- Enable future update detection
When the user invokes `/devour query`:
1. Generate embedding for query text
2. Perform vector similarity search
3. Return top-K results with metadata
4. Optionally inject into AI context
## Enhanced Features
### 🎯 Language-Aware Documentation Access
The `devour get` command provides intelligent, language-specific documentation retrieval:
**Supported Languages & Mappings:**
- `go`, `golang` → Go packages (pkg.go.dev)
- `rust` → Rust crates (docs.rs)
- `python`, `py` → Python modules (docs.python.org)
- `java` → Java packages (docs.oracle.com)
- `spring` → Spring Boot (docs.spring.io)
- `typescript`, `ts` → TypeScript (typescriptlang.org)
- `react` → React (react.dev)
- `vue` → Vue.js (vuejs.org)
- `nuxt` → Nuxt (nuxt.com)
- `docker` → Docker (docs.docker.com)
- `cloudflare`, `cf` → Cloudflare (developers.cloudflare.com)
- `astro` → Astro (docs.astro.build)
**Usage Examples:**
```bash
/devour get go http # Go HTTP package docs
/devour get python asyncio # Python asyncio module
/devour get react hooks # React Hooks reference
/devour get docker compose # Docker Compose guide
/devour get rust tokio # Rust Tokio crate docs
```
### 📝 Rich Markdown Enhancement
When using `--format markdown`, Devour automatically enhances documentation:
**Auto-Generated Structure:**
- 📋 Document metadata tables (source, type, timestamp)
- 📑 Table of contents from headings
- 🎨 Visual indicators for important content
- 🔗 Automatic URL-to-link conversion
- 📚 Proper heading hierarchy
**Content Enhancement:**
- `Example:` → 💡 **Example:**
- `Note:` → 📝 **Note:**
- `Warning:` → ⚠️ **Warning:**
- `Important:` → ❗ **Important:**
- `TODO:` → 📋 **TODO:**
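The label mapping above can be sketched as a small line-based rewrite. This is an illustrative version only: the `enhanceLabels` name is hypothetical, and matching a label only at the start of a line is an assumption made here so that text such as "See Note: ..." in the middle of a sentence is left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// labels lists the plain prefixes and their decorated replacements,
// copied from the mapping in the skill description.
var labels = []struct{ plain, rich string }{
	{"Example:", "💡 **Example:**"},
	{"Note:", "📝 **Note:**"},
	{"Warning:", "⚠️ **Warning:**"},
	{"Important:", "❗ **Important:**"},
	{"TODO:", "📋 **TODO:**"},
}

// enhanceLabels decorates lines that start with a known label.
// Lines that already carry a decorated label are not matched again,
// because they no longer start with the plain prefix.
func enhanceLabels(text string) string {
	lines := strings.Split(text, "\n")
	for i, line := range lines {
		for _, l := range labels {
			if strings.HasPrefix(line, l.plain) {
				lines[i] = l.rich + strings.TrimPrefix(line, l.plain)
				break
			}
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(enhanceLabels("Note: timeouts default to 30s\nWarning: no retries"))
}
```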
**Example Output Structure:**
```markdown
# Package Name
## 📋 Document Information
| Property | Value |
|----------|-------|
| **Source** | https://pkg.go.dev/http |
| **Type** | `godocs` |
| **Scraped** | 2026-02-19 12:30:00 |
## 📑 Table of Contents
- [Functions](#functions)
- [Types](#types)
- [Examples](#examples)
## 📚 Content
# Functions
💡 **Example:** Usage example here...
```
## Source Type Detection
| Pattern | Type | Parser |
|---------|------|--------|
| `github.com/*/*` | GitHub | Git clone + markdown parser |
| `*.json` + OpenAPI keys | OpenAPI | Swagger parser |
| `http://*`, `https://*` | Web | Colly crawler |
| `./path`, `/path` | Local | Directory scanner |
| `*.md`, `*.rst`, `*.txt` | File | Direct parse |
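One way to read the detection table is as an ordered set of checks. The sketch below is a guess at that logic, not the actual scraper code: `detectSourceType`, `looksLikeOpenAPI`, and the returned type strings (including the interim `openapi-candidate`) are hypothetical. The only external fact it relies on is that OpenAPI 3 documents carry a top-level `openapi` key and Swagger 2 documents a top-level `swagger` key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"path"
	"path/filepath"
	"strings"
)

// detectSourceType classifies a source string using only its shape.
// A spec-like URL is reported as "openapi-candidate" because the final
// decision needs the fetched body (see looksLikeOpenAPI).
func detectSourceType(src string) string {
	lower := strings.ToLower(src)
	if strings.HasPrefix(lower, "http://") || strings.HasPrefix(lower, "https://") {
		u, err := url.Parse(src)
		if err != nil {
			return "web"
		}
		segs := strings.Split(strings.Trim(u.Path, "/"), "/")
		if strings.ToLower(u.Hostname()) == "github.com" &&
			len(segs) >= 2 && segs[0] != "" && segs[1] != "" {
			return "github"
		}
		switch path.Ext(strings.ToLower(u.Path)) {
		case ".json", ".yaml", ".yml":
			return "openapi-candidate"
		}
		return "web"
	}
	switch filepath.Ext(lower) {
	case ".md", ".rst", ".txt":
		return "file"
	}
	return "local"
}

// looksLikeOpenAPI checks a JSON body for the top-level "openapi"
// (OpenAPI 3.x) or "swagger" (Swagger 2.0) key. YAML specs would need a
// YAML parser, which is left out of this sketch.
func looksLikeOpenAPI(body []byte) bool {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(body, &top); err != nil {
		return false
	}
	_, v3 := top["openapi"]
	_, v2 := top["swagger"]
	return v3 || v2
}

func main() {
	for _, s := range []string{
		"https://github.com/org/repo",
		"https://api.example.com/openapi.json",
		"https://docs.example.com",
		"./docs",
		"README.md",
	} {
		fmt.Printf("%-40s %s\n", s, detectSourceType(s))
	}
}
```

The GitHub check runs before the generic web check because a GitHub URL also matches the `https://*` row of the table.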
## Module Reference
### 1. Scraper Module (`internal/scraper`)
Responsible for fetching and parsing content from various sources.
**Supported sources:**
- GitHub repositories (clone, extract docs/, README.md)
- OpenAPI/Swagger specs (parse endpoints, schemas)
- HTML documentation sites (crawl, extract content)
- Markdown files (parse structure, code blocks)
- JSON/YAML configuration files
**Output format:**
```json
{
"id": "doc-uuid",
"source": "https://...",
"type": "markdown",
"title": "Document Title",
"content": "Extracted text...",
"metadata": {
"headings": ["H1", "H2"],
"code_blocks": ["go", "bash"],
"links": ["url1", "url2"]
},
"timestamp": "2025-01-15T10:00:00Z"
}
```
### 2. Indexer Module (`internal/indexer`)
Converts documents into vector embeddings for semantic search.
**Features:**
- OpenAI embedding integration (text-embedding-3-small/large)
- Intelligent chunking (512-1024 tokens, respect boundaries)
- Metadata preservation
- Batch processing for efficiency
**Chunking strategy:**
```go
type Chunk struct {
ID string
DocID string
Content string
Vector []float32
Metadata map[string]any
Position int // Position in original doc
}
```
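A minimal sketch of boundary-respecting chunking follows, assuming whitespace-separated words as a rough stand-in for tokens and blank lines as paragraph boundaries. The `chunkDocument` function is hypothetical, and the `Chunk` type is trimmed to the fields the example needs; a real indexer would count tokens with the embedding model's tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a reduced version of the struct above (no vector or metadata).
type Chunk struct {
	DocID    string
	Content  string
	Position int // Position in original doc
}

// chunkDocument packs whole paragraphs into chunks of at most maxWords
// words. A paragraph longer than maxWords is split on word boundaries.
// Original whitespace inside a chunk is normalized to single spaces.
func chunkDocument(docID, text string, maxWords int) []Chunk {
	if maxWords <= 0 {
		return nil
	}
	var chunks []Chunk
	emit := func(words []string) {
		if len(words) == 0 {
			return
		}
		chunks = append(chunks, Chunk{
			DocID:    docID,
			Content:  strings.Join(words, " "),
			Position: len(chunks),
		})
	}
	var cur []string
	for _, para := range strings.Split(text, "\n\n") {
		words := strings.Fields(para)
		if len(cur)+len(words) > maxWords {
			emit(cur) // close the current chunk at a paragraph boundary
			cur = nil
		}
		for len(words) > maxWords {
			emit(words[:maxWords]) // oversized paragraph: split by words
			words = words[maxWords:]
		}
		cur = append(cur, words...)
	}
	emit(cur)
	return chunks
}

func main() {
	text := "a b c\n\nd e\n\nf g h i j k l"
	for _, c := range chunkDocument("doc-1", text, 4) {
		fmt.Printf("%d: %q\n", c.Position, c.Content)
	}
}
```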
### 3. Server Module (`internal/server`)
Exposes context via MCP protocol.
**Local mode (stdio):**
```
STDIN → JSON-RPC → Handler → Response → STDOUT
```
**Remote mode (HTTP):**
```
HTTP Request → Handler → Response → HTTP Response
```
**MCP Tools exposed:**
- `devour_query` - Semantic search
- `devour_add` - Add documents
- `devour_status` - Get stats
- `devour_sync` - Trigger update
**MCP Resources:**
- `devour://documents` - All indexed docs
- `devour://sources` - Configured sources
- `devour://stats` - Index statistics
### 4. Scheduler Module (`internal/scheduler`)
Manages automatic updates from configured sources.
**Default schedule:** Every 72 hours (3 days)
**Change detection methods:**
- Content hash comparison (default)
- Last-Modified timestamp
- ETag header
- Git commit hash (for repos)
**Configuration:**
```yaml
scheduler:
  enabled: true
  interval: 72h
  check_method: hash
  retry_count: 3
  retry_delay: 1h
```
### 5. AI Module (`internal/ai`)
Handles AI integrations for embeddings and context injection.
**Supported providers:**
- OpenAI (primary)
- Ollama (local, planned)
- Custom endpoints
**Context injection format:**
```go
type Context struct {
Query string
Results []SearchResult
SystemPrompt string
}
func (c *Context) ToPrompt() string {
// Format for LLM consumption
}
```
## Configuration Schema
### devour.yaml
```yaml
# Core configuration
version: 1
# Storage paths
storage:
  docs_dir: ./devour_data/docs
  index_dir: ./devour_data/index
  metadata_dir: ./devour_data/metadata
# Embedding configuration
embeddings:
  provider: openai
  model: text-embedding-3-small
  dimensions: 1536
  api_key: ${OPENAI_API_KEY}
  batch_size: 100
# Vector database
vector_db:
  type: chromem # chromem, weaviate, faiss
  persist: true
  similarity_metric: cosine
# Scraping configuration
scraper:
  user_agent: "Devour/1.0 (+https://github.com/yourorg/devour)"
  timeout: 30s
  retry_count: 3
  retry_delay: 5s
  concurrency: 10
  rate_limit: 500ms
  max_depth: 3
  cache_dir: ./devour_data/cache
# Scheduler configuration
scheduler:
  enabled: true
  interval: 72h
  check_method: hash
  on_startup: false
# Server configuration
server:
  mode: local # local, remote
  transport: stdio # stdio, http
  host: localhost
  port: 8080
  cors:
    enabled: false
    origins: []
# Source definitions
sources:
  - name: example-docs
    type: url
    url: https://docs.example.com
    include:
      - "**/*.md"
      - "**/*.html"
    exclude:
      - "**/api/**"
      - "**/legacy/**"
    schedule: 24h # Override global schedule
  - name: api-spec
    type: openapi
    url: https://api.example.com/openapi.json
    schedule: 168h # Weekly
  - name: internal-repo
    type: github
    repo: myorg/myrepo
    branch: main
    paths:
      - docs/
      - README.md
    auth_token: ${GITHUB_TOKEN}
```
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key | Required |
| `DEVOUR_CONFIG` | Config file path | `./devour.yaml` |
| `DEVOUR_DATA_DIR` | Data directory | `./devour_data` |
| `GITHUB_TOKEN` | GitHub auth token | Optional |
| `DEVOUR_LOG_LEVEL` | Log level (debug, info, warn, error) | `info` |
| `DEVOUR_PORT` | Server port | `8080` |
## Code Quality Analysis
Devour includes comprehensive code quality analysis with three scorecard formats:
### Scorecard Types
**Compact Scorecard** - Quick overview with 3 circular metrics:
- Overall score (0-100%)
- Strict score (conservative metric)
- Letter grade (A-F)
**Detailed Scorecard** - Comprehensive breakdown featuring:
- Score breakdown by dimension with progress bars
- Findings grouped by type with visual charts
- Severity distribution with percentage circles
- Project metadata and timestamps
**Original Scorecard** - Balanced view with:
- Left panel: Project info and main scores
- Right panel: Dimension metrics in two-column layout
### Quality Commands
```bash ```bash
# Basic quality scan (generates original scorecard) devour init
devour quality scan devour get go net/http
devour query "http client timeout"
# Generate specific scorecard formats devour ask --lang go "how to parse json"
devour quality scan --format compact --badge-path compact.png devour sync
devour quality scan --format detailed --badge-path detailed.png devour status
# Dark theme support
devour quality scan --theme dark --badge-path dark_scorecard.png
# Quality status and trends
devour quality status
# Show next priority issue
devour quality next
# Export findings as JSON
devour quality scan --format json > findings.json
``` ```
### Quality Metrics ## Key behavior notes
**Dimensions Analyzed:** - `ask` is hybrid local-first with targeted live fallback.
- Complexity - Nested loops, excessive function calls - `query` is local lexical index; no API key required.
- Duplication - Code clones and near-duplicates - `scrape` fails by default when 0 docs are extracted (unless `--allow-empty`).
- Security - Vulnerabilities and anti-patterns - `serve` local mode uses JSON-RPC over stdio.
- Test Coverage - Unit test coverage analysis
- Dead Code - Unused functions, variables, imports
- Coupling - High coupling between modules
- Naming - Inconsistent naming conventions
**Severity Levels:** ## Supported language aliases
- **T1** - Auto-fixable (unused imports, debug logs)
- **T2** - Quick manual fixes (unused vars, dead exports)
- **T3** - Requires judgment (near-dupes, single-use abstractions)
- **T4** - Major refactor needed (god components, mixed concerns)
**Scoring:** - `go`, `golang`
- **Overall Score** - General code health (0-100%) - `rust`
- **Strict Score** - Conservative scoring ignoring quick wins - `python`, `py`
- **Grade** - Letter grade based on score ranges (A: 90-100%, B: 70-89%, etc.) - `java`
- `spring`
- `typescript`, `ts`
- `react`
- `vue`
- `nuxt`
- `docker`
- `cloudflare`, `cf`
- `astro`
- `csharp`, `cs`
- `kotlin`, `kt`
- `php`
- `ruby`, `rb`
- `elixir`, `ex`
- `next`, `nextjs`
- `svelte`
- `angular`, `ng`
- `remix`
- `solid`, `solidjs`
- `express`, `expressjs`
### Multi-Language Support ## Response expectations
- **Go** - Full AST analysis with go/parser When reporting command results:
- **Python** - AST analysis with ast module
- **JavaScript/TypeScript** - ESLint integration
- **Java** - JavaParser integration
- **Rust** - Synth integration (planned)
### Integration Examples 1. Show exact command(s) run.
2. Summarize key output.
```bash 3. Show output file locations.
# CI/CD Pipeline Integration 4. Call out limitations/experimental behavior.
devour quality scan --format json --threshold 70 5. Give the next command to continue.
if [ $? -ne 0 ]; then
echo "Quality gate failed - score below threshold"
exit 1
fi
# Generate all scorecard versions for documentation
devour quality scan --format original --badge-path docs/scorecard.png
devour quality scan --format compact --badge-path docs/scorecard_compact.png --theme light
devour quality scan --format detailed --badge-path docs/scorecard_detailed.png --theme dark
# Weekly quality tracking
devour quality scan --format json > weekly_$(date +%Y%m%d).json
```
## Quality Gates
Built-in validation rules:
- ⚠️ **WARNING** if document count < 10 (may be incomplete scrape)
- ⚠️ **WARNING** if average chunk size < 100 tokens (over-fragmented)
- 🛑 **HARD STOP** if embedding API fails (cannot index without vectors)
- 🛑 **HARD STOP** if storage is not writable (cannot persist)
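The two warning rules above could be expressed as a small check over index statistics. The sketch below is illustrative only: `indexStats` and `checkIndex` are hypothetical names, and the hard-stop conditions (embedding API failure, unwritable storage) are left out because they depend on runtime errors rather than on counts.

```go
package main

import "fmt"

// indexStats holds the counts the warning rules need (hypothetical type).
type indexStats struct {
	Documents   int
	Chunks      int
	TotalTokens int
}

// checkIndex returns one warning per violated rule:
// fewer than 10 documents, or an average chunk size under 100 tokens.
func checkIndex(s indexStats) []string {
	var warnings []string
	if s.Documents < 10 {
		warnings = append(warnings,
			fmt.Sprintf("document count %d < 10: scrape may be incomplete", s.Documents))
	}
	if s.Chunks > 0 && s.TotalTokens/s.Chunks < 100 {
		warnings = append(warnings,
			fmt.Sprintf("average chunk size %d < 100 tokens: over-fragmented", s.TotalTokens/s.Chunks))
	}
	return warnings
}

func main() {
	for _, w := range checkIndex(indexStats{Documents: 5, Chunks: 10, TotalTokens: 500}) {
		fmt.Println("WARNING:", w)
	}
}
```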
## Output Formats
### Query Results (JSON)
```json
{
"query": "authentication",
"results": [
{
"id": "chunk-uuid",
"document_id": "doc-uuid",
"content": "Relevant text excerpt...",
"score": 0.89,
"source": "https://docs.example.com/auth",
"metadata": {
"title": "Authentication Guide",
"section": "Getting Started"
}
}
],
"total": 15,
"took_ms": 45
}
```
### Status Output
```
Devour Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Index Health: ✅ Healthy
Documents: 1,247 indexed
Chunks: 8,392 total
Vector Dimension: 1536
Last Updated: 2025-01-15 10:30:00
Storage Used: 124 MB
Sources (3):
✅ example-docs (234 docs, synced 2h ago)
✅ api-spec (12 docs, synced 1d ago)
⚠️ internal-repo (pending first sync)
Next Scheduled Sync: 2025-01-18 10:30:00
```
## Integration Patterns
### With OpenCode
```
# In OpenCode session
> /devour init
> /devour scrape https://docs.myframework.com
> /devour serve
# In another terminal or session
> /devour query "how to handle authentication"
# Returns relevant context for AI
```
### With AI Assistant
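```go
// AI assistant queries Devour automatically
func getRelevantContext(query string) (string, error) {
	// Marshal the body so quotes in query cannot break the JSON.
	body, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		return "", err
	}
	resp, err := http.Post("http://localhost:8080/query",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var result QueryResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}
	// Inject into prompt
	return formatContextForAI(result.Results), nil
}
```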
### As MCP Tool
```json
// AI calls via MCP
{
"method": "tools/call",
"params": {
"name": "devour_query",
"arguments": {
"query": "API rate limiting",
"limit": 5
}
}
}
```
## Sub-Skills
This skill can delegate to specialized modules:
1. **devour-scrape** — Scraping operations
2. **devour-index** — Indexing and embeddings
3. **devour-query** — Search and retrieval
4. **devour-sync** — Synchronization tasks
5. **devour-serve** — Server management
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| `E001` | OpenAI API error | Check API key, rate limits |
| `E002` | Source unreachable | Verify URL, check network |
| `E003` | Storage write failure | Check permissions, disk space |
| `E004` | Invalid source type | Use supported: url, github, openapi, local |
| `E005` | Index corruption | Rebuild index with `devour sync --rebuild` |
## Performance Tuning
### Scraping
```yaml
scraper:
  concurrency: 20 # Parallel workers
  rate_limit: 200ms # Between requests
  timeout: 60s # Per request
```
### Indexing
```yaml
embeddings:
  batch_size: 200 # API batch size
vector_db:
  index_type: hnsw # Fast similarity search
  m: 16 # HNSW connectivity
```
### Querying
```yaml
query:
  ef_search: 64 # HNSW search depth
  limit: 10 # Default result count
```
## Troubleshooting
### Common Issues
**Slow queries:**
- Increase `ef_search` for better recall
- Use smaller `limit` values
- Consider index type (HNSW vs Flat)
**API rate limits:**
- Reduce `batch_size`
- Add delays between batches
- Use caching
**Memory usage:**
- Reduce `concurrency`
- Process in smaller batches
- Use disk-backed storage
---
*Devour: Feed your AI the context it craves.*
+7
@@ -0,0 +1,7 @@
version: 1
display_name: Devour Docs Ops
short_description: Route and execute Devour docs/query/ask/sync workflows.
default_prompt: |
Use Devour commands to fetch official docs, maintain a local docs index,
answer questions against docs, sync sources, run quality triage, and report
exact commands/results with file outputs.
+1012
File diff suppressed because it is too large
+144
@@ -0,0 +1,144 @@
package cmd
import (
"strings"
"testing"
"github.com/yourorg/devour/internal/scraper"
)
func TestDeriveSearchTerms(t *testing.T) {
terms := deriveSearchTerms("go", "how to regex match http path")
if len(terms) == 0 {
t.Fatal("expected at least one derived search term")
}
joined := strings.Join(terms, ",")
if !strings.Contains(joined, "regexp") {
t.Fatalf("expected regexp term in %v", terms)
}
if !strings.Contains(joined, "net/http") {
t.Fatalf("expected net/http term in %v", terms)
}
}
func TestScoreDocument(t *testing.T) {
query := "regex match in go"
docTitleMatch := &scraper.Document{
Title: "Package regexp",
Content: "Use MustCompile and MatchString to match values.",
Type: "go-package",
URL: "https://pkg.go.dev/regexp",
}
docNoMatch := &scraper.Document{
Title: "Package archive/tar",
Content: "Read and write tar archives.",
Type: "go-package",
URL: "https://pkg.go.dev/archive/tar",
}
if scoreDocument(query, docTitleMatch) <= scoreDocument(query, docNoMatch) {
t.Fatal("expected regex-related document to have a higher score")
}
}
func TestExtractRecommendedAPI(t *testing.T) {
docs := []rankedDoc{
{
doc: &scraper.Document{
Title: "regexp.func MustCompile ¶",
URL: "https://pkg.go.dev/regexp",
Content: "re := regexp.MustCompile(`\\\\d+`)\nif re.MatchString(input) { fmt.Println(\"ok\") }",
},
},
}
apis := extractRecommendedAPI(docs)
if len(apis) == 0 {
t.Fatal("expected API extraction to return at least one call")
}
}
func TestExtractSnippet(t *testing.T) {
content := "The regexp package implements regular expression search. Use MustCompile for fixed patterns."
snippet := extractSnippet(content, []string{"regexp"})
if snippet == "" {
t.Fatal("expected non-empty snippet")
}
if !strings.Contains(strings.ToLower(snippet), "regexp") {
t.Fatalf("snippet should mention regexp, got: %q", snippet)
}
}
func TestCandidateDocURLs_FrameworkFallbacks(t *testing.T) {
next, err := candidateDocURLs("nextjs", "routing")
if err != nil {
t.Fatalf("candidateDocURLs(nextjs) error: %v", err)
}
if len(next) < 2 {
t.Fatalf("expected fallback URLs for nextjs, got %v", next)
}
if next[0] != "https://nextjs.org/docs/app/building-your-application/routing" {
t.Fatalf("unexpected primary nextjs URL: %q", next[0])
}
remix, err := candidateDocURLs("remix", "routes")
if err != nil {
t.Fatalf("candidateDocURLs(remix) error: %v", err)
}
if len(remix) == 0 || remix[0] != "https://v2.remix.run/docs/file-conventions/routes" {
t.Fatalf("unexpected remix candidate URLs: %v", remix)
}
solid, err := candidateDocURLs("solid", "router")
if err != nil {
t.Fatalf("candidateDocURLs(solid) error: %v", err)
}
if len(solid) == 0 || !strings.Contains(solid[0], "github.com/solidjs/solid-docs") {
t.Fatalf("unexpected solid candidate URLs: %v", solid)
}
}
func TestPrimaryQueryTokenSkipsQuestionWords(t *testing.T) {
token := primaryQueryToken("what does routing do in remix")
if token == "" {
t.Fatal("expected non-empty token")
}
if token == "what" || token == "does" {
t.Fatalf("expected informative token, got %q", token)
}
}
func TestDeriveSearchTermsSolidRouting(t *testing.T) {
terms := deriveSearchTerms("solid", "how to do routing in solid")
joined := strings.Join(terms, ",")
if !strings.Contains(joined, "solid-router") {
t.Fatalf("expected solid-router term in %v", terms)
}
if strings.Contains(joined, "signals") {
t.Fatalf("did not expect signals default for routing question, got %v", terms)
}
}
func TestShouldFallbackToLive(t *testing.T) {
strong := []rankedDoc{
{
doc: &scraper.Document{Title: "Routing Guide", Content: "routing with file based routes", URL: "https://nextjs.org/docs/routing"},
score: 2.2,
},
}
if shouldFallbackToLive(strong, []string{"routing"}) {
t.Fatal("expected strong local match to skip live fallback")
}
weak := []rankedDoc{
{
doc: &scraper.Document{Title: "Misc", Content: "unrelated", URL: "https://example.com"},
score: 0.1,
},
}
if !shouldFallbackToLive(weak, []string{"routing"}) {
t.Fatal("expected weak local match to trigger live fallback")
}
}
+181
@@ -0,0 +1,181 @@
package cmd
import (
"encoding/json"
"fmt"
"net/url"
"os"
"os/exec"
"sort"
"strings"
"unicode"
"github.com/spf13/cobra"
)
var (
autoDryRun bool
autoJSON bool
autoLang string
)
var autoCmd = &cobra.Command{
Use: "auto <intent>",
Short: "Route natural-language intent to the best Devour command",
Long: `Auto-classify intent and run the best matching command (get/scrape/ask/quality).
Examples:
devour auto "how to parse json in go"
devour auto "https://pkg.go.dev/net/http"
devour auto "check code quality" --dry-run
devour auto "what is useEffect" --lang react`,
Args: cobra.MinimumNArgs(1),
RunE: runAuto,
}
func init() {
autoCmd.Flags().BoolVar(&autoDryRun, "dry-run", false, "print selected command without executing")
autoCmd.Flags().BoolVar(&autoJSON, "json", false, "output route decision as JSON")
autoCmd.Flags().StringVar(&autoLang, "lang", "", "optional language override for ask/get routes")
}
type autoDecision struct {
Intent string `json:"intent"`
Route string `json:"route"`
Reason string `json:"reason"`
Command []string `json:"command"`
}
func runAuto(cmd *cobra.Command, args []string) error {
intent := strings.TrimSpace(strings.Join(args, " "))
if intent == "" {
return fmt.Errorf("intent is required")
}
decision, err := classifyIntent(intent, strings.TrimSpace(autoLang))
if err != nil {
return err
}
if autoJSON {
enc := json.NewEncoder(cmd.OutOrStdout())
enc.SetIndent("", " ")
return enc.Encode(decision)
}
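// Write to the command's configured output so the text path matches the JSON path.
out := cmd.OutOrStdout()
fmt.Fprintf(out, "Route: %s\n", decision.Route)
fmt.Fprintf(out, "Reason: %s\n", decision.Reason)
fmt.Fprintf(out, "Command: devour %s\n", strings.Join(decision.Command, " "))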
if autoDryRun {
return nil
}
exe, err := os.Executable()
if err != nil {
return err
}
run := exec.Command(exe, decision.Command...)
run.Stdout = cmd.OutOrStdout()
run.Stderr = cmd.ErrOrStderr()
return run.Run()
}
func classifyIntent(intent, langOverride string) (*autoDecision, error) {
lower := strings.ToLower(intent)
trimmed := strings.TrimSpace(intent)
if u, err := url.Parse(trimmed); err == nil && (u.Scheme == "http" || u.Scheme == "https") {
route := []string{"scrape", trimmed}
return &autoDecision{Intent: intent, Route: "scrape", Reason: "detected URL input", Command: route}, nil
}
if strings.Contains(lower, "quality") || strings.Contains(lower, "technical debt") || strings.Contains(lower, "lint") || strings.Contains(lower, "code smell") {
route := []string{"quality", "status"}
if strings.Contains(lower, "scan") {
route = []string{"quality", "scan", "."}
}
return &autoDecision{Intent: intent, Route: "quality", Reason: "detected quality-analysis intent", Command: route}, nil
}
language := strings.TrimSpace(langOverride)
if language == "" {
language = inferLanguageFromText(lower)
}
if language != "" {
if canonical, ok := normalizeLanguage(language); ok {
language = canonical
} else {
language = ""
}
}
if strings.Contains(lower, "?") || strings.Contains(lower, "how") || strings.Contains(lower, "why") || strings.Contains(lower, "what") {
if language == "" {
language = "go"
}
route := []string{"ask", "--lang", language, intent, "--format", "text"}
return &autoDecision{Intent: intent, Route: "ask", Reason: "question-style intent", Command: route}, nil
}
if language == "" {
language = "go"
}
keyword := inferKeyword(intent)
if canonical, ok := normalizeLanguage(keyword); ok && canonical == language {
keyword = "overview"
}
route := []string{"get", language, keyword}
return &autoDecision{Intent: intent, Route: "get", Reason: "default docs retrieval route", Command: route}, nil
}
func inferLanguageFromText(text string) string {
text = strings.ToLower(text)
if strings.Contains(text, "c#") {
return "csharp"
}
if strings.Contains(text, "next.js") {
return "nextjs"
}
tokens := strings.FieldsFunc(text, func(r rune) bool {
return !(unicode.IsLetter(r) || unicode.IsDigit(r))
})
tokenSet := make(map[string]bool, len(tokens))
for _, tok := range tokens {
if tok != "" {
tokenSet[tok] = true
}
}
aliases := make([]string, 0, len(languageAliases()))
for alias := range languageAliases() {
aliases = append(aliases, alias)
}
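sort.Slice(aliases, func(i, j int) bool {
if len(aliases[i]) != len(aliases[j]) {
return len(aliases[i]) > len(aliases[j])
}
return aliases[i] < aliases[j] // tie-break so map iteration order cannot change the result
})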
for _, alias := range aliases {
if tokenSet[alias] {
return alias
}
}
return ""
}
func inferKeyword(intent string) string {
words := strings.Fields(strings.ToLower(intent))
stop := map[string]bool{
"get": true, "docs": true, "documentation": true, "about": true, "for": true, "on": true,
"the": true, "a": true, "an": true, "show": true, "me": true, "please": true,
}
for _, w := range words {
w = strings.Trim(w, ",.!?;:")
if w == "" || stop[w] || len(w) < 2 {
continue
}
return w
}
return "overview"
}
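The question check in `classifyIntent` uses `strings.Contains`, so an intent such as "show me docs for react" contains "how" and is routed to `ask`. A minimal sketch of a token-based check, in the spirit of `inferLanguageFromText`; `isQuestionIntent` is a hypothetical helper and not part of this commit:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isQuestionIntent reports whether intent looks like a question.
// It matches whole tokens, so "show" no longer counts as "how".
func isQuestionIntent(intent string) bool {
	lower := strings.ToLower(intent)
	if strings.Contains(lower, "?") {
		return true
	}
	tokens := strings.FieldsFunc(lower, func(r rune) bool {
		return !(unicode.IsLetter(r) || unicode.IsDigit(r))
	})
	for _, tok := range tokens {
		switch tok {
		case "how", "why", "what":
			return true
		}
	}
	return false
}

func main() {
	for _, intent := range []string{
		"how to parse json in go",
		"show me docs for react",
		"what is useEffect",
	} {
		fmt.Printf("%q question=%v\n", intent, isQuestionIntent(intent))
	}
}
```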
+31
@@ -0,0 +1,31 @@
package cmd
import "testing"
func TestInferLanguageFromText_UsesTokenBoundaries(t *testing.T) {
if got := inferLanguageFromText("get nextjs docs"); got != "nextjs" {
t.Fatalf("inferLanguageFromText matched %q, want %q", got, "nextjs")
}
if got := inferLanguageFromText("read docs for architecture"); got != "" {
t.Fatalf("inferLanguageFromText should not infer language from plain docs text, got %q", got)
}
}
func TestClassifyIntent_GetRouteKeywordFallback(t *testing.T) {
decision, err := classifyIntent("get nextjs docs", "")
if err != nil {
t.Fatalf("classifyIntent returned error: %v", err)
}
if decision.Route != "get" {
t.Fatalf("expected get route, got %q", decision.Route)
}
if len(decision.Command) != 3 {
t.Fatalf("expected 3 command args, got %v", decision.Command)
}
if decision.Command[1] != "nextjs" {
t.Fatalf("expected language nextjs, got %q", decision.Command[1])
}
if decision.Command[2] != "overview" {
t.Fatalf("expected keyword overview, got %q", decision.Command[2])
}
}
+124 -36
@@ -14,6 +14,7 @@ import argparse
class ModernBannerGenerator: class ModernBannerGenerator:
def __init__(self, data): def __init__(self, data):
self.data = data self.data = data
self.fonts = self._init_fonts()
# Devour brand colors - consistent with Go theme # Devour brand colors - consistent with Go theme
self.colors = { self.colors = {
@@ -57,6 +58,49 @@ class ModernBannerGenerator:
'severity_t4': (248, 113, 113), # #f87171 - bright red 'severity_t4': (248, 113, 113), # #f87171 - bright red
} }
def _init_fonts(self):
"""Initialize font candidates and cache."""
# Prefer widely-available fonts on Linux/macOS/Windows.
font_candidates = {
"regular": [
"arial.ttf",
"/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
"/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf",
"/System/Library/Fonts/Supplemental/Arial.ttf",
"/Library/Fonts/Arial.ttf",
],
"bold": [
"arialbd.ttf",
"/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
"/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf",
"/System/Library/Fonts/Supplemental/Arial Bold.ttf",
"/Library/Fonts/Arial Bold.ttf",
],
}
return {
"candidates": font_candidates,
"cache": {},
}
def get_font(self, size, weight="regular"):
"""Get a cached font or fall back to the default."""
key = (size, weight)
if key in self.fonts["cache"]:
return self.fonts["cache"][key]
for path in self.fonts["candidates"].get(weight, []):
try:
font = ImageFont.truetype(path, size)
self.fonts["cache"][key] = font
return font
except OSError:
continue
font = ImageFont.load_default()
self.fonts["cache"][key] = font
return font
def get_score_color(self, score, muted=False): def get_score_color(self, score, muted=False):
if score >= 90: if score >= 90:
return self.colors['score_a_muted'] if muted else self.colors['score_a'] return self.colors['score_a_muted'] if muted else self.colors['score_a']
@@ -90,6 +134,22 @@ class ModernBannerGenerator:
for x in range(width): for x in range(width):
img.putpixel((x, y), (r, g, b)) img.putpixel((x, y), (r, g, b))
# Add subtle radial glows for depth
self.draw_glow(img, width * 0.15, height * 0.2, 220, (71, 85, 105), 40)
self.draw_glow(img, width * 0.85, height * 0.75, 260, (251, 146, 60), 35)
def draw_glow(self, img, cx, cy, radius, color, max_alpha):
"""Draw a soft radial glow."""
draw = ImageDraw.Draw(img)
steps = 12
for i in range(steps):
r = radius - (radius * i / steps)
alpha = int(max_alpha * (1 - i / steps))
draw.ellipse(
[(cx - r, cy - r), (cx + r, cy + r)],
fill=(*color, alpha),
)
def draw_glass_card(self, draw, x, y, width, height, border_radius=12, use_alt=False): def draw_glass_card(self, draw, x, y, width, height, border_radius=12, use_alt=False):
"""Draw glass morphism card with enhanced effects""" """Draw glass morphism card with enhanced effects"""
card_color = self.colors['card_alt'] if use_alt else self.colors['card'] card_color = self.colors['card_alt'] if use_alt else self.colors['card']
@@ -142,19 +202,20 @@ class ModernBannerGenerator:
# Draw progress arc # Draw progress arc
start_angle = -90 start_angle = -90
end_angle = start_angle + (360 * percentage) end_angle = start_angle + (360 * percentage)
arc_width = 8 if is_primary else 6 arc_width = 9 if is_primary else 6
draw.arc([(cx-radius+4, cy-radius+4), (cx+radius-4, cy+radius-4)], draw.arc([(cx-radius+4, cy-radius+4), (cx+radius-4, cy+radius-4)],
start_angle, end_angle, start_angle, end_angle,
fill=score_color, width=arc_width) fill=score_color, width=arc_width)
# Inner glow ring
if is_primary:
draw.arc([(cx-radius+10, cy-radius+10), (cx+radius-10, cy+radius-10)],
start_angle, end_angle, fill=score_color, width=2)
# Enhanced typography # Enhanced typography
try: font_large = self.get_font(34 if is_primary else 28, weight="bold")
font_large = ImageFont.truetype("arial.ttf", 32 if is_primary else 28) font_small = self.get_font(11, weight="regular")
font_small = ImageFont.truetype("arial.ttf", 11)
except:
font_large = ImageFont.load_default()
font_small = ImageFont.load_default()
# Score text # Score text
score_text = f"{int(score)}%" score_text = f"{int(score)}%"
@@ -189,10 +250,7 @@ class ModernBannerGenerator:
6, fill=grade_color, outline=self.colors['border']) 6, fill=grade_color, outline=self.colors['border'])
# Grade text with better typography # Grade text with better typography
try: font = self.get_font(18, weight="bold")
font = ImageFont.truetype("arial.ttf", 18)
except:
font = ImageFont.load_default()
bbox = draw.textbbox((0, 0), grade, font=font) bbox = draw.textbbox((0, 0), grade, font=font)
text_width = bbox[2] - bbox[0] text_width = bbox[2] - bbox[0]
@@ -201,15 +259,14 @@ class ModernBannerGenerator:
draw.text((x + badge_width//2 - text_width//2, y + badge_height//2 - text_height//2 + 1), draw.text((x + badge_width//2 - text_width//2, y + badge_height//2 - text_height//2 + 1),
grade, fill=(255, 255, 255), font=font) grade, fill=(255, 255, 255), font=font)
def draw_text(self, draw, text, x, y, size=14, color=None, centered=False): def draw_text(self, draw, text, x, y, size=14, color=None, centered=False, max_width=None, min_size=9, weight="regular"):
"""Draw enhanced text with better typography""" """Draw enhanced text with better typography"""
if color is None: if color is None:
color = self.colors['text'] color = self.colors['text']
try: font = self.get_font(size, weight=weight)
font = ImageFont.truetype("arial.ttf", size) if max_width is not None:
except: font = self.fit_font(draw, text, font, max_width, min_size=min_size, weight=weight)
font = ImageFont.load_default()
if centered: if centered:
bbox = draw.textbbox((0, 0), text, font=font) bbox = draw.textbbox((0, 0), text, font=font)
@@ -218,15 +275,42 @@ class ModernBannerGenerator:
draw.text((x, y), text, fill=color, font=font) draw.text((x, y), text, fill=color, font=font)
def fit_font(self, draw, text, font, max_width, min_size=9, weight="regular"):
"""Shrink font until text fits max width."""
if font == ImageFont.load_default():
return font
size = font.size if hasattr(font, "size") else min_size
current = font
while size > min_size:
bbox = draw.textbbox((0, 0), text, font=current)
if (bbox[2] - bbox[0]) <= max_width:
return current
size -= 1
current = self.get_font(size, weight=weight)
return current
def truncate_text(self, draw, text, font, max_width):
"""Truncate text with ellipsis to fit width."""
if max_width <= 0:
return ""
if draw.textbbox((0, 0), text, font=font)[2] <= max_width:
return text
ellipsis = "..."
for i in range(len(text), 0, -1):
candidate = text[:i] + ellipsis
if draw.textbbox((0, 0), candidate, font=font)[2] <= max_width:
return candidate
return ellipsis
def draw_metric_card(self, draw, x, y, width, height, title, value, color): def draw_metric_card(self, draw, x, y, width, height, title, value, color):
"""Draw metric card""" """Draw metric card"""
self.draw_glass_card(draw, x, y, width, height) self.draw_glass_card(draw, x, y, width, height)
# Title # Title
self.draw_text(draw, title, x + 15, y + 15, size=12, color=self.colors['text_muted']) self.draw_text(draw, title, x + 15, y + 14, size=12, color=self.colors['text_muted'])
# Value # Value
self.draw_text(draw, value, x + 15, y + 40, size=20, color=color) self.draw_text(draw, value, x + 15, y + 38, size=20, color=color, weight="bold")
def draw_severity_bars(self, draw, x, y, width, height, find_by_tier): def draw_severity_bars(self, draw, x, y, width, height, find_by_tier):
"""Draw enhanced severity bars""" """Draw enhanced severity bars"""
@@ -313,14 +397,14 @@ class ModernBannerGenerator:
# Enhanced header section # Enhanced header section
header_y = content_y + 20 header_y = content_y + 20
self.draw_text(draw, "DEVOUR SCORE", content_x + content_width//2, header_y, self.draw_text(draw, "DEVOUR SCORE", content_x + content_width//2, header_y,
size=20, color=self.colors['text'], centered=True) size=20, color=self.colors['text'], centered=True, weight="bold")
# Project info # Project info
project_name = self.data['project_name'] project_name = self.data['project_name']
version_text = f"v{self.data['version']}" if self.data['version'] else "latest" version_text = f"v{self.data['version']}" if self.data['version'] else "latest"
project_text = f"{project_name} {version_text}" project_text = f"{project_name} {version_text}"
self.draw_text(draw, project_text, content_x + content_width//2, header_y + 25, self.draw_text(draw, project_text, content_x + content_width//2, header_y + 25,
size=14, color=self.colors['text_muted'], centered=True) size=14, color=self.colors['text_muted'], centered=True, max_width=content_width - 120)
# Timestamp # Timestamp
time_text = self.data.get('timestamp', 'Today') time_text = self.data.get('timestamp', 'Today')
@@ -347,19 +431,19 @@ class ModernBannerGenerator:
# Total findings # Total findings
self.draw_text(draw, str(findings_total), col_x + col_width//2, metrics_y, self.draw_text(draw, str(findings_total), col_x + col_width//2, metrics_y,
size=18, color=self.colors['text'], centered=True) size=18, color=self.colors['text'], centered=True, weight="bold")
self.draw_text(draw, "TOTAL", col_x + col_width//2, metrics_y + 22, self.draw_text(draw, "TOTAL", col_x + col_width//2, metrics_y + 22,
size=10, color=self.colors['text_muted'], centered=True) size=10, color=self.colors['text_muted'], centered=True)
# Open findings # Open findings
self.draw_text(draw, str(findings_open), col_x + col_width + col_width//2, metrics_y, self.draw_text(draw, str(findings_open), col_x + col_width + col_width//2, metrics_y,
size=18, color=self.colors['orange'], centered=True) size=18, color=self.colors['orange'], centered=True, weight="bold")
self.draw_text(draw, "OPEN", col_x + col_width + col_width//2, metrics_y + 22, self.draw_text(draw, "OPEN", col_x + col_width + col_width//2, metrics_y + 22,
size=10, color=self.colors['text_muted'], centered=True) size=10, color=self.colors['text_muted'], centered=True)
# Resolved findings # Resolved findings
self.draw_text(draw, str(findings_closed), col_x + 2*col_width + col_width//2, metrics_y, self.draw_text(draw, str(findings_closed), col_x + 2*col_width + col_width//2, metrics_y,
size=18, color=self.colors['score_a'], centered=True) size=18, color=self.colors['score_a'], centered=True, weight="bold")
self.draw_text(draw, "RESOLVED", col_x + 2*col_width + col_width//2, metrics_y + 22, self.draw_text(draw, "RESOLVED", col_x + 2*col_width + col_width//2, metrics_y + 22,
size=10, color=self.colors['text_muted'], centered=True) size=10, color=self.colors['text_muted'], centered=True)
@@ -379,7 +463,7 @@ class ModernBannerGenerator:
# Header section # Header section
header_y = 30 header_y = 30
self.draw_text(draw, f"{self.data['project_name']} Quality Report", self.draw_text(draw, f"{self.data['project_name']} Quality Report",
width//2, header_y, size=28, color=self.colors['text'], centered=True) width//2, header_y, size=28, color=self.colors['text'], centered=True, weight="bold", max_width=width - 80)
version_text = f"v{self.data['version']}" if self.data['version'] else "latest" version_text = f"v{self.data['version']}" if self.data['version'] else "latest"
self.draw_text(draw, version_text, width//2, header_y + 35, self.draw_text(draw, version_text, width//2, header_y + 35,
@@ -400,7 +484,7 @@ class ModernBannerGenerator:
score_details_y = score_y + 100 score_details_y = score_y + 100
self.draw_text(draw, f"Overall: {int(self.data['overall_score'])}%", self.draw_text(draw, f"Overall: {int(self.data['overall_score'])}%",
score_x, score_details_y, size=20, score_x, score_details_y, size=20,
color=self.get_score_color(self.data['overall_score']), centered=True) color=self.get_score_color(self.data['overall_score']), centered=True, weight="bold")
self.draw_text(draw, f"Strict: {int(self.data['strict_score'])}%", self.draw_text(draw, f"Strict: {int(self.data['strict_score'])}%",
score_x, score_details_y + 25, size=16, score_x, score_details_y + 25, size=16,
color=self.get_score_color(self.data['strict_score'], muted=True), centered=True) color=self.get_score_color(self.data['strict_score'], muted=True), centered=True)
@@ -419,7 +503,7 @@ class ModernBannerGenerator:
# Column 1 Header # Column 1 Header
self.draw_text(draw, "Score Breakdown", col1_x + col_width//2, grid_start_y + 20, self.draw_text(draw, "Score Breakdown", col1_x + col_width//2, grid_start_y + 20,
size=18, color=self.colors['text'], centered=True) size=18, color=self.colors['text'], centered=True, weight="bold")
# Column 1 Data # Column 1 Data
score_data = [ score_data = [
@@ -439,7 +523,7 @@ class ModernBannerGenerator:
# Value # Value
self.draw_text(draw, value, col1_x + col_width//2, data_y + 35, self.draw_text(draw, value, col1_x + col_width//2, data_y + 35,
size=24, color=color, centered=True) size=24, color=color, centered=True, weight="bold")
data_y += 80 data_y += 80
@@ -449,15 +533,19 @@ class ModernBannerGenerator:
# Column 2 Header # Column 2 Header
self.draw_text(draw, "Findings by Type", col2_x + col_width//2, grid_start_y + 20, self.draw_text(draw, "Findings by Type", col2_x + col_width//2, grid_start_y + 20,
size=18, color=self.colors['text'], centered=True) size=18, color=self.colors['text'], centered=True, weight="bold")
# Column 2 Data - Top finding types # Column 2 Data - Top finding types
type_data_y = grid_start_y + 60 type_data_y = grid_start_y + 60
type_items = list(self.data['find_by_type'].items())[:6] # Top 6 types type_items = list(self.data['find_by_type'].items())[:6] # Top 6 types
max_type_count = max(self.data['find_by_type'].values()) if self.data['find_by_type'] else 1
if not type_items:
self.draw_text(draw, "No findings", col2_x + col_width//2, grid_start_y + 110,
size=14, color=self.colors['text_dim'], centered=True)
for issue_type, count in type_items: for issue_type, count in type_items:
# Type bar # Type bar
bar_width = int((col_width - 40) * (count / max(self.data['find_by_type'].values()))) bar_width = int((col_width - 40) * (count / max_type_count))
bar_height = 22 bar_height = 22
# Bar background # Bar background
@@ -469,9 +557,9 @@ class ModernBannerGenerator:
4, fill=self.colors['orange']) 4, fill=self.colors['orange'])
# Type label # Type label
label_text = f"{issue_type}" label_text = f"{issue_type}".replace("_", " ")
if len(label_text) > 20: font_label = self.get_font(11, weight="regular")
label_text = label_text[:17] + "..." label_text = self.truncate_text(draw, label_text, font_label, col_width - 90)
self.draw_text(draw, label_text, col2_x + 25, type_data_y + 2, self.draw_text(draw, label_text, col2_x + 25, type_data_y + 2,
size=11, color=self.colors['text_muted']) size=11, color=self.colors['text_muted'])
@@ -487,7 +575,7 @@ class ModernBannerGenerator:
# Column 3 Header # Column 3 Header
self.draw_text(draw, "Issues by Severity", col3_x + col_width//2, grid_start_y + 20, self.draw_text(draw, "Issues by Severity", col3_x + col_width//2, grid_start_y + 20,
size=18, color=self.colors['text'], centered=True) size=18, color=self.colors['text'], centered=True, weight="bold")
# Column 3 Data - Severity breakdown # Column 3 Data - Severity breakdown
severity_data_y = grid_start_y + 60 severity_data_y = grid_start_y + 60
@@ -510,11 +598,11 @@ class ModernBannerGenerator:
# Severity name # Severity name
self.draw_text(draw, severity_name, col3_x + 50, severity_data_y + 15, self.draw_text(draw, severity_name, col3_x + 50, severity_data_y + 15,
size=14, color=self.colors['text']) size=14, color=self.colors['text'], max_width=col_width - 70)
# Count # Count
self.draw_text(draw, f"{count} issues", col3_x + 50, severity_data_y + 35, self.draw_text(draw, f"{count} issues", col3_x + 50, severity_data_y + 35,
size=16, color=color) size=16, color=color, weight="bold")
severity_data_y += 70 severity_data_y += 70
@@ -539,7 +627,7 @@ class ModernBannerGenerator:
# Value # Value
self.draw_text(draw, value, metric_x + metrics_width//2, summary_y + 10, self.draw_text(draw, value, metric_x + metrics_width//2, summary_y + 10,
size=18, color=color, centered=True) size=18, color=color, centered=True, weight="bold")
# Label # Label
self.draw_text(draw, label, metric_x + metrics_width//2, summary_y + 30, self.draw_text(draw, label, metric_x + metrics_width//2, summary_y + 30,
-1
@@ -24,7 +24,6 @@ This command will:
} }
func init() { func init() {
rootCmd.AddCommand(demoCmd)
} }
func runDemo(cmd *cobra.Command, args []string) error { func runDemo(cmd *cobra.Command, args []string) error {
Binary file not shown. (12 files)
+5
@@ -0,0 +1,5 @@
{
"version": "1",
"built_at": "2026-02-23T11:19:21.65415175+01:00",
"docs": []
}
@@ -0,0 +1,7 @@
{
"version": "1",
"built_at": "2026-02-23T11:19:21.65415175+01:00",
"docs_dir": "./devour_data/docs",
"source_file_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"doc_count": 0
}
+2 -1
@@ -1,3 +1,5 @@
//go:build ignore
package main package main
import ( import (
@@ -5,7 +7,6 @@ import (
"time" "time"
"github.com/yourorg/devour/internal/quality" "github.com/yourorg/devour/internal/quality"
"github.com/yourorg/devour/internal/quality/scorecard"
) )
func main() { func main() {
+226 -58
@@ -2,6 +2,7 @@ package cmd
import ( import (
"fmt" "fmt"
"sort"
"strings" "strings"
"github.com/spf13/cobra" "github.com/spf13/cobra"
@@ -11,112 +12,210 @@ var getCmd = &cobra.Command{
Use: "get <language> <keyword>", Use: "get <language> <keyword>",
Short: "Get documentation for a language/framework", Short: "Get documentation for a language/framework",
Long: `Quickly fetch documentation for popular languages and frameworks. Long: `Quickly fetch documentation for popular languages and frameworks.
This command automatically maps language+keyword combinations to their official documentation sites. This command maps language+keyword combinations to official documentation sources.
Supported languages:
go, golang - Go documentation (pkg.go.dev)
rust - Rust documentation (docs.rs)
python, py - Python documentation (docs.python.org)
java - Java documentation (docs.oracle.com)
spring - Spring Boot documentation (docs.spring.io)
typescript, ts - TypeScript documentation (typescriptlang.org)
react - React documentation (react.dev)
vue - Vue.js documentation (vuejs.org)
nuxt - Nuxt documentation (nuxt.com)
docker - Docker documentation (docs.docker.com)
cloudflare, cf - Cloudflare documentation (developers.cloudflare.com)
astro - Astro documentation (docs.astro.build)
Examples: Examples:
devour get go http # Go HTTP package documentation devour get go http
devour get python asyncio # Python asyncio module devour get python asyncio
devour get react hooks # React Hooks documentation devour get react hooks
devour get docker compose # Docker Compose docs devour get nextjs routing
devour get rust tokio # Rust Tokio crate`, devour get express middleware`,
Args: cobra.ExactArgs(2), Args: cobra.ExactArgs(2),
RunE: runGet, RunE: runGet,
} }
func init() { func init() {
// Add flags that can override defaults
getCmd.Flags().StringVarP(&scrapeFormat, "format", "f", "json", "output format (json, markdown)") getCmd.Flags().StringVarP(&scrapeFormat, "format", "f", "json", "output format (json, markdown)")
getCmd.Flags().StringVarP(&scrapeOutput, "output", "o", "", "output directory (default: devour_data/docs)") getCmd.Flags().StringVarP(&scrapeOutput, "output", "o", "", "output directory (default: configured docs dir)")
getCmd.Flags().IntVar(&scrapeConcurrency, "concurrency", 10, "parallel scraping workers") getCmd.Flags().IntVar(&scrapeConcurrency, "concurrency", 10, "parallel scraping workers")
} }
func runGet(cmd *cobra.Command, args []string) error { func runGet(cmd *cobra.Command, args []string) error {
language := strings.ToLower(args[0]) langIn := strings.ToLower(strings.TrimSpace(args[0]))
keyword := strings.ToLower(args[1]) keyword := strings.TrimSpace(args[1])
if keyword == "" {
return fmt.Errorf("keyword is required")
}
language, ok := normalizeLanguage(langIn)
if !ok {
return fmt.Errorf("unsupported language: %s. Supported: %s", langIn, strings.Join(supportedLanguages(), ", "))
}
// Map language to base URL and construct full URL
url, err := constructDocURL(language, keyword) url, err := constructDocURL(language, keyword)
if err != nil { if err != nil {
return err return err
} }
// Set the scrape type based on language
sourceType := mapLanguageToType(language) sourceType := mapLanguageToType(language)
scrapeType = sourceType
// Reuse the existing scrape logic with pre-determined values
scrapeType = string(sourceType)
sourceURL := url
fmt.Printf("Getting docs for: %s %s\n", language, keyword) fmt.Printf("Getting docs for: %s %s\n", language, keyword)
fmt.Printf("URL: %s\n", sourceURL) fmt.Printf("URL: %s\n", url)
fmt.Printf("Type: %s\n", sourceType) fmt.Printf("Type: %s\n\n", sourceType)
fmt.Println()
// Call the existing scrape logic return runScrape(cmd, []string{url})
return runScrape(cmd, []string{sourceURL})
} }
func constructDocURL(language, keyword string) (string, error) { func constructDocURL(language, keyword string) (string, error) {
language = strings.ToLower(strings.TrimSpace(language))
keyword = strings.TrimSpace(keyword)
lowerKeyword := strings.ToLower(keyword)
switch language { switch language {
case "go", "golang": case "go":
return fmt.Sprintf("https://pkg.go.dev/%s", keyword), nil return fmt.Sprintf("https://pkg.go.dev/%s", lowerKeyword), nil
case "rust": case "rust":
return fmt.Sprintf("https://docs.rs/%s/latest/%s/", keyword, keyword), nil return fmt.Sprintf("https://docs.rs/%s/latest/%s/", lowerKeyword, lowerKeyword), nil
case "python", "py": case "python":
if keyword == "stdlib" || keyword == "standard" { if lowerKeyword == "stdlib" || lowerKeyword == "standard" {
return "https://docs.python.org/3/library/", nil return "https://docs.python.org/3/library/", nil
} }
return fmt.Sprintf("https://docs.python.org/3/library/%s.html", keyword), nil return fmt.Sprintf("https://docs.python.org/3/library/%s.html", lowerKeyword), nil
case "java": case "java":
return fmt.Sprintf("https://docs.oracle.com/javase/8/docs/api/%s.html", keyword), nil return fmt.Sprintf("https://docs.oracle.com/javase/8/docs/api/%s.html", lowerKeyword), nil
case "spring": case "spring":
return fmt.Sprintf("https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#%s", keyword), nil if lowerKeyword == "mcp" || lowerKeyword == "mcp-overview" {
case "typescript", "ts": return "https://docs.spring.io/spring-ai/reference/api/mcp/mcp-overview.html", nil
return fmt.Sprintf("https://www.typescriptlang.org/docs/handbook/%s.html", keyword), nil }
return fmt.Sprintf("https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#%s", lowerKeyword), nil
case "typescript":
return fmt.Sprintf("https://www.typescriptlang.org/docs/handbook/%s.html", lowerKeyword), nil
case "react": case "react":
return fmt.Sprintf("https://react.dev/reference/react/%s", keyword), nil if lowerKeyword == "hooks" {
return "https://react.dev/reference/react", nil
}
return fmt.Sprintf("https://react.dev/reference/react/%s", lowerKeyword), nil
case "vue": case "vue":
return fmt.Sprintf("https://vuejs.org/guide/%s.html", keyword), nil if strings.Contains(lowerKeyword, "api") {
return "https://vuejs.org/api/", nil
}
return fmt.Sprintf("https://vuejs.org/guide/%s.html", lowerKeyword), nil
case "nuxt": case "nuxt":
return fmt.Sprintf("https://nuxt.com/docs/guide/%s", keyword), nil return fmt.Sprintf("https://nuxt.com/docs/guide/%s", lowerKeyword), nil
case "docker": case "docker":
return fmt.Sprintf("https://docs.docker.com/%s", keyword), nil return fmt.Sprintf("https://docs.docker.com/%s", lowerKeyword), nil
case "cloudflare", "cf": case "cloudflare":
return fmt.Sprintf("https://developers.cloudflare.com/%s", keyword), nil return fmt.Sprintf("https://developers.cloudflare.com/%s", lowerKeyword), nil
case "astro": case "astro":
return fmt.Sprintf("https://docs.astro.build/en/guides/%s", keyword), nil path := lowerKeyword
switch lowerKeyword {
case "components":
path = "basics/astro-components"
case "api":
path = "reference/api-reference"
case "install", "setup", "getting-started":
path = "install-and-setup"
default: default:
return "", fmt.Errorf("unsupported language: %s. Supported languages: go, rust, python, java, spring, typescript, react, vue, nuxt, docker, cloudflare, astro", language) if !strings.Contains(lowerKeyword, "/") {
path = "guides/" + lowerKeyword
}
}
return fmt.Sprintf("https://docs.astro.build/en/%s/", path), nil
case "csharp":
lowerKeyword = strings.TrimPrefix(lowerKeyword, "/")
if strings.Contains(lowerKeyword, "regex") || strings.Contains(lowerKeyword, "regular-expression") {
return "https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions", nil
}
return fmt.Sprintf("https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/%s", lowerKeyword), nil
case "kotlin":
lowerKeyword = strings.TrimPrefix(lowerKeyword, "/")
if lowerKeyword == "regex" || lowerKeyword == "regexp" {
lowerKeyword = "strings"
}
if strings.HasSuffix(lowerKeyword, ".html") {
return fmt.Sprintf("https://kotlinlang.org/docs/%s", lowerKeyword), nil
}
return fmt.Sprintf("https://kotlinlang.org/docs/%s.html", lowerKeyword), nil
case "php":
lowerKeyword = strings.TrimPrefix(lowerKeyword, "/")
if strings.HasSuffix(lowerKeyword, ".php") || strings.Contains(lowerKeyword, "function.") || strings.Contains(lowerKeyword, "book.") {
return fmt.Sprintf("https://www.php.net/manual/en/%s", lowerKeyword), nil
}
return fmt.Sprintf("https://www.php.net/manual/en/book.%s.php", lowerKeyword), nil
case "ruby":
keyword = strings.TrimPrefix(keyword, "/")
switch strings.ToLower(keyword) {
case "regex", "regexp":
keyword = "Regexp"
case "string":
keyword = "String"
case "array":
keyword = "Array"
default:
if !strings.Contains(keyword, "::") && len(keyword) > 0 {
keyword = strings.ToUpper(keyword[:1]) + strings.ToLower(keyword[1:])
}
}
return fmt.Sprintf("https://ruby-doc.org/core/%s.html", keyword), nil
case "elixir":
keyword = strings.TrimPrefix(keyword, "/")
switch strings.ToLower(keyword) {
case "regex":
keyword = "Regex"
case "string":
keyword = "String"
case "enum":
keyword = "Enum"
default:
if len(keyword) > 0 {
keyword = strings.ToUpper(keyword[:1]) + strings.ToLower(keyword[1:])
}
}
return fmt.Sprintf("https://hexdocs.pm/elixir/%s.html", keyword), nil
case "nextjs":
if strings.Contains(lowerKeyword, "routing") {
return "https://nextjs.org/docs/app/building-your-application/routing", nil
}
if strings.Contains(lowerKeyword, "data") || strings.Contains(lowerKeyword, "fetch") {
return "https://nextjs.org/docs/app/building-your-application/data-fetching", nil
}
return "https://nextjs.org/docs", nil
case "svelte":
if strings.Contains(lowerKeyword, "kit") {
return "https://svelte.dev/docs/kit", nil
}
return "https://svelte.dev/docs/svelte/overview", nil
case "angular":
if strings.Contains(lowerKeyword, "http") {
return "https://angular.dev/guide/http", nil
}
return "https://angular.dev/guide/components", nil
case "remix":
if strings.Contains(lowerKeyword, "route") {
return "https://v2.remix.run/docs/file-conventions/routes", nil
}
return "https://v2.remix.run/docs", nil
case "solid":
// Solid docs are published from this repository and include solid-router content.
return "https://github.com/solidjs/solid-docs", nil
case "express":
if strings.Contains(lowerKeyword, "routing") {
return "https://expressjs.com/en/guide/routing.html", nil
}
if strings.Contains(lowerKeyword, "middleware") {
return "https://expressjs.com/en/guide/using-middleware.html", nil
}
return "https://expressjs.com/en/guide/writing-middleware.html", nil
default:
return "", fmt.Errorf("unsupported language: %s. Supported: %s", language, strings.Join(supportedLanguages(), ", "))
} }
} }
func mapLanguageToType(language string) string { func mapLanguageToType(language string) string {
language, _ = normalizeLanguage(language)
switch language { switch language {
case "go", "golang": case "go":
return "godocs" return "godocs"
case "rust": case "rust":
return "rustdocs" return "rustdocs"
case "python", "py": case "python":
return "pythondocs" return "pythondocs"
case "java": case "java":
return "javadocs" return "javadocs"
case "spring": case "spring":
return "springdocs" return "springdocs"
case "typescript", "ts": case "typescript":
return "tsdocs" return "tsdocs"
case "react": case "react":
return "reactdocs" return "reactdocs"
@@ -126,11 +225,80 @@ func mapLanguageToType(language string) string {
return "nuxtdocs" return "nuxtdocs"
case "docker": case "docker":
return "dockerdocs" return "dockerdocs"
case "cloudflare", "cf": case "cloudflare":
return "cloudflaredocs" return "cloudflaredocs"
case "astro": case "astro":
return "astrodocs" return "astrodocs"
case "csharp", "kotlin", "php", "ruby", "elixir", "nextjs", "svelte", "angular", "remix", "express":
return "url"
case "solid":
return "github"
default: default:
return "web" return ""
} }
} }
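// normalizeLanguage maps a user-supplied language name or alias to its
// canonical name. It reports false for empty or unknown input.
func normalizeLanguage(language string) (string, bool) {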
language = strings.ToLower(strings.TrimSpace(language))
if language == "" {
return "", false
}
if canonical, ok := languageAliases()[language]; ok {
return canonical, true
}
return "", false
}
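// languageAliases returns the alias-to-canonical-name table. Every canonical
// name also maps to itself.
func languageAliases() map[string]string {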
return map[string]string{
"go": "go",
"golang": "go",
"rust": "rust",
"python": "python",
"py": "python",
"java": "java",
"spring": "spring",
"typescript": "typescript",
"ts": "typescript",
"react": "react",
"vue": "vue",
"nuxt": "nuxt",
"docker": "docker",
"cloudflare": "cloudflare",
"cf": "cloudflare",
"astro": "astro",
"csharp": "csharp",
"cs": "csharp",
"kotlin": "kotlin",
"kt": "kotlin",
"php": "php",
"ruby": "ruby",
"rb": "ruby",
"elixir": "elixir",
"ex": "elixir",
"next": "nextjs",
"nextjs": "nextjs",
"svelte": "svelte",
"angular": "angular",
"ng": "angular",
"remix": "remix",
"solid": "solid",
"solidjs": "solid",
"express": "express",
"expressjs": "express",
}
}
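// supportedLanguages returns the distinct canonical names, sorted so the
// result does not depend on map iteration order.
func supportedLanguages() []string {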
seen := map[string]bool{}
out := make([]string, 0)
for _, canonical := range languageAliases() {
if seen[canonical] {
continue
}
seen[canonical] = true
out = append(out, canonical)
}
sort.Strings(out)
return out
}
+121
@@ -0,0 +1,121 @@
package cmd
import "testing"
func TestConstructDocURL_SupportedLanguages(t *testing.T) {
tests := []struct {
language string
keyword string
wantURL string
}{
{"go", "net/http", "https://pkg.go.dev/net/http"},
{"rust", "tokio", "https://docs.rs/tokio/latest/tokio/"},
{"python", "asyncio", "https://docs.python.org/3/library/asyncio.html"},
{"java", "java/util/list", "https://docs.oracle.com/javase/8/docs/api/java/util/list.html"},
{"spring", "mcp", "https://docs.spring.io/spring-ai/reference/api/mcp/mcp-overview.html"},
{"typescript", "utility-types", "https://www.typescriptlang.org/docs/handbook/utility-types.html"},
{"react", "hooks", "https://react.dev/reference/react"},
{"vue", "essentials/reactivity-fundamentals", "https://vuejs.org/guide/essentials/reactivity-fundamentals.html"},
{"nuxt", "directory-structure", "https://nuxt.com/docs/guide/directory-structure"},
{"docker", "compose", "https://docs.docker.com/compose"},
{"cloudflare", "workers", "https://developers.cloudflare.com/workers"},
{"astro", "components", "https://docs.astro.build/en/basics/astro-components/"},
{"csharp", "regex", "https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions"},
{"kotlin", "regex", "https://kotlinlang.org/docs/strings.html"},
{"php", "pcre", "https://www.php.net/manual/en/book.pcre.php"},
{"ruby", "Regexp", "https://ruby-doc.org/core/Regexp.html"},
{"elixir", "String", "https://hexdocs.pm/elixir/String.html"},
{"nextjs", "routing", "https://nextjs.org/docs/app/building-your-application/routing"},
{"svelte", "kit", "https://svelte.dev/docs/kit"},
{"angular", "http", "https://angular.dev/guide/http"},
{"remix", "routes", "https://v2.remix.run/docs/file-conventions/routes"},
{"solid", "signals", "https://github.com/solidjs/solid-docs"},
{"express", "routing", "https://expressjs.com/en/guide/routing.html"},
}
for _, tt := range tests {
t.Run(tt.language+"_"+tt.keyword, func(t *testing.T) {
got, err := constructDocURL(tt.language, tt.keyword)
if err != nil {
t.Fatalf("constructDocURL(%q, %q) returned error: %v", tt.language, tt.keyword, err)
}
if got != tt.wantURL {
t.Fatalf("constructDocURL(%q, %q) = %q, want %q", tt.language, tt.keyword, got, tt.wantURL)
}
})
}
}
func TestConstructDocURL_UnsupportedLanguage(t *testing.T) {
if _, err := constructDocURL("haskell", "regex-tdfa"); err == nil {
t.Fatal("constructDocURL should return an error for unsupported language")
}
}
func TestMapLanguageToType(t *testing.T) {
tests := []struct {
language string
wantType string
}{
{"go", "godocs"},
{"golang", "godocs"},
{"rust", "rustdocs"},
{"python", "pythondocs"},
{"py", "pythondocs"},
{"java", "javadocs"},
{"spring", "springdocs"},
{"typescript", "tsdocs"},
{"ts", "tsdocs"},
{"react", "reactdocs"},
{"vue", "vuedocs"},
{"nuxt", "nuxtdocs"},
{"docker", "dockerdocs"},
{"cloudflare", "cloudflaredocs"},
{"cf", "cloudflaredocs"},
{"astro", "astrodocs"},
{"csharp", "url"},
{"kotlin", "url"},
{"php", "url"},
{"ruby", "url"},
{"elixir", "url"},
{"nextjs", "url"},
{"next", "url"},
{"svelte", "url"},
{"angular", "url"},
{"ng", "url"},
{"remix", "url"},
{"solidjs", "github"},
{"expressjs", "url"},
{"unknown", ""},
}
for _, tt := range tests {
t.Run(tt.language, func(t *testing.T) {
got := mapLanguageToType(tt.language)
if got != tt.wantType {
t.Fatalf("mapLanguageToType(%q) = %q, want %q", tt.language, got, tt.wantType)
}
})
}
}
func TestNormalizeLanguage(t *testing.T) {
tests := []struct {
in string
want string
ok bool
}{
{"go", "go", true},
{"golang", "go", true},
{"next", "nextjs", true},
{"solidjs", "solid", true},
{"expressjs", "express", true},
{"unknown", "", false},
}
for _, tt := range tests {
got, ok := normalizeLanguage(tt.in)
if got != tt.want || ok != tt.ok {
t.Fatalf("normalizeLanguage(%q) = (%q,%v), want (%q,%v)", tt.in, got, ok, tt.want, tt.ok)
}
}
}
+5 -61
@@ -6,6 +6,7 @@ import (
"path/filepath" "path/filepath"
"github.com/spf13/cobra" "github.com/spf13/cobra"
appconfig "github.com/yourorg/devour/internal/config"
) )
var initCmd = &cobra.Command{ var initCmd = &cobra.Command{
@@ -53,7 +54,10 @@ func runInit(cmd *cobra.Command, args []string) error {
} }
// Create default config // Create default config
config := generateDefaultConfig(initRemote) config, err := appconfig.RenderInitYAML(initRemote)
if err != nil {
return fmt.Errorf("failed to render default config: %w", err)
}
if err := os.WriteFile(configPath, []byte(config), 0644); err != nil { if err := os.WriteFile(configPath, []byte(config), 0644); err != nil {
return fmt.Errorf("failed to write config: %w", err) return fmt.Errorf("failed to write config: %w", err)
} }
@@ -82,63 +86,3 @@ func runInit(cmd *cobra.Command, args []string) error {
return nil return nil
} }
func generateDefaultConfig(remote bool) string {
mode := "local"
if remote {
mode = "remote"
}
return fmt.Sprintf(`# Devour Configuration
version: 1
# Storage paths
storage:
docs_dir: ./devour_data/docs
index_dir: ./devour_data/index
metadata_dir: ./devour_data/metadata
# Embedding settings
embeddings:
provider: openai
model: text-embedding-3-small
dimensions: 1536
api_key: ${OPENAI_API_KEY}
batch_size: 100
# Vector database
vector_db:
type: chromem
persist: true
similarity_metric: cosine
# Scraping settings
scraper:
user_agent: "Devour/1.0"
timeout: 30s
retry_count: 3
concurrency: 10
rate_limit: 500ms
max_depth: 3
cache_dir: ./devour_data/cache
# Scheduler
scheduler:
enabled: true
interval: 72h
check_method: hash
# Server settings
server:
mode: %s
port: 8080
host: localhost
# Sources (add your own)
sources: []
# - name: example-docs
# type: url
# url: https://docs.example.com
# include: ["**/*.md", "**/*.html"]
`, mode)
}
+67 -95
@@ -1,118 +1,90 @@
package cmd package cmd
import ( import (
"encoding/json"
"fmt" "fmt"
"io"
"strings" "strings"
"github.com/spf13/cobra" "github.com/spf13/cobra"
) )
var languagesFormat string
var languagesCmd = &cobra.Command{ var languagesCmd = &cobra.Command{
Use: "languages", Use: "languages",
Short: "Show supported languages and their mappings", Short: "Show supported languages and aliases",
Long: `Display all supported languages for the 'devour get' command Long: `Display all supported languages for 'devour get' and 'devour ask'
along with their base URLs and examples. with aliases and starter examples.`,
This helps you discover what documentation sources are available
and how to reference them quickly.`,
RunE: runLanguages, RunE: runLanguages,
} }
func init() { func init() {
rootCmd.AddCommand(languagesCmd) languagesCmd.Flags().StringVar(&languagesFormat, "format", "text", "output format (text, json)")
}
type languageInfo struct {
Canonical string `json:"canonical"`
Aliases []string `json:"aliases"`
Example string `json:"example"`
Source string `json:"source"`
} }
func runLanguages(cmd *cobra.Command, args []string) error { func runLanguages(cmd *cobra.Command, args []string) error {
fmt.Println("🌐 Devour Supported Languages") rows := []languageInfo{
fmt.Println("═══════════════════════════════════════════════════════════════") {Canonical: "go", Aliases: []string{"go", "golang"}, Example: "devour get go http", Source: "pkg.go.dev"},
fmt.Println() {Canonical: "rust", Aliases: []string{"rust"}, Example: "devour get rust tokio", Source: "docs.rs"},
{Canonical: "python", Aliases: []string{"python", "py"}, Example: "devour get python asyncio", Source: "docs.python.org"},
{Canonical: "java", Aliases: []string{"java"}, Example: "devour get java string", Source: "docs.oracle.com"},
{Canonical: "spring", Aliases: []string{"spring"}, Example: "devour get spring mcp", Source: "docs.spring.io"},
{Canonical: "typescript", Aliases: []string{"typescript", "ts"}, Example: "devour get ts interfaces", Source: "typescriptlang.org"},
{Canonical: "react", Aliases: []string{"react"}, Example: "devour get react hooks", Source: "react.dev"},
{Canonical: "vue", Aliases: []string{"vue"}, Example: "devour get vue reactivity", Source: "vuejs.org"},
{Canonical: "nuxt", Aliases: []string{"nuxt"}, Example: "devour get nuxt routing", Source: "nuxt.com"},
{Canonical: "docker", Aliases: []string{"docker"}, Example: "devour get docker compose", Source: "docs.docker.com"},
{Canonical: "cloudflare", Aliases: []string{"cloudflare", "cf"}, Example: "devour get cloudflare workers", Source: "developers.cloudflare.com"},
{Canonical: "astro", Aliases: []string{"astro"}, Example: "devour get astro components", Source: "docs.astro.build"},
{Canonical: "csharp", Aliases: []string{"csharp", "cs"}, Example: "devour get csharp regex", Source: "learn.microsoft.com"},
{Canonical: "kotlin", Aliases: []string{"kotlin", "kt"}, Example: "devour get kotlin strings", Source: "kotlinlang.org"},
{Canonical: "php", Aliases: []string{"php"}, Example: "devour get php pcre", Source: "php.net"},
{Canonical: "ruby", Aliases: []string{"ruby", "rb"}, Example: "devour get ruby Regexp", Source: "ruby-doc.org"},
{Canonical: "elixir", Aliases: []string{"elixir", "ex"}, Example: "devour get elixir String", Source: "hexdocs.pm"},
{Canonical: "nextjs", Aliases: []string{"next", "nextjs"}, Example: "devour get nextjs routing", Source: "nextjs.org"},
{Canonical: "svelte", Aliases: []string{"svelte"}, Example: "devour get svelte kit", Source: "svelte.dev"},
{Canonical: "angular", Aliases: []string{"angular", "ng"}, Example: "devour get angular http", Source: "angular.dev"},
{Canonical: "remix", Aliases: []string{"remix"}, Example: "devour get remix routes", Source: "v2.remix.run"},
{Canonical: "solid", Aliases: []string{"solid", "solidjs"}, Example: "devour get solid router", Source: "github.com/solidjs/solid-docs"},
{Canonical: "express", Aliases: []string{"express", "expressjs"}, Example: "devour get express middleware", Source: "expressjs.com"},
}
languages := []struct { switch strings.ToLower(strings.TrimSpace(languagesFormat)) {
langs []string case "json":
url string out := struct {
examples []string Count int `json:"count"`
Languages []languageInfo `json:"languages"`
}{ }{
{ Count: len(rows),
langs: []string{"go", "golang"}, Languages: rows,
url: "https://pkg.go.dev/{package}",
examples: []string{"devour get go http", "devour get go fmt", "devour get golang json"},
},
{
langs: []string{"rust"},
url: "https://docs.rs/{crate}/latest/{crate}/",
examples: []string{"devour get rust tokio", "devour get rust serde", "devour get rust clap"},
},
{
langs: []string{"python", "py"},
url: "https://docs.python.org/3/library/{module}.html",
examples: []string{"devour get python asyncio", "devour get py requests", "devour get python stdlib"},
},
{
langs: []string{"java"},
url: "https://docs.oracle.com/javase/8/docs/api/{package}.html",
examples: []string{"devour get java string", "devour get java arraylist"},
},
{
langs: []string{"spring"},
url: "https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#{section}",
examples: []string{"devour get spring boot", "devour get spring testing"},
},
{
langs: []string{"typescript", "ts"},
url: "https://www.typescriptlang.org/docs/handbook/{topic}.html",
examples: []string{"devour get typescript interfaces", "devour get ts decorators"},
},
{
langs: []string{"react"},
url: "https://react.dev/reference/react/{feature}",
examples: []string{"devour get react hooks", "devour get react components", "devour get react state"},
},
{
langs: []string{"vue"},
url: "https://vuejs.org/guide/{topic}.html",
examples: []string{"devour get vue components", "devour get vue reactivity"},
},
{
langs: []string{"nuxt"},
url: "https://nuxt.com/docs/guide/{topic}",
examples: []string{"devour get nuxt routing", "devour get nuxt middleware"},
},
{
langs: []string{"docker"},
url: "https://docs.docker.com/{topic}",
examples: []string{"devour get docker compose", "devour get docker build", "devour get docker networking"},
},
{
langs: []string{"cloudflare", "cf"},
url: "https://developers.cloudflare.com/{topic}",
examples: []string{"devour get cloudflare workers", "devour get cf pages", "devour get cloudflare dns"},
},
{
langs: []string{"astro"},
url: "https://docs.astro.build/en/guides/{topic}",
examples: []string{"devour get astro routing", "devour get astro components"},
},
} }
enc := json.NewEncoder(cmd.OutOrStdout())
for _, lang := range languages { enc.SetIndent("", " ")
fmt.Printf("🔷 %s\n", strings.Join(lang.langs, ", ")) return enc.Encode(out)
fmt.Printf(" URL: %s\n", lang.url) case "text", "":
fmt.Printf(" Examples:\n") printLanguagesText(cmd.OutOrStdout(), rows)
for _, example := range lang.examples {
fmt.Printf(" • %s\n", example)
}
fmt.Println()
}
fmt.Println("💡 Pro Tips:")
fmt.Println(" • Use 'devour get <language> help' for language-specific help")
fmt.Println(" • Add --format markdown for enhanced documentation")
fmt.Println(" • Most languages support common aliases (e.g., py → python)")
fmt.Println()
fmt.Println("🚀 Quick Start:")
fmt.Println(" devour get go http --format markdown")
fmt.Println(" devour get python asyncio")
fmt.Println(" devour get react hooks")
return nil return nil
default:
return fmt.Errorf("unsupported format: %s", languagesFormat)
}
}
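// printLanguagesText writes the human-readable language list to out. Write
// errors are deliberately ignored.
func printLanguagesText(out io.Writer, rows []languageInfo) {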
_, _ = fmt.Fprintln(out, "Devour Supported Languages")
_, _ = fmt.Fprintln(out, "============================================")
_, _ = fmt.Fprintln(out)
for _, row := range rows {
_, _ = fmt.Fprintf(out, "- %s (%s)\n", row.Canonical, strings.Join(row.Aliases, ", "))
_, _ = fmt.Fprintf(out, " source: %s\n", row.Source)
_, _ = fmt.Fprintf(out, " example: %s\n\n", row.Example)
}
_, _ = fmt.Fprintln(out, "Tip: use 'devour get <language> <keyword> --format markdown' for readable output.")
} }
+63
@@ -0,0 +1,63 @@
package cmd
import (
"bytes"
"encoding/json"
"strings"
"testing"
)
func TestLanguagesJSONFormat(t *testing.T) {
prev := languagesFormat
defer func() { languagesFormat = prev }()
languagesFormat = "json"
var buf bytes.Buffer
languagesCmd.SetOut(&buf)
if err := runLanguages(languagesCmd, nil); err != nil {
t.Fatalf("runLanguages returned error: %v", err)
}
var payload struct {
Count int `json:"count"`
Languages []struct {
Canonical string `json:"canonical"`
Aliases []string `json:"aliases"`
} `json:"languages"`
}
if err := json.Unmarshal(buf.Bytes(), &payload); err != nil {
t.Fatalf("invalid json output: %v", err)
}
if payload.Count == 0 || len(payload.Languages) == 0 {
t.Fatalf("expected non-empty languages payload, got %+v", payload)
}
foundNext := false
for _, l := range payload.Languages {
if l.Canonical == "nextjs" {
foundNext = true
break
}
}
if !foundNext {
t.Fatalf("expected nextjs in JSON payload, got %+v", payload.Languages)
}
}
func TestLanguagesTextFormat(t *testing.T) {
prev := languagesFormat
defer func() { languagesFormat = prev }()
languagesFormat = "text"
var buf bytes.Buffer
languagesCmd.SetOut(&buf)
if err := runLanguages(languagesCmd, nil); err != nil {
t.Fatalf("runLanguages returned error: %v", err)
}
out := buf.String()
if !strings.Contains(out, "Devour Supported Languages") {
t.Fatalf("unexpected text output: %q", out)
}
}
+78 -26
@@ -1,25 +1,32 @@
package cmd package cmd
import ( import (
"context"
"fmt" "fmt"
"net/url"
"os"
"strings"
"github.com/spf13/cobra" "github.com/spf13/cobra"
"github.com/yourorg/devour/internal/scraper"
"github.com/yourorg/devour/internal/search"
"github.com/yourorg/devour/internal/storage"
) )
var pushCmd = &cobra.Command{ var pushCmd = &cobra.Command{
Use: "push <path>", Use: "push <path>",
Short: "Push documents to remote MCP server", Short: "Import local documents into Devour storage/index",
Long: `Push local documents to a remote Devour MCP server. Long: `Push local documents into your Devour local workspace.
Useful for: Current stable behavior:
- Syncing local documentation to a shared server - local ingest into docs storage
- Backing up indexed content - local reindex for query/ask/status
- Contributing to a team knowledge base
Remote push is experimental and not enabled by default.
Examples: Examples:
devour push ./docs devour push ./docs
devour push ./docs --server http://devour.company.com devour push ./docs --project my-project`,
devour push ./docs --server http://localhost:8080 --project my-project`,
Args: cobra.ExactArgs(1), Args: cobra.ExactArgs(1),
RunE: runPush, RunE: runPush,
} }
@@ -30,33 +37,78 @@ var (
) )
func init() { func init() {
pushCmd.Flags().StringVar(&pushServer, "server", "", "remote Devour server URL") pushCmd.Flags().StringVar(&pushServer, "server", "", "remote Devour server URL (experimental)")
pushCmd.Flags().StringVarP(&pushProject, "project", "p", "", "project name on remote server") pushCmd.Flags().StringVarP(&pushProject, "project", "p", "", "project name label")
} }
func runPush(cmd *cobra.Command, args []string) error { func runPush(cmd *cobra.Command, args []string) error {
path := args[0] path := args[0]
if _, err := os.Stat(path); err != nil {
if pushServer == "" { return fmt.Errorf("path does not exist: %s", path)
// Try to get from config
pushServer = "http://localhost:8080"
} }
fmt.Printf("📤 Pushing to: %s\n", pushServer) cfg, err := loadAppConfig()
fmt.Printf(" Path: %s\n", path) if err != nil {
if pushProject != "" { return err
fmt.Printf(" Project: %s\n", pushProject)
} }
// TODO: Implement actual push logic server := strings.TrimSpace(pushServer)
// 1. Scan path for documents if server != "" && !isLocalServer(server) {
// 2. Connect to remote server return fmt.Errorf("remote push is experimental and not enabled in this build; use local push without --server")
// 3. Upload documents }
// 4. Wait for indexing confirmation
fmt.Println() projectName := strings.TrimSpace(pushProject)
fmt.Println("⚠️ Push functionality not yet implemented") if projectName == "" {
fmt.Println(" Remote server support coming soon") projectName = "local-push"
}
fmt.Printf("📤 Ingesting local docs from: %s\n", path)
fmt.Printf(" Project: %s\n", projectName)
fmt.Printf(" Target docs dir: %s\n", cfg.Storage.DocsDir)
s := scraper.NewScraper(scraper.SourceTypeLocal, toScraperConfig(cfg, 0))
if s == nil {
return fmt.Errorf("local scraper not available")
}
docs, err := s.Scrape(context.Background(), &scraper.Source{
Name: projectName,
Type: scraper.SourceTypeLocal,
Path: path,
Include: []string{`.*`},
})
if err != nil {
return fmt.Errorf("local ingest failed: %w", err)
}
saved, err := storage.SaveDocuments(docs, storage.SaveOptions{
Format: "json",
OutputDir: cfg.Storage.DocsDir,
AllowEmpty: false,
PrintWriter: nil,
})
if err != nil {
return fmt.Errorf("save docs failed: %w", err)
}
engine := search.NewEngine(cfg)
stats, err := engine.Rebuild(context.Background())
if err != nil {
return fmt.Errorf("reindex failed: %w", err)
}
fmt.Println("\n✓ Push complete")
fmt.Printf(" Documents imported: %d\n", saved.Count)
fmt.Printf(" Index docs: %d\n", stats.Documents)
fmt.Printf(" Index path: %s\n", stats.IndexPath)
return nil return nil
} }
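// isLocalServer reports whether raw names this machine (localhost or
// 127.0.0.1). A value that parses without a hostname is also treated as local.
func isLocalServer(raw string) bool {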
u, err := url.Parse(raw)
if err != nil {
return false
}
host := strings.ToLower(u.Hostname())
return host == "" || host == "localhost" || host == "127.0.0.1"
}
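One behavior of `isLocalServer` worth noting: `url.Parse` treats a scheme-less value such as `devour.company.com:8080` as scheme `devour.company.com` with an opaque part, so `Hostname()` is empty and the check reports it as local. The sketch below reproduces the function from the diff next to a stricter variant; `isLoopbackURL` is a hypothetical name and one possible tightening, not something this commit contains.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isLocalServer is copied from the diff above.
func isLocalServer(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	return host == "" || host == "localhost" || host == "127.0.0.1"
}

// isLoopbackURL is a hypothetical stricter variant: it requires an http(s)
// URL and accepts only "localhost" or a loopback IP address as the host.
func isLoopbackURL(raw string) bool {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return false
	}
	host := strings.ToLower(u.Hostname())
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, raw := range []string{
		"http://localhost:8080",
		"http://[::1]:8080",
		"devour.company.com:8080",
		"http://devour.company.com",
	} {
		fmt.Printf("%-28s original=%v stricter=%v\n", raw, isLocalServer(raw), isLoopbackURL(raw))
	}
}
```

Whether an empty host should count as local is a design choice; the stricter variant trades that convenience for rejecting scheme-less remote hosts, and it also accepts the IPv6 loopback address.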
+39
@@ -6,6 +6,7 @@ import (
"fmt" "fmt"
"os" "os"
"path/filepath" "path/filepath"
"strings"
"time" "time"
"github.com/spf13/cobra" "github.com/spf13/cobra"
@@ -218,6 +219,7 @@ func runQualityScan(cmd *cobra.Command, args []string) error {
if err != nil { if err != nil {
return fmt.Errorf("scan failed: %w", err) return fmt.Errorf("scan failed: %w", err)
} }
result.Findings = quality.AttachDocsEvidence(lang, result.Findings)
return outputScanResult(result, qualityFormat) return outputScanResult(result, qualityFormat)
} }
@@ -256,9 +258,11 @@ func runQualityStatus(cmd *cobra.Command, args []string) error {
return json.NewEncoder(os.Stdout).Encode(scorecard) return json.NewEncoder(os.Stdout).Encode(scorecard)
case "strict": case "strict":
fmt.Println(scorer.FormatStrictScorecard(findings, lastScan)) fmt.Println(scorer.FormatStrictScorecard(findings, lastScan))
printQualityEvidenceSummary(findings)
return nil return nil
default: default:
fmt.Println(scorer.FormatScorecard(scorecard)) fmt.Println(scorer.FormatScorecard(scorecard))
printQualityEvidenceSummary(findings)
return nil return nil
} }
} }
@@ -318,6 +322,17 @@ func runQualityNext(cmd *cobra.Command, args []string) error {
fmt.Printf("Score: %d\n", next.Score) fmt.Printf("Score: %d\n", next.Score)
fmt.Printf("ID: %s\n", next.ID) fmt.Printf("ID: %s\n", next.ID)
fmt.Printf("\nDescription:\n%s\n", next.Description) fmt.Printf("\nDescription:\n%s\n", next.Description)
if next.Metadata != nil {
if urls := strings.TrimSpace(next.Metadata["docs_evidence_urls"]); urls != "" {
fmt.Printf("\nEvidence Docs:\n%s\n", urls)
}
if rationale := strings.TrimSpace(next.Metadata["docs_evidence_rationale"]); rationale != "" {
fmt.Printf("\nRationale:\n%s\n", rationale)
}
if confidence := strings.TrimSpace(next.Metadata["docs_evidence_confidence"]); confidence != "" {
fmt.Printf("Evidence confidence: %s\n", confidence)
}
}
if explain { if explain {
fmt.Printf("\nExplanation:\n") fmt.Printf("\nExplanation:\n")
@@ -693,3 +708,27 @@ func importReviewResponses(dataDir string, filename string) error {
return nil return nil
} }
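// printQualityEvidenceSummary prints how many findings carry docs evidence
// URLs, followed by the first such finding as an example. It prints nothing
// when no finding has evidence.
func printQualityEvidenceSummary(findings []quality.Finding) {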
totalWithEvidence := 0
for _, f := range findings {
if f.Metadata != nil && strings.TrimSpace(f.Metadata["docs_evidence_urls"]) != "" {
totalWithEvidence++
}
}
if totalWithEvidence == 0 {
return
}
fmt.Printf("\nEvidence-linked findings: %d/%d\n", totalWithEvidence, len(findings))
for _, f := range findings {
if f.Metadata == nil {
continue
}
urls := strings.TrimSpace(f.Metadata["docs_evidence_urls"])
if urls == "" {
continue
}
fmt.Printf(" • %s:%d - %s\n %s\n", filepath.Base(f.File), f.Line, f.Title, urls)
break
}
}
+100 -18
@@ -1,9 +1,14 @@
package cmd package cmd
import ( import (
"context"
"encoding/json"
"fmt" "fmt"
"strings"
"github.com/spf13/cobra" "github.com/spf13/cobra"
appconfig "github.com/yourorg/devour/internal/config"
"github.com/yourorg/devour/internal/search"
) )
var queryCmd = &cobra.Command{ var queryCmd = &cobra.Command{
@@ -29,32 +34,109 @@ var (
 )
 func init() {
-queryCmd.Flags().IntVarP(&queryLimit, "limit", "l", 5, "maximum number of results")
+queryCmd.Flags().IntVarP(&queryLimit, "limit", "n", 5, "maximum number of results")
 queryCmd.Flags().StringVarP(&queryFormat, "format", "f", "text", "output format (text, json, markdown)")
-queryCmd.Flags().Float64Var(&queryThreshold, "threshold", 0.7, "similarity threshold (0-1)")
+queryCmd.Flags().Float64Var(&queryThreshold, "threshold", 0, "minimum lexical score threshold")
 }
 func runQuery(cmd *cobra.Command, args []string) error {
-query := args[0]
-if len(args) > 1 {
-query = fmt.Sprintf("%s", args)
+query := strings.TrimSpace(strings.Join(args, " "))
+if query == "" {
+return fmt.Errorf("query cannot be empty")
 }
-fmt.Printf("Searching: %q\n", query)
-fmt.Printf(" Limit: %d\n", queryLimit)
-fmt.Printf(" Threshold: %.2f\n", queryThreshold)
-fmt.Println()
+cfg, err := loadAppConfig()
+if err != nil {
+return err
+}
-// TODO: Implement actual query logic
-// 1. Generate embedding for query
-// 2. Search vector database
-// 3. Format and return results
+engine := search.NewEngine(cfg)
+results, stats, err := engine.Search(context.Background(), query, search.SearchOptions{
+Limit: queryLimit,
+Threshold: queryThreshold,
+})
+if err != nil {
+return fmt.Errorf("query failed: %w", err)
+}
-// Placeholder results
-fmt.Println("Results:")
-fmt.Println("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-fmt.Println("⚠️ Query functionality not yet implemented")
-fmt.Println(" Index some documents first with 'devour scrape'")
+switch strings.ToLower(queryFormat) {
+case "json":
+resp := map[string]any{
+"query": query,
+"limit": queryLimit,
+"threshold": queryThreshold,
+"count": len(results),
+"results": results,
+"indexed_at": stats.LastIndexedAt,
+"documents": stats.Documents,
+}
+enc := json.NewEncoder(cmd.OutOrStdout())
+enc.SetIndent("", " ")
+return enc.Encode(resp)
+case "markdown":
+return printQueryMarkdown(cmd, query, cfg, results, stats)
+case "text":
+return printQueryText(cmd, query, cfg, results, stats)
+default:
+return fmt.Errorf("unsupported format: %s (supported: text, json, markdown)", queryFormat)
+}
+}
func printQueryText(cmd *cobra.Command, query string, cfg *appconfig.Config, results []search.Result, stats *search.IndexStats) error {
fmt.Fprintf(cmd.OutOrStdout(), "Searching: %q\n", query)
fmt.Fprintf(cmd.OutOrStdout(), " Limit: %d\n", queryLimit)
fmt.Fprintf(cmd.OutOrStdout(), " Threshold: %.2f\n", queryThreshold)
fmt.Fprintf(cmd.OutOrStdout(), " Indexed docs: %d\n", stats.Documents)
fmt.Fprintf(cmd.OutOrStdout(), " Docs dir: %s\n\n", cfg.Storage.DocsDir)
if len(results) == 0 {
fmt.Fprintln(cmd.OutOrStdout(), "No results found.")
return nil
}
fmt.Fprintln(cmd.OutOrStdout(), "Results:")
fmt.Fprintln(cmd.OutOrStdout(), "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
for i, r := range results {
fmt.Fprintf(cmd.OutOrStdout(), "%d. %s\n", i+1, r.Title)
fmt.Fprintf(cmd.OutOrStdout(), " Score: %.3f | Type: %s | Source: %s\n", r.Score, r.Type, defaultSource(r.Source))
if r.URL != "" {
fmt.Fprintf(cmd.OutOrStdout(), " URL: %s\n", r.URL)
}
fmt.Fprintf(cmd.OutOrStdout(), " Snippet: %s\n\n", r.Snippet)
}
 return nil
 }
func printQueryMarkdown(cmd *cobra.Command, query string, cfg *appconfig.Config, results []search.Result, stats *search.IndexStats) error {
fmt.Fprintf(cmd.OutOrStdout(), "# Query Results\n\n")
fmt.Fprintf(cmd.OutOrStdout(), "- Query: `%s`\n", query)
fmt.Fprintf(cmd.OutOrStdout(), "- Limit: `%d`\n", queryLimit)
fmt.Fprintf(cmd.OutOrStdout(), "- Threshold: `%.2f`\n", queryThreshold)
fmt.Fprintf(cmd.OutOrStdout(), "- Indexed docs: `%d`\n", stats.Documents)
fmt.Fprintf(cmd.OutOrStdout(), "- Docs dir: `%s`\n\n", cfg.Storage.DocsDir)
if len(results) == 0 {
fmt.Fprintln(cmd.OutOrStdout(), "_No results found._")
return nil
}
for i, r := range results {
fmt.Fprintf(cmd.OutOrStdout(), "## %d. %s\n\n", i+1, r.Title)
fmt.Fprintf(cmd.OutOrStdout(), "- Score: `%.3f`\n", r.Score)
fmt.Fprintf(cmd.OutOrStdout(), "- Type: `%s`\n", r.Type)
fmt.Fprintf(cmd.OutOrStdout(), "- Source: `%s`\n", defaultSource(r.Source))
if r.URL != "" {
fmt.Fprintf(cmd.OutOrStdout(), "- URL: %s\n", r.URL)
}
fmt.Fprintf(cmd.OutOrStdout(), "\n%s\n\n", r.Snippet)
}
return nil
}
func defaultSource(source string) string {
source = strings.TrimSpace(source)
if source == "" {
return "unknown"
}
return source
}
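The input handling in the new `runQuery` can be looked at in isolation. The following is a minimal sketch under stated assumptions, not the commit's code: `normalizeQuery` is a hypothetical helper name introduced here, while `defaultSource` copies the fallback shown in the diff.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizeQuery (hypothetical name) joins all positional arguments into one
// query string and rejects input that is empty after trimming.
func normalizeQuery(args []string) (string, error) {
	query := strings.TrimSpace(strings.Join(args, " "))
	if query == "" {
		return "", errors.New("query cannot be empty")
	}
	return query, nil
}

// defaultSource mirrors the fallback used when a result carries no source label.
func defaultSource(source string) string {
	source = strings.TrimSpace(source)
	if source == "" {
		return "unknown"
	}
	return source
}

func main() {
	q, err := normalizeQuery([]string{"http", "client"})
	fmt.Println(q, err)

	_, err = normalizeQuery([]string{"  "})
	fmt.Println(err)

	fmt.Println(defaultSource(" "))
}
```

With this shape, `devour query http client` and `devour query "http client"` would both search for the same string, which the removed `fmt.Sprintf("%s", args)` did not guarantee because it formatted the slice itself.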
+2
@@ -6,6 +6,7 @@ import (
 "time"
 "github.com/yourorg/devour/internal/scraper"
+_ "github.com/yourorg/devour/internal/scraper/external"
 )
 func main() {
@@ -90,6 +91,7 @@ func main() {
 scraper.SourceTypeGitHub,
 scraper.SourceTypeOpenAPI,
 scraper.SourceTypeLocal,
+scraper.SourceTypeLocalSearch,
 scraper.SourceTypeGoDocs,
 scraper.SourceTypeRustDocs,
 scraper.SourceTypePythonDocs,
+5
@@ -6,6 +6,7 @@ import (
 "github.com/spf13/cobra"
 "github.com/spf13/viper"
+_ "github.com/yourorg/devour/internal/scraper/external"
 "github.com/yourorg/devour/internal/ui"
 )
@@ -34,6 +35,7 @@ Runs in two modes:
 - Local mode: OpenCode skill running entirely on your machine
 - Remote mode: MCP server for multi-user/team access`,
 Version: "1.0.0",
+SilenceUsage: true,
 }
 func Execute() {
@@ -53,6 +55,7 @@ func init() {
 rootCmd.AddCommand(initCmd)
 rootCmd.AddCommand(scrapeCmd)
 rootCmd.AddCommand(getCmd)
+rootCmd.AddCommand(askCmd)
 rootCmd.AddCommand(languagesCmd)
 rootCmd.AddCommand(demoCmd)
 rootCmd.AddCommand(serveCmd)
@@ -62,6 +65,8 @@ func init() {
 rootCmd.AddCommand(pushCmd)
 rootCmd.AddCommand(logoCmd)
 rootCmd.AddCommand(scorecardCmd)
+rootCmd.AddCommand(autoCmd)
+rootCmd.AddCommand(verifyCmd)
 }
 // logoCmd displays the Devour character
+31
@@ -0,0 +1,31 @@
package cmd
import "testing"
func TestRootCommandsAreUnique(t *testing.T) {
seen := map[string]bool{}
for _, c := range rootCmd.Commands() {
name := c.Name()
if seen[name] {
t.Fatalf("duplicate root command registered: %s", name)
}
seen[name] = true
}
}
func TestQueryLimitShorthandIsN(t *testing.T) {
flag := queryCmd.Flags().Lookup("limit")
if flag == nil {
t.Fatal("query --limit flag not found")
}
if flag.Shorthand != "n" {
t.Fatalf("expected query --limit shorthand to be n, got %q", flag.Shorthand)
}
}
func TestRootExecuteQueryNoPanic(t *testing.T) {
rootCmd.SetArgs([]string{"query", "http client", "--limit", "1"})
if _, err := rootCmd.ExecuteC(); err != nil {
t.Fatalf("query execution should not panic; got error: %v", err)
}
}
+81
@@ -0,0 +1,81 @@
package cmd
import (
"fmt"
"path/filepath"
"strings"
"time"
appconfig "github.com/yourorg/devour/internal/config"
"github.com/yourorg/devour/internal/scraper"
)
func loadAppConfig() (*appconfig.Config, error) {
cfg, err := appconfig.Load(cfgFile)
if err != nil {
return nil, err
}
if err := cfg.EnsureStorageDirs(); err != nil {
return nil, fmt.Errorf("ensure storage dirs: %w", err)
}
return cfg, nil
}
func toScraperConfig(c *appconfig.Config, concurrencyOverride int) *scraper.Config {
sc := &scraper.Config{
UserAgent: c.Scraper.UserAgent,
Timeout: c.Scraper.Timeout,
RetryCount: c.Scraper.RetryCount,
RetryDelay: c.Scraper.RetryDelay,
Concurrency: c.Scraper.Concurrency,
RateLimit: c.Scraper.RateLimit,
MaxDepth: c.Scraper.MaxDepth,
CacheDir: c.Scraper.CacheDir,
}
if concurrencyOverride > 0 {
sc.Concurrency = concurrencyOverride
}
if sc.Timeout <= 0 {
sc.Timeout = 30 * time.Second
}
if sc.RetryCount <= 0 {
sc.RetryCount = 3
}
if sc.RetryDelay <= 0 {
sc.RetryDelay = 1 * time.Second
}
if sc.Concurrency <= 0 {
sc.Concurrency = 10
}
if sc.MaxDepth <= 0 {
sc.MaxDepth = 2
}
return sc
}
func sourceFromConfig(s appconfig.SourceConfig) *scraper.Source {
return &scraper.Source{
Name: strings.TrimSpace(s.Name),
Type: scraper.SourceType(strings.TrimSpace(s.Type)),
URL: strings.TrimSpace(s.URL),
Query: strings.TrimSpace(s.Query),
ResultLimit: s.ResultLimit,
Domains: append([]string(nil), s.Domains...),
Repo: strings.TrimSpace(s.Repo),
Branch: strings.TrimSpace(s.Branch),
Path: strings.TrimSpace(s.Path),
Include: append([]string(nil), s.Include...),
Exclude: append([]string(nil), s.Exclude...),
Schedule: strings.TrimSpace(s.Schedule),
}
}
func resolveOutputDir(c *appconfig.Config, override string) string {
if strings.TrimSpace(override) != "" {
return override
}
if strings.TrimSpace(c.Storage.DocsDir) != "" {
return c.Storage.DocsDir
}
return filepath.Join("devour_data", "docs")
}
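The precedence used by `toScraperConfig` (positive override first, then fallbacks for any non-positive field) can be sketched on its own. This is an illustration only: `settings` and `withDefaults` are hypothetical names, and the struct keeps just three of the fields from the diff.

```go
package main

import (
	"fmt"
	"time"
)

// settings is a hypothetical, trimmed-down stand-in for the scraper config.
type settings struct {
	Timeout     time.Duration
	RetryCount  int
	Concurrency int
}

// withDefaults applies a positive concurrency override first, then fills any
// non-positive field with the fallback values used in the diff.
func withDefaults(s settings, concurrencyOverride int) settings {
	if concurrencyOverride > 0 {
		s.Concurrency = concurrencyOverride
	}
	if s.Timeout <= 0 {
		s.Timeout = 30 * time.Second
	}
	if s.RetryCount <= 0 {
		s.RetryCount = 3
	}
	if s.Concurrency <= 0 {
		s.Concurrency = 10
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", withDefaults(settings{}, 0))
	fmt.Printf("%+v\n", withDefaults(settings{Concurrency: 4}, 8))
}
```

One consequence that may be worth checking: the `--concurrency` flag in this commit defaults to 10 and is passed as the override, so a concurrency value set in the config file would appear to be replaced on every run unless the flag is given a value of zero or less.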
-1
@@ -37,7 +37,6 @@ Examples:
 }
 func init() {
-rootCmd.AddCommand(scorecardCmd)
 scorecardCmd.Flags().BoolVar(&scorecardCompact, "compact", false, "Generate compact banner only")
 scorecardCmd.Flags().BoolVar(&scorecardDetailed, "detailed", false, "Generate detailed banner only")
 scorecardCmd.Flags().StringVarP(&scorecardOutput, "output", "o", "lighthouse_scorecard", "Output filename prefix")
+291 -87
@@ -2,17 +2,23 @@ package cmd
 import (
 "context"
-"encoding/json"
+"crypto/sha256"
+"encoding/hex"
 "fmt"
 "net/url"
 "os"
 "path/filepath"
+"sort"
 "strings"
 "time"
 "github.com/spf13/cobra"
-"github.com/yourorg/devour/internal/markdown"
+appconfig "github.com/yourorg/devour/internal/config"
+"github.com/yourorg/devour/internal/projectstate"
 "github.com/yourorg/devour/internal/scraper"
+"github.com/yourorg/devour/internal/search"
+"github.com/yourorg/devour/internal/storage"
+"gopkg.in/yaml.v3"
 )
 var scrapeCmd = &cobra.Command{
@@ -34,13 +40,17 @@ Supported source types:
 - dockerdocs: Docker (docs.docker.com)
 - cloudflaredocs: Cloudflare (developers.cloudflare.com)
 - astrodocs: Astro (docs.astro.build)
+- localsearch: Self-hosted search API returning JSON results
 - url: Generic web pages
 - github: GitHub repositories
+- openapi: OpenAPI/Swagger specs
+- local: Local files/directories
 Examples:
 devour scrape https://pkg.go.dev/net/http --type godocs
 devour scrape https://react.dev/reference/react --type reactdocs
 devour scrape https://developers.cloudflare.com/ --type cloudflaredocs
+devour scrape http://127.0.0.1:8080/search --type localsearch --search-query "golang http client"
 devour scrape --sources sources.yaml`,
 Args: cobra.MaximumNArgs(1),
 RunE: runScrape,
@@ -52,126 +62,261 @@ var (
 scrapeOutput string
 scrapeConcurrency int
 scrapeType string
+scrapeSearchQuery string
+scrapeSearchLimit int
+scrapeSearchDomains []string
+scrapeInclude []string
+scrapeExclude []string
+scrapeAllowEmpty bool
 )
 func init() {
 scrapeCmd.Flags().StringVarP(&scrapeFormat, "format", "f", "json", "output format (json, markdown)")
 scrapeCmd.Flags().StringVarP(&scrapeSources, "sources", "s", "", "YAML file with source definitions")
-scrapeCmd.Flags().StringVarP(&scrapeOutput, "output", "o", "", "output directory (default: devour_data/docs)")
+scrapeCmd.Flags().StringVarP(&scrapeOutput, "output", "o", "", "output directory (default: configured docs dir)")
 scrapeCmd.Flags().IntVar(&scrapeConcurrency, "concurrency", 10, "parallel scraping workers")
 scrapeCmd.Flags().StringVarP(&scrapeType, "type", "t", "", "source type (auto-detected if not specified)")
+scrapeCmd.Flags().StringVar(&scrapeSearchQuery, "search-query", "", "search query for --type localsearch")
+scrapeCmd.Flags().IntVar(&scrapeSearchLimit, "search-limit", 8, "max result URLs to scrape for --type localsearch")
+scrapeCmd.Flags().StringSliceVar(&scrapeSearchDomains, "search-domain", nil, "restrict localsearch results to these domains (repeatable)")
+scrapeCmd.Flags().StringSliceVar(&scrapeInclude, "include", nil, "include URL/file regex patterns (repeatable)")
+scrapeCmd.Flags().StringSliceVar(&scrapeExclude, "exclude", nil, "exclude URL/file regex patterns (repeatable)")
+scrapeCmd.Flags().BoolVar(&scrapeAllowEmpty, "allow-empty", false, "allow success when no documents were extracted")
 }
 func runScrape(cmd *cobra.Command, args []string) error {
+cfg, err := loadAppConfig()
+if err != nil {
+return err
+}
 if scrapeSources != "" {
-return scrapeFromConfig(scrapeSources)
+return scrapeFromConfig(cmd, cfg, scrapeSources)
 }
 if len(args) == 0 {
 return fmt.Errorf("source argument required when not using --sources flag")
 }
-sourceURL := args[0]
-config := &scraper.Config{
-UserAgent: "Devour/1.0 (Documentation Scraper)",
-Timeout: 30 * time.Second,
-RetryCount: 3,
-RetryDelay: 1 * time.Second,
-Concurrency: scrapeConcurrency,
-}
+sourceURL := strings.TrimSpace(args[0])
 sourceType := scraper.SourceType(scrapeType)
 if sourceType == "" {
 sourceType = detectSourceType(sourceURL)
 }
-fmt.Printf("Scraping: %s\n", sourceURL)
-fmt.Printf(" Type: %s\n", sourceType)
-fmt.Printf(" Concurrency: %d\n", scrapeConcurrency)
-fmt.Println()
-s := scraper.NewScraper(sourceType, config)
-if s == nil {
-return fmt.Errorf("unsupported source type: %s", sourceType)
-}
 source := &scraper.Source{
 Name: extractName(sourceURL),
 Type: sourceType,
 URL: sourceURL,
+Query: strings.TrimSpace(scrapeSearchQuery),
+ResultLimit: scrapeSearchLimit,
+Domains: append([]string(nil), scrapeSearchDomains...),
+Include: append([]string(nil), scrapeInclude...),
+Exclude: append([]string(nil), scrapeExclude...),
+}
+if sourceType == scraper.SourceTypeLocal {
+source.Path = sourceURL
+}
+applySourceProfile(source)
+outputDir := resolveOutputDir(cfg, scrapeOutput)
+count, err := scrapeOne(cmd, cfg, source, outputDir)
+if err != nil {
+return err
 }
-ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+if cfg.Indexing.Enabled {
+engine := search.NewEngine(cfg)
+if _, err := engine.Rebuild(context.Background()); err != nil {
+return fmt.Errorf("reindex after scrape: %w", err)
+}
+}
+fmt.Printf("\n✓ Scraping complete!\n")
+fmt.Printf(" Output: %s\n", outputDir)
+fmt.Printf(" Documents: %d\n", count)
+fmt.Println(" Run 'devour status' to inspect local index health")
+return nil
+}
func scrapeFromConfig(cmd *cobra.Command, cfg *appconfig.Config, configPath string) error {
raw, err := os.ReadFile(configPath)
if err != nil {
return fmt.Errorf("read sources file: %w", err)
}
var list []appconfig.SourceConfig
if err := yaml.Unmarshal(raw, &list); err != nil || len(list) == 0 {
var wrapped struct {
Sources []appconfig.SourceConfig `yaml:"sources"`
}
if wrapErr := yaml.Unmarshal(raw, &wrapped); wrapErr != nil {
return fmt.Errorf("parse sources file: %w", wrapErr)
}
list = wrapped.Sources
}
if len(list) == 0 {
return fmt.Errorf("sources file contains no sources")
}
sort.Slice(list, func(i, j int) bool {
return list[i].Name < list[j].Name
})
outputDir := resolveOutputDir(cfg, scrapeOutput)
success := 0
failures := 0
totalDocs := 0
for _, srcCfg := range list {
source := sourceFromConfig(srcCfg)
if source.Type == "" {
if source.URL != "" {
source.Type = detectSourceType(source.URL)
} else if source.Path != "" {
source.Type = scraper.SourceTypeLocal
}
}
if source.Name == "" {
source.Name = extractName(source.URL)
if source.Name == "unknown" && source.Path != "" {
source.Name = filepath.Base(source.Path)
}
}
applySourceProfile(source)
fmt.Printf("\n=== Source: %s (%s) ===\n", source.Name, source.Type)
count, srcErr := scrapeOne(cmd, cfg, source, outputDir)
if srcErr != nil {
failures++
fmt.Printf("✗ %s failed: %v\n", source.Name, srcErr)
continue
}
totalDocs += count
success++
}
if cfg.Indexing.Enabled {
engine := search.NewEngine(cfg)
if _, err := engine.Rebuild(context.Background()); err != nil {
return fmt.Errorf("reindex after scrape sources: %w", err)
}
}
fmt.Printf("\nSummary: %d succeeded, %d failed, %d docs written\n", success, failures, totalDocs)
if failures > 0 {
return fmt.Errorf("one or more sources failed")
}
return nil
}
func scrapeOne(cmd *cobra.Command, cfg *appconfig.Config, source *scraper.Source, outputDir string) (int, error) {
if source == nil {
return 0, fmt.Errorf("source is required")
}
if source.Type == "" {
return 0, fmt.Errorf("source type is required")
}
if source.Type == scraper.SourceTypeLocalSearch && strings.TrimSpace(source.Query) == "" {
return 0, fmt.Errorf("search query is required for localsearch sources")
}
scraperConfig := toScraperConfig(cfg, scrapeConcurrency)
s := scraper.NewScraper(source.Type, scraperConfig)
if s == nil {
return 0, fmt.Errorf("unsupported source type: %s", source.Type)
}
fmt.Printf("Scraping: %s\n", chooseSourceLabel(source))
fmt.Printf(" Type: %s\n", source.Type)
fmt.Printf(" Concurrency: %d\n", scraperConfig.Concurrency)
if source.Type == scraper.SourceTypeLocalSearch {
fmt.Printf(" Search query: %s\n", source.Query)
fmt.Printf(" Search limit: %d\n", source.ResultLimit)
if len(source.Domains) > 0 {
fmt.Printf(" Search domains: %s\n", strings.Join(source.Domains, ", "))
}
}
fmt.Println()
ctx, cancel := context.WithTimeout(context.Background(), scraperConfig.Timeout*2)
 defer cancel()
 docs, err := s.Scrape(ctx, source)
 if err != nil {
-return fmt.Errorf("scraping failed: %w", err)
+return 0, fmt.Errorf("scraping failed: %w", err)
 }
-fmt.Printf("✓ Scraped %d documents\n\n", len(docs))
-if scrapeOutput == "" {
-scrapeOutput = "devour_data/docs"
-}
-if err := os.MkdirAll(scrapeOutput, 0755); err != nil {
-return fmt.Errorf("failed to create output directory: %w", err)
-}
-for i, doc := range docs {
-var filename string
-var content []byte
-if scrapeFormat == "markdown" {
-filename = fmt.Sprintf("%s_%d.md", sanitizeFilename(doc.Title), i)
-// Create enhanced markdown document
-markdownDoc := &markdown.Document{
-ID: doc.ID,
-Source: doc.Source,
-Type: string(doc.Type),
-Title: doc.Title,
-Content: doc.Content,
-URL: doc.URL,
-Metadata: doc.Metadata,
-Hash: doc.Hash,
-Timestamp: doc.Timestamp,
-}
-formatter := markdown.NewFormatter()
-content = []byte(formatter.FormatWithTOC(markdownDoc))
-} else {
-filename = fmt.Sprintf("%s_%d.json", sanitizeFilename(doc.Title), i)
-content, err = json.MarshalIndent(doc, "", " ")
+save, err := storage.SaveDocuments(docs, storage.SaveOptions{
+Format: scrapeFormat,
+OutputDir: outputDir,
+AllowEmpty: scrapeAllowEmpty,
+PrintWriter: func(format string, args ...any) {
+_, _ = fmt.Printf(format, args...)
+},
+})
 if err != nil {
-return fmt.Errorf("failed to marshal document: %w", err)
-}
+return 0, err
 }
-filePath := filepath.Join(scrapeOutput, filename)
-if err := os.WriteFile(filePath, content, 0644); err != nil {
-return fmt.Errorf("failed to write document: %w", err)
+fmt.Printf("✓ Scraped %d documents\n", save.Count)
+if err := updateSourceState(cfg, source, docs); err != nil {
+return save.Count, fmt.Errorf("update source state: %w", err)
 }
-fmt.Printf(" 📄 %s (%s)\n", filename, doc.Type)
-}
-fmt.Printf("\n✓ Scraping complete!\n")
-fmt.Printf(" Output: %s\n", scrapeOutput)
-fmt.Println(" Run 'devour status' to see indexed documents")
-return nil
+return save.Count, nil
 }
-func scrapeFromConfig(configPath string) error {
-return fmt.Errorf("scraping from config file not yet implemented")
+func updateSourceState(cfg *appconfig.Config, source *scraper.Source, docs []*scraper.Document) error {
+state, err := projectstate.LoadSourceState(cfg.Storage.MetadataDir)
if err != nil {
return err
}
key := source.Name
if key == "" {
key = chooseSourceLabel(source)
}
h := sha256.New()
for _, d := range docs {
if d == nil {
continue
}
fmt.Fprintf(h, "%s|%s|%s\n", d.ID, d.Hash, d.URL)
}
state.Sources[key] = &projectstate.SourceState{
Name: source.Name,
Type: string(source.Type),
URL: source.URL,
Hash: hex.EncodeToString(h.Sum(nil)),
LastSync: time.Now(),
DocCount: len(docs),
}
return projectstate.SaveSourceState(cfg.Storage.MetadataDir, state)
}
func chooseSourceLabel(source *scraper.Source) string {
if strings.TrimSpace(source.URL) != "" {
return source.URL
}
if strings.TrimSpace(source.Path) != "" {
return source.Path
}
if strings.TrimSpace(source.Repo) != "" {
return source.Repo
}
return source.Name
 }
 func detectSourceType(sourceURL string) scraper.SourceType {
 u, err := url.Parse(sourceURL)
 if err != nil {
+if sourceURL != "" && !strings.HasPrefix(sourceURL, "http://") && !strings.HasPrefix(sourceURL, "https://") {
+return scraper.SourceTypeLocal
+}
 return scraper.SourceTypeWeb
 }
@@ -208,6 +353,11 @@ func detectSourceType(sourceURL string) scraper.SourceType {
 return scraper.SourceTypeAstroDocs
 case host == "github.com":
 return scraper.SourceTypeGitHub
+case strings.HasSuffix(path, ".json") || strings.HasSuffix(path, ".yaml") || strings.HasSuffix(path, ".yml"):
+if strings.Contains(strings.ToLower(path), "openapi") || strings.Contains(strings.ToLower(path), "swagger") {
+return scraper.SourceTypeOpenAPI
+}
+return scraper.SourceTypeWeb
 default:
 return scraper.SourceTypeWeb
 }
@@ -216,27 +366,81 @@ func detectSourceType(sourceURL string) scraper.SourceType {
 func extractName(sourceURL string) string {
 u, err := url.Parse(sourceURL)
 if err != nil {
+if strings.TrimSpace(sourceURL) != "" {
+return filepath.Base(sourceURL)
+}
 return "unknown"
 }
 parts := strings.Split(strings.Trim(u.Path, "/"), "/")
-if len(parts) > 0 {
+if len(parts) > 0 && strings.TrimSpace(parts[len(parts)-1]) != "" {
 return parts[len(parts)-1]
 }
+if strings.TrimSpace(u.Host) != "" {
 return u.Host
+}
+return "unknown"
 }
-func sanitizeFilename(name string) string {
-name = strings.ToLower(name)
-name = strings.ReplaceAll(name, " ", "_")
-name = strings.ReplaceAll(name, "/", "_")
-name = strings.ReplaceAll(name, ":", "_")
-name = strings.ReplaceAll(name, ".", "_")
-if len(name) > 50 {
-name = name[:50]
+func applySourceProfile(source *scraper.Source) {
+if source == nil {
+return
+}
+if source.Type != scraper.SourceTypeWeb && source.Type != scraper.SourceTypeLocalSearch {
+return
+}
+if strings.TrimSpace(source.URL) == "" {
+return
 }
-return name
+u, err := url.Parse(source.URL)
+if err != nil {
+return
+}
+host := strings.ToLower(u.Host)
+if host == "" {
+return
+}
+// Preserve explicit user-provided patterns.
+if len(source.Include) > 0 || len(source.Exclude) > 0 {
+return
+}
+switch {
+case strings.Contains(host, "learn.microsoft.com"):
+source.Include = []string{`/dotnet/`, `/csharp/`, `/base-types/`}
+source.Exclude = []string{`/previous-versions/`, `/answers/`, `/support/`, `/training/`, `/events/`, `/products/`}
+case strings.Contains(host, "kotlinlang.org"):
+source.Include = []string{`/docs/`}
+source.Exclude = []string{`/community/`, `/api/`, `/releases/`}
+case strings.Contains(host, "php.net"):
+source.Include = []string{`/manual/en/`}
+source.Exclude = []string{`/manual/(de|fr|es|ja|ru|pt)/`, `/downloads.php`, `/bugs.php`}
+case strings.Contains(host, "ruby-doc.org"):
+source.Include = []string{`/core/`}
+source.Exclude = []string{`/stdlib/`, `/gems/`}
+case strings.Contains(host, "hexdocs.pm"):
+source.Include = []string{`/elixir/`}
+source.Exclude = []string{`/phoenix/`, `/ecto/`}
+case strings.Contains(host, "nextjs.org"):
+source.Include = []string{`/docs/`}
+source.Exclude = []string{`/showcase`, `/blog`, `/learn/`, `/pricing`}
+case strings.Contains(host, "svelte.dev"):
+source.Include = []string{`/docs/`}
+source.Exclude = []string{`/playground`, `/tutorial`, `/blog`}
+case strings.Contains(host, "angular.dev"):
+source.Include = []string{`/guide/`, `/api/`, `/tutorials/`}
+source.Exclude = []string{`/resources/`, `/playground`}
+case strings.Contains(host, "remix.run"):
+source.Include = []string{`/docs/`}
+source.Exclude = []string{`/blog`, `/conf`, `/merch`}
+case strings.Contains(host, "solidjs.com"):
+source.Include = []string{`/docs/`}
+source.Exclude = []string{`/community`, `/showcase`, `/blog`}
+case strings.Contains(host, "expressjs.com"):
+source.Include = []string{`/en/(guide|api|advanced)/`}
+source.Exclude = []string{`/en/starter/`, `/cn/`, `/fr/`, `/es/`, `/de/`}
+}
 }
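The rule in `applySourceProfile` (fill in include/exclude defaults by host, but never overwrite patterns the user supplied) could also be expressed as a lookup table. The sketch below is an alternative layout for illustration, not what the commit does: the `source`, `profile`, and `applyProfile` names are hypothetical, and only two of the hosts from the diff are reproduced.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// source is a hypothetical stand-in with only the fields the rule touches.
type source struct {
	URL     string
	Include []string
	Exclude []string
}

// profile pairs a host substring with default include/exclude patterns.
type profile struct {
	hostPart         string
	include, exclude []string
}

// profiles holds two entries taken from the diff; the rest are omitted here.
var profiles = []profile{
	{"kotlinlang.org", []string{`/docs/`}, []string{`/community/`, `/api/`, `/releases/`}},
	{"nextjs.org", []string{`/docs/`}, []string{`/showcase`, `/blog`, `/learn/`, `/pricing`}},
}

// applyProfile sets default patterns for a known host and leaves the source
// untouched when the user already supplied any include or exclude pattern.
func applyProfile(s *source) {
	if s == nil || len(s.Include) > 0 || len(s.Exclude) > 0 {
		return
	}
	u, err := url.Parse(s.URL)
	if err != nil || u.Host == "" {
		return
	}
	host := strings.ToLower(u.Host)
	for _, p := range profiles {
		if strings.Contains(host, p.hostPart) {
			// Copy the slices so callers cannot mutate the shared table.
			s.Include = append([]string(nil), p.include...)
			s.Exclude = append([]string(nil), p.exclude...)
			return
		}
	}
}

func main() {
	s := &source{URL: "https://kotlinlang.org/docs/home.html"}
	applyProfile(s)
	fmt.Println(s.Include, s.Exclude)
}
```

As in the diff, the match is a substring test on the host, so a host that merely contains one of these names would pick up the same profile; a suffix or exact match would be stricter if that matters.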
+56
@@ -0,0 +1,56 @@
package cmd
import (
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"testing"
appconfig "github.com/yourorg/devour/internal/config"
)
func TestScrapeFromConfig(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/html")
_, _ = w.Write([]byte("<html><head><title>Docs</title></head><body><main>" + strings.Repeat("docs content ", 30) + "</main></body></html>"))
}))
defer srv.Close()
tmp := t.TempDir()
cfg := appconfig.Default()
cfg.Storage.DocsDir = filepath.Join(tmp, "docs")
cfg.Storage.IndexDir = filepath.Join(tmp, "index")
cfg.Storage.MetadataDir = filepath.Join(tmp, "metadata")
cfg.Storage.CacheDir = filepath.Join(tmp, "cache")
if err := cfg.EnsureStorageDirs(); err != nil {
t.Fatal(err)
}
sourcesPath := filepath.Join(tmp, "sources.yaml")
yaml := "- name: demo\n type: url\n url: " + srv.URL + "\n"
if err := os.WriteFile(sourcesPath, []byte(yaml), 0o644); err != nil {
t.Fatal(err)
}
oldFormat, oldOutput, oldAllow := scrapeFormat, scrapeOutput, scrapeAllowEmpty
scrapeFormat = "json"
scrapeOutput = cfg.Storage.DocsDir
scrapeAllowEmpty = false
defer func() {
scrapeFormat, scrapeOutput, scrapeAllowEmpty = oldFormat, oldOutput, oldAllow
}()
if err := scrapeFromConfig(nil, cfg, sourcesPath); err != nil {
t.Fatalf("scrapeFromConfig failed: %v", err)
}
entries, err := os.ReadDir(cfg.Storage.DocsDir)
if err != nil {
t.Fatal(err)
}
if len(entries) == 0 {
t.Fatal("expected scraped files")
}
}
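The scrape path exercised by this test ends in `updateSourceState`, which records a per-source digest built from each document's ID, hash, and URL. A minimal sketch of that digest follows, assuming a simplified `doc` type and a hypothetical `fingerprint` helper; the line format matches the `fmt.Fprintf` call in the diff.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// doc is a hypothetical stand-in carrying only the fields that are hashed.
type doc struct {
	ID, Hash, URL string
}

// fingerprint writes one "id|hash|url" line per document into a SHA-256
// hasher, skipping nil entries, and returns the hex-encoded digest.
func fingerprint(docs []*doc) string {
	h := sha256.New()
	for _, d := range docs {
		if d == nil {
			continue
		}
		fmt.Fprintf(h, "%s|%s|%s\n", d.ID, d.Hash, d.URL)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := &doc{ID: "1", Hash: "aa", URL: "https://example.com/a"}
	b := &doc{ID: "2", Hash: "bb", URL: "https://example.com/b"}

	// Nil entries do not affect the digest.
	fmt.Println(fingerprint([]*doc{a, b}) == fingerprint([]*doc{a, nil, b}))
	// The digest depends on document order.
	fmt.Println(fingerprint([]*doc{a, b}) == fingerprint([]*doc{b, a}))
}
```

Because the digest is order-sensitive, a scraper that returns the same documents in a different order would produce a different value; sorting by ID before hashing would be one way to avoid that if the digest is meant to detect content changes only.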
+39
@@ -0,0 +1,39 @@
package cmd
import (
"testing"
"github.com/yourorg/devour/internal/scraper"
)
func TestDetectSourceType(t *testing.T) {
tests := []struct {
url string
wantType scraper.SourceType
}{
{"https://pkg.go.dev/net/http", scraper.SourceTypeGoDocs},
{"https://docs.rs/tokio/latest/tokio/", scraper.SourceTypeRustDocs},
{"https://docs.python.org/3/library/asyncio.html", scraper.SourceTypePythonDocs},
{"https://docs.oracle.com/javase/8/docs/api/java/util/List.html", scraper.SourceTypeJavaDocs},
{"https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/", scraper.SourceTypeSpringDocs},
{"https://www.typescriptlang.org/docs/handbook/2/basic-types.html", scraper.SourceTypeTSDocs},
{"https://react.dev/reference/react", scraper.SourceTypeReactDocs},
{"https://vuejs.org/guide/introduction.html", scraper.SourceTypeVueDocs},
{"https://nuxt.com/docs/guide/directory-structure", scraper.SourceTypeNuxtDocs},
{"https://docs.docker.com/compose", scraper.SourceTypeDockerDocs},
{"https://hub.docker.com/mcp/server/github", scraper.SourceTypeMCPDocs},
{"https://developers.cloudflare.com/workers", scraper.SourceTypeCloudflareDocs},
{"https://docs.astro.build/en/guides/components/", scraper.SourceTypeAstroDocs},
{"https://github.com/yourorg/devour", scraper.SourceTypeGitHub},
{"https://example.com/docs", scraper.SourceTypeWeb},
}
for _, tt := range tests {
t.Run(tt.url, func(t *testing.T) {
got := detectSourceType(tt.url)
if got != tt.wantType {
t.Fatalf("detectSourceType(%q) = %q, want %q", tt.url, got, tt.wantType)
}
})
}
}
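The table above has no case for the OpenAPI branch that this commit adds to `detectSourceType`. That rule, taken in isolation, might look like the sketch below. `looksLikeOpenAPI` is a hypothetical helper, and lowercasing the whole path before the suffix check is an assumption made here; the diff does not show how `path` is derived.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// looksLikeOpenAPI (hypothetical name) reports whether a URL path ends in a
// JSON or YAML extension and also mentions "openapi" or "swagger".
func looksLikeOpenAPI(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	path := strings.ToLower(u.Path) // assumption: match case-insensitively
	isSpecFile := strings.HasSuffix(path, ".json") ||
		strings.HasSuffix(path, ".yaml") ||
		strings.HasSuffix(path, ".yml")
	return isSpecFile && (strings.Contains(path, "openapi") || strings.Contains(path, "swagger"))
}

func main() {
	for _, raw := range []string{
		"https://example.com/api/openapi.yaml",
		"https://example.com/data.json",
	} {
		fmt.Println(raw, looksLikeOpenAPI(raw))
	}
}
```

Rows such as `{"https://example.com/api/openapi.yaml", scraper.SourceTypeOpenAPI}` and `{"https://example.com/data.json", scraper.SourceTypeWeb}` would be natural additions to the test table, provided no earlier host case claims those URLs.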
+185 -27
@@ -1,25 +1,29 @@
 package cmd
 import (
+"context"
+"encoding/json"
 "fmt"
+"strings"
 "github.com/spf13/cobra"
+"github.com/yourorg/devour/internal/projectstate"
+"github.com/yourorg/devour/internal/scraper"
+"github.com/yourorg/devour/internal/search"
+"github.com/yourorg/devour/internal/server"
 )
 var serveCmd = &cobra.Command{
 Use: "serve",
-Short: "Start the MCP server",
-Long: `Start the Devour MCP server.
+Short: "Start the local Devour RPC server",
+Long: `Start the Devour RPC server.
-In local mode (default), the server communicates via stdio, making it
-suitable for use as an OpenCode skill.
-In remote mode (--remote flag), the server listens on HTTP and exposes
-a REST API for multi-user access.
+Local mode (default): JSON-RPC over stdin/stdout for agent/skill integration.
+Remote mode (--remote): experimental HTTP RPC endpoint at /rpc.
 Examples:
-devour serve # Local mode (stdio)
-devour serve --remote # Remote mode on default port
+devour serve
+devour serve --remote
 devour serve --remote --port 3000`,
 RunE: runServe,
 }
@@ -31,31 +35,185 @@ var (
 )
 func init() {
-serveCmd.Flags().BoolVar(&serveRemote, "remote", false, "run as remote HTTP server")
+serveCmd.Flags().BoolVar(&serveRemote, "remote", false, "run as remote HTTP server (experimental)")
 serveCmd.Flags().IntVarP(&servePort, "port", "p", 8080, "HTTP port (remote mode only)")
 serveCmd.Flags().StringVar(&serveHost, "host", "localhost", "HTTP host (remote mode only)")
 }
 func runServe(cmd *cobra.Command, args []string) error {
-if serveRemote {
-fmt.Printf("🚀 Starting Devour server in remote mode\n")
-fmt.Printf(" Host: %s\n", serveHost)
-fmt.Printf(" Port: %d\n", servePort)
-fmt.Printf(" URL: http://%s:%d\n", serveHost, servePort)
-// TODO: Start HTTP MCP server
-return fmt.Errorf("remote mode not yet implemented")
+if _, err := loadAppConfig(); err != nil {
+return err
 }
-fmt.Println("🚀 Starting Devour server in local mode (stdio)")
-fmt.Println(" Communicating via JSON-RPC over stdin/stdout")
+srvCfg := &server.Config{
+Mode: "local",
+Transport: "stdio",
+Host: serveHost,
+Port: servePort,
+Handler: func(ctx context.Context, method string, params json.RawMessage) (any, error) {
+return handleServeMethod(ctx, method, params)
+},
+}
-// TODO: Start stdio MCP server
-// Should handle JSON-RPC messages for:
-// - devour_query
-// - devour_add
-// - devour_status
-// - devour_sync
+if serveRemote {
+srvCfg.Mode = "remote"
+fmt.Printf("🚀 Starting Devour RPC server in remote experimental mode\n")
+fmt.Printf(" URL: http://%s:%d/rpc\n", serveHost, servePort)
+} else {
+fmt.Println("🚀 Starting Devour RPC server in local mode (stdio)")
+fmt.Println(" Protocol: JSON-RPC 2.0 over stdin/stdout")
+}
-return fmt.Errorf("local mode not yet implemented")
+srv := server.NewServer(srvCfg)
+return srv.Start(context.Background())
 }
func handleServeMethod(ctx context.Context, method string, params json.RawMessage) (any, error) {
// The method implementation needs full typed config. Load per-call to avoid stale state.
loadedCfg, err := loadAppConfig()
if err != nil {
return nil, err
}
switch strings.TrimSpace(method) {
case "devour_query":
var req struct {
Query string `json:"query"`
Limit int `json:"limit"`
Threshold float64 `json:"threshold"`
}
if len(params) > 0 {
_ = json.Unmarshal(params, &req)
}
engine := search.NewEngine(loadedCfg)
results, stats, err := engine.Search(ctx, req.Query, search.SearchOptions{Limit: req.Limit, Threshold: req.Threshold})
if err != nil {
return nil, err
}
return map[string]any{"query": req.Query, "count": len(results), "results": results, "indexed": stats.Documents}, nil
case "devour_status":
docsStats, err := projectstate.CollectDocsStats(loadedCfg.Storage.DocsDir)
if err != nil {
return nil, err
}
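// Do not discard these errors: state and idxStats are dereferenced below.
state, stateErr := projectstate.LoadSourceState(loadedCfg.Storage.MetadataDir)
if stateErr != nil {
return nil, stateErr
}
engine := search.NewEngine(loadedCfg)
idxStats, idxErr := engine.EnsureIndexed(ctx)
if idxErr != nil {
return nil, fmt.Errorf("ensure index: %w", idxErr)
}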
return map[string]any{
"documents": docsStats.DocumentCount,
"storage_bytes": docsStats.StorageBytes,
"last_updated": docsStats.LastUpdated,
"sources": state.Sources,
"indexed_docs": idxStats.Documents,
"index_timestamp": idxStats.LastIndexedAt,
}, nil
case "devour_scrape":
var req struct {
Source string `json:"source"`
Type string `json:"type"`
Format string `json:"format"`
Output string `json:"output"`
Query string `json:"query"`
ResultLimit int `json:"result_limit"`
Domains []string `json:"domains"`
Include []string `json:"include"`
Exclude []string `json:"exclude"`
}
if err := json.Unmarshal(params, &req); err != nil {
return nil, err
}
if strings.TrimSpace(req.Source) == "" {
return nil, fmt.Errorf("source is required")
}
st := scraper.SourceType(req.Type)
if st == "" {
st = detectSourceType(req.Source)
}
source := &scraper.Source{
Name: extractName(req.Source),
Type: st,
URL: req.Source,
Query: strings.TrimSpace(req.Query),
ResultLimit: req.ResultLimit,
Domains: append([]string(nil), req.Domains...),
Include: append([]string(nil), req.Include...),
Exclude: append([]string(nil), req.Exclude...),
}
if st == scraper.SourceTypeLocal {
source.Path = req.Source
}
applySourceProfile(source)
prevFormat := scrapeFormat
prevOutput := scrapeOutput
prevAllowEmpty := scrapeAllowEmpty
scrapeFormat = coalesceString(req.Format, "json")
scrapeOutput = req.Output
scrapeAllowEmpty = false
count, err := scrapeOne(nil, loadedCfg, source, resolveOutputDir(loadedCfg, req.Output))
scrapeFormat = prevFormat
scrapeOutput = prevOutput
scrapeAllowEmpty = prevAllowEmpty
if err != nil {
return nil, err
}
return map[string]any{"source": req.Source, "type": st, "documents": count}, nil
case "devour_ask":
var req struct {
Question string `json:"question"`
Limit int `json:"limit"`
}
if err := json.Unmarshal(params, &req); err != nil {
return nil, err
}
if strings.TrimSpace(req.Question) == "" {
return nil, fmt.Errorf("question is required")
}
limit := req.Limit
if limit <= 0 {
limit = 5
}
engine := search.NewEngine(loadedCfg)
results, _, err := engine.Search(ctx, req.Question, search.SearchOptions{Limit: limit})
if err != nil {
return nil, err
}
summary := "No relevant docs found."
if len(results) > 0 {
summary = results[0].Snippet
}
return map[string]any{"question": req.Question, "summary": summary, "sources": results}, nil
case "devour_sync":
prevForce, prevRebuild, prevSource := syncForce, syncRebuild, syncSource
var req struct {
Source string `json:"source"`
Force bool `json:"force"`
Rebuild bool `json:"rebuild"`
}
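if len(params) > 0 {
// Reject malformed params before touching the package-level sync flags.
if err := json.Unmarshal(params, &req); err != nil {
return nil, fmt.Errorf("invalid devour_sync params: %w", err)
}
}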
syncForce = req.Force
syncRebuild = req.Rebuild
syncSource = req.Source
err := runSync(nil, nil)
syncForce, syncRebuild, syncSource = prevForce, prevRebuild, prevSource
if err != nil {
return nil, err
}
return map[string]any{"ok": true}, nil
default:
return nil, fmt.Errorf("unknown method: %s", method)
}
}
func coalesceString(primary, fallback string) string {
if strings.TrimSpace(primary) != "" {
return primary
}
return fallback
} }
+86 -17
@@ -1,10 +1,13 @@
package cmd package cmd
import ( import (
"context"
"fmt" "fmt"
"time" "time"
"github.com/spf13/cobra" "github.com/spf13/cobra"
"github.com/yourorg/devour/internal/projectstate"
"github.com/yourorg/devour/internal/search"
"github.com/yourorg/devour/internal/ui" "github.com/yourorg/devour/internal/ui"
) )
@@ -23,39 +26,105 @@ Shows:
} }
func runStatus(cmd *cobra.Command, args []string) error { func runStatus(cmd *cobra.Command, args []string) error {
cfg, err := loadAppConfig()
if err != nil {
return err
}
// Print the small character mascot // Print the small character mascot
ui.PrintCharacterSmall() ui.PrintCharacterSmall()
fmt.Println() fmt.Println()
ui.PrintHeader("Devour Status") ui.PrintHeader("Devour Status")
// TODO: Implement actual status check docsStats, err := projectstate.CollectDocsStats(cfg.Storage.DocsDir)
// Check: if err != nil {
// - Index existence and health return err
// - Document count }
// - Vector count
// - Last sync time
// - Source status
// Placeholder status engine := search.NewEngine(cfg)
ui.PrintKeyValue("Index Health", "⚠️ Not initialized") indexStats, indexErr := engine.EnsureIndexed(context.Background())
ui.PrintKeyValue("Documents", "0 indexed") indexHealth := "✓ Healthy"
ui.PrintKeyValue("Chunks", "0 total") if indexErr != nil {
ui.PrintKeyValue("Vector Dimension", "1536") if docsStats.DocumentCount == 0 {
ui.PrintKeyValue("Last Updated", "Never") indexHealth = "⚠️ No docs indexed yet"
ui.PrintKeyValue("Storage Used", "0 MB") } else {
indexHealth = "✗ Index error"
}
}
lastUpdated := "Never"
if !docsStats.LastUpdated.IsZero() {
lastUpdated = docsStats.LastUpdated.Format(time.RFC3339)
}
chunks := 0
if indexStats != nil {
chunks = indexStats.Documents
}
ui.PrintKeyValue("Index Health", indexHealth)
ui.PrintKeyValue("Documents", fmt.Sprintf("%d indexed", docsStats.DocumentCount))
ui.PrintKeyValue("Chunks", fmt.Sprintf("%d total", chunks))
ui.PrintKeyValue("Vector Dimension", fmt.Sprintf("%d", cfg.Embeddings.Dimensions))
ui.PrintKeyValue("Last Updated", lastUpdated)
ui.PrintKeyValue("Storage Used", humanSize(docsStats.StorageBytes))
fmt.Println() fmt.Println()
ui.PrintSection("Sources") ui.PrintSection("Sources")
ui.PrintInfo(" None configured") state, stateErr := projectstate.LoadSourceState(cfg.Storage.MetadataDir)
if stateErr != nil || len(state.Sources) == 0 {
ui.PrintInfo(" None tracked yet")
} else {
keys := make([]string, 0, len(state.Sources))
for k := range state.Sources {
keys = append(keys, k)
}
sortStrings(keys)
for _, k := range keys {
s := state.Sources[k]
last := "never"
if !s.LastSync.IsZero() {
last = s.LastSync.Format("2006-01-02 15:04:05")
}
fmt.Printf(" • %s (%s): %d docs, last sync %s\n", s.Name, s.Type, s.DocCount, last)
}
}
fmt.Println() fmt.Println()
ui.PrintSection("Next Steps") ui.PrintSection("Next Steps")
fmt.Println(" 1. Run 'devour init' to initialize") if docsStats.DocumentCount == 0 {
fmt.Println(" 2. Run 'devour scrape <source>' to index documents") fmt.Println(" 1. Run 'devour scrape <source>' to index documentation")
fmt.Println(" 2. Run 'devour query \"<topic>\"' to search indexed docs")
} else {
fmt.Println(" 1. Run 'devour query \"<topic>\"' for local docs search")
fmt.Println(" 2. Run 'devour ask --lang <lang> \"<question>\"' for structured answers")
}
if indexErr != nil {
fmt.Printf(" ⚠️ Index note: %v\n", indexErr)
}
// Show when check happened // Show when check happened
fmt.Printf("\nStatus as of: %s\n", time.Now().Format(time.RFC3339)) fmt.Printf("\nStatus as of: %s\n", time.Now().Format(time.RFC3339))
return nil return nil
} }
func humanSize(b int64) string {
const mb = 1024 * 1024
if b < mb {
return fmt.Sprintf("%d KB", b/1024)
}
return fmt.Sprintf("%.2f MB", float64(b)/float64(mb))
}
func sortStrings(values []string) {
if len(values) < 2 {
return
}
for i := 1; i < len(values); i++ {
for j := i; j > 0 && values[j] < values[j-1]; j-- {
values[j], values[j-1] = values[j-1], values[j]
}
}
}
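The `sortStrings` helper above is a hand-written insertion sort over the source keys. As a sketch of an alternative, the standard library's `sort.Strings` gives the same ascending order; `sortedKeys` below is a hypothetical helper, and the map value type is simplified to `int` for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys (hypothetical) collects map keys and sorts them with the
// standard library so iteration order is deterministic.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	fmt.Println(sortedKeys(map[string]int{"react": 3, "go": 1, "python": 2}))
	// prints "[go python react]"
}
```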
+157 -17
@@ -1,9 +1,18 @@
package cmd package cmd
import ( import (
"context"
"crypto/sha256"
"encoding/hex"
"fmt" "fmt"
"strings"
"time"
"github.com/spf13/cobra" "github.com/spf13/cobra"
"github.com/yourorg/devour/internal/projectstate"
"github.com/yourorg/devour/internal/scraper"
"github.com/yourorg/devour/internal/search"
"github.com/yourorg/devour/internal/storage"
) )
var syncCmd = &cobra.Command{ var syncCmd = &cobra.Command{
@@ -12,7 +21,7 @@ var syncCmd = &cobra.Command{
Long: `Fetch updates from all configured sources. Long: `Fetch updates from all configured sources.
Checks each source for changes (using hash or timestamp comparison) Checks each source for changes (using hash or timestamp comparison)
and updates the index accordingly. and updates the local docs + index accordingly.
Examples: Examples:
devour sync # Sync all sources devour sync # Sync all sources
@@ -34,29 +43,160 @@ func init() {
} }
func runSync(cmd *cobra.Command, args []string) error { func runSync(cmd *cobra.Command, args []string) error {
cfg, err := loadAppConfig()
if err != nil {
return err
}
if syncRebuild { if syncRebuild {
fmt.Println("🔄 Rebuilding index from all sources...") fmt.Println("🔄 Rebuilding local index from configured sources...")
} else { } else {
fmt.Println("🔄 Syncing with configured sources...") fmt.Println("🔄 Syncing configured sources...")
} }
if syncSource != "" { if len(cfg.Sources) == 0 {
fmt.Printf(" Source: %s\n", syncSource) fmt.Println("No sources configured. Add sources in devour.yaml first.")
return nil
} }
// TODO: Implement actual sync logic state, err := projectstate.LoadSourceState(cfg.Storage.MetadataDir)
// 1. Load sources from config if err != nil {
// 2. For each source: return err
// a. Check for changes (hash/timestamp) }
// b. If changes detected or --force:
// - Scrape updated content
// - Re-generate embeddings
// - Update index
// 3. Update metadata
fmt.Println() updated := 0
fmt.Println("⚠️ Sync functionality not yet implemented") skipped := 0
fmt.Println(" Configure sources in devour.yaml first") failed := 0
totalDocs := 0
for _, srcCfg := range cfg.Sources {
if syncSource != "" && srcCfg.Name != syncSource {
continue
}
source := sourceFromConfig(srcCfg)
if source.Type == "" {
if source.URL != "" {
source.Type = detectSourceType(source.URL)
} else if source.Path != "" {
source.Type = scraper.SourceTypeLocal
}
}
if source.Name == "" {
source.Name = extractName(source.URL)
}
applySourceProfile(source)
fmt.Printf("\n• %s (%s)\n", source.Name, source.Type)
s := scraper.NewScraper(source.Type, toScraperConfig(cfg, 0))
if s == nil {
failed++
fmt.Printf(" ✗ unsupported source type: %s\n", source.Type)
continue
}
key := source.Name
if key == "" {
key = chooseSourceLabel(source)
}
lastHash := ""
if prev := state.Sources[key]; prev != nil {
lastHash = prev.Hash
}
needsUpdate := syncForce || syncRebuild
newHash := lastHash
if !needsUpdate {
changed, hash, detectErr := s.DetectChanges(context.Background(), source, lastHash)
if detectErr != nil {
fmt.Printf(" ⚠ change detection failed (%v), scraping anyway\n", detectErr)
needsUpdate = true
} else {
needsUpdate = changed
newHash = hash
}
}
if !needsUpdate {
skipped++
fmt.Println(" ✓ no changes")
continue
}
docs, scrapeErr := s.Scrape(context.Background(), source)
if scrapeErr != nil {
failed++
fmt.Printf(" ✗ scrape failed: %v\n", scrapeErr)
state.Sources[key] = &projectstate.SourceState{
Name: source.Name,
Type: string(source.Type),
URL: source.URL,
Hash: lastHash,
LastSync: time.Now(),
DocCount: 0,
LastError: scrapeErr.Error(),
}
continue
}
saved, saveErr := storage.SaveDocuments(docs, storage.SaveOptions{
Format: "json",
OutputDir: cfg.Storage.DocsDir,
AllowEmpty: false,
PrintWriter: nil,
})
if saveErr != nil {
failed++
fmt.Printf(" ✗ save failed: %v\n", saveErr)
continue
}
if newHash == "" {
h := sha256.New()
for _, d := range docs {
if d == nil {
continue
}
fmt.Fprintf(h, "%s|%s|%s\n", d.ID, d.Hash, d.URL)
}
newHash = hex.EncodeToString(h.Sum(nil))
}
state.Sources[key] = &projectstate.SourceState{
Name: source.Name,
Type: string(source.Type),
URL: source.URL,
Hash: newHash,
LastSync: time.Now(),
DocCount: saved.Count,
LastError: "",
}
updated++
totalDocs += saved.Count
fmt.Printf(" ✓ updated (%d docs)\n", saved.Count)
}
if err := projectstate.SaveSourceState(cfg.Storage.MetadataDir, state); err != nil {
return err
}
if syncRebuild || updated > 0 {
engine := search.NewEngine(cfg)
if _, err := engine.Rebuild(context.Background()); err != nil {
return fmt.Errorf("rebuild index: %w", err)
}
}
fmt.Printf("\nSync summary: updated=%d skipped=%d failed=%d docs=%d\n", updated, skipped, failed, totalDocs)
if failed > 0 {
return fmt.Errorf("sync completed with failures")
}
if syncSource != "" && updated == 0 && skipped == 0 && failed == 0 {
return fmt.Errorf("source %q not found in config", syncSource)
}
if strings.TrimSpace(syncSource) != "" {
fmt.Printf("Synced source: %s\n", syncSource)
}
return nil return nil
} }
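When change detection returns no hash, `runSync` above derives one by feeding each document's `ID|Hash|URL` line into SHA-256. The sketch below isolates that fallback; the `doc` struct and `fingerprint` function are hypothetical stand-ins, since the real document type lives in the scraper package and is not shown in this diff.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// doc is a hypothetical stand-in for the scraped document type.
type doc struct{ ID, Hash, URL string }

// fingerprint hashes one "ID|Hash|URL" line per non-nil document.
// The result depends on document order, as in the loop it mirrors.
func fingerprint(docs []*doc) string {
	h := sha256.New()
	for _, d := range docs {
		if d == nil {
			continue // nil entries are skipped, matching runSync
		}
		fmt.Fprintf(h, "%s|%s|%s\n", d.ID, d.Hash, d.URL)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := []*doc{{ID: "1", Hash: "aa", URL: "https://example.com/a"}, nil}
	b := []*doc{{ID: "1", Hash: "bb", URL: "https://example.com/a"}}
	fmt.Println(fingerprint(a) == fingerprint(a[:1])) // true: nil entry ignored
	fmt.Println(fingerprint(a) == fingerprint(b))     // false: content hash differs
}
```

Because the digest is order-sensitive, a scraper that returns the same documents in a different order would produce a different fingerprint; sorting by ID first would be one way to avoid that, if it matters.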
+169
@@ -0,0 +1,169 @@
package cmd
import (
"context"
"encoding/json"
"fmt"
"os"
"path/filepath"
"time"
"github.com/spf13/cobra"
"github.com/yourorg/devour/internal/scraper"
)
var (
verifyFormat string
verifyTimeout int
)
var verifyCmd = &cobra.Command{
Use: "verify",
Short: "Run Devour verification suites",
Long: `Run deterministic and live verification suites for Devour commands and scrapers.`,
}
var verifySmokeCmd = &cobra.Command{
Use: "smoke",
Short: "Run live docs scraping smoke checks",
Long: `Run an opt-in live network smoke suite and persist a machine-readable report under devour_data/verify/.`,
RunE: runVerifySmoke,
}
func init() {
verifyCmd.AddCommand(verifySmokeCmd)
verifySmokeCmd.Flags().StringVar(&verifyFormat, "format", "text", "output format (text, json)")
verifySmokeCmd.Flags().IntVar(&verifyTimeout, "timeout", 90, "timeout per smoke case in seconds")
}
type verifyCase struct {
Name string `json:"name"`
Type scraper.SourceType `json:"type"`
URL string `json:"url"`
Passed bool `json:"passed"`
Docs int `json:"docs"`
Error string `json:"error,omitempty"`
TookMs int64 `json:"took_ms"`
}
type verifyReport struct {
CreatedAt time.Time `json:"created_at"`
Duration string `json:"duration"`
Passed int `json:"passed"`
Failed int `json:"failed"`
Cases []verifyCase `json:"cases"`
}
func runVerifySmoke(cmd *cobra.Command, args []string) error {
cfg, err := loadAppConfig()
if err != nil {
return err
}
if verifyTimeout <= 0 {
verifyTimeout = 90
}
cases := []verifyCase{
{Name: "Go net/http", Type: scraper.SourceTypeGoDocs, URL: "https://pkg.go.dev/net/http"},
{Name: "Python asyncio", Type: scraper.SourceTypePythonDocs, URL: "https://docs.python.org/3/library/asyncio.html"},
{Name: "React reference", Type: scraper.SourceTypeReactDocs, URL: "https://react.dev/reference/react"},
{Name: "TypeScript handbook", Type: scraper.SourceTypeTSDocs, URL: "https://www.typescriptlang.org/docs/handbook/2/basic-types.html"},
{Name: "Next.js docs", Type: scraper.SourceTypeWeb, URL: "https://nextjs.org/docs"},
{Name: "Svelte docs", Type: scraper.SourceTypeWeb, URL: "https://svelte.dev/docs/kit"},
{Name: "Angular guide", Type: scraper.SourceTypeWeb, URL: "https://angular.dev/guide/http"},
{Name: "Remix docs", Type: scraper.SourceTypeWeb, URL: "https://v2.remix.run/docs"},
{Name: "Solid docs repo", Type: scraper.SourceTypeGitHub, URL: "https://github.com/solidjs/solid-docs"},
{Name: "Express guide", Type: scraper.SourceTypeWeb, URL: "https://expressjs.com/en/guide/routing.html"},
}
startAll := time.Now()
passed := 0
failed := 0
for i := range cases {
c := &cases[i]
caseStart := time.Now()
s := scraper.NewScraper(c.Type, toScraperConfig(cfg, 4))
if s == nil {
c.Error = "scraper not available"
c.Passed = false
failed++
continue
}
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(verifyTimeout)*time.Second)
docs, err := s.Scrape(ctx, &scraper.Source{Name: c.Name, Type: c.Type, URL: c.URL})
cancel()
c.TookMs = time.Since(caseStart).Milliseconds()
if err != nil {
c.Error = err.Error()
c.Passed = false
failed++
continue
}
c.Docs = len(docs)
if len(docs) == 0 {
c.Error = "0 documents"
c.Passed = false
failed++
continue
}
c.Passed = true
passed++
}
report := verifyReport{
CreatedAt: time.Now(),
Duration: time.Since(startAll).String(),
Passed: passed,
Failed: failed,
Cases: cases,
}
rootDataDir := filepath.Dir(cfg.Storage.DocsDir)
verifyDir := filepath.Join(rootDataDir, "verify")
if err := os.MkdirAll(verifyDir, 0o755); err != nil {
return err
}
filename := fmt.Sprintf("smoke-%s.json", time.Now().Format("20060102-150405"))
reportPath := filepath.Join(verifyDir, filename)
b, err := json.MarshalIndent(report, "", " ")
if err != nil {
return err
}
if err := os.WriteFile(reportPath, b, 0o644); err != nil {
return err
}
switch verifyFormat {
case "json":
enc := json.NewEncoder(cmd.OutOrStdout())
enc.SetIndent("", " ")
if err := enc.Encode(report); err != nil {
return err
}
default:
fmt.Fprintf(cmd.OutOrStdout(), "Smoke verification complete\n")
fmt.Fprintf(cmd.OutOrStdout(), " Passed: %d\n", report.Passed)
fmt.Fprintf(cmd.OutOrStdout(), " Failed: %d\n", report.Failed)
fmt.Fprintf(cmd.OutOrStdout(), " Report: %s\n", reportPath)
for _, c := range report.Cases {
status := "PASS"
if !c.Passed {
status = "FAIL"
}
fmt.Fprintf(cmd.OutOrStdout(), " - [%s] %s (%d docs, %dms)", status, c.Name, c.Docs, c.TookMs)
if c.Error != "" {
fmt.Fprintf(cmd.OutOrStdout(), " error=%s", c.Error)
}
fmt.Fprintln(cmd.OutOrStdout())
}
}
if report.Failed > 0 {
return fmt.Errorf("smoke verification completed with failures")
}
return nil
}
+39 -9
@@ -8,8 +8,9 @@ storage:
docs_dir: ./devour_data/docs docs_dir: ./devour_data/docs
index_dir: ./devour_data/index index_dir: ./devour_data/index
metadata_dir: ./devour_data/metadata metadata_dir: ./devour_data/metadata
cache_dir: ./devour_data/cache
# Embedding settings # Embedding settings (optional for lexical search; required for future embedding flows)
embeddings: embeddings:
provider: openai # openai, mock provider: openai # openai, mock
model: text-embedding-3-small model: text-embedding-3-small
@@ -17,7 +18,7 @@ embeddings:
api_key: ${OPENAI_API_KEY} # Use environment variable api_key: ${OPENAI_API_KEY} # Use environment variable
batch_size: 100 batch_size: 100
# Vector database # Vector database (optional in current local-first pipeline)
vector_db: vector_db:
type: memory # memory, chromem type: memory # memory, chromem
persist: true persist: true
@@ -28,7 +29,7 @@ scraper:
user_agent: "Devour/1.0 (+https://github.com/yourorg/devour)" user_agent: "Devour/1.0 (+https://github.com/yourorg/devour)"
timeout: 30s timeout: 30s
retry_count: 3 retry_count: 3
retry_delay: 5s retry_delay: 1s
concurrency: 10 concurrency: 10
rate_limit: 500ms rate_limit: 500ms
max_depth: 3 max_depth: 3
@@ -44,9 +45,22 @@ scheduler:
# Server settings # Server settings
server: server:
mode: local # local, remote mode: local # local, remote
transport: stdio # stdio, http
port: 8080 port: 8080
host: localhost host: localhost
# Local lexical indexing defaults
indexing:
enabled: true
auto_reindex: true
snippet_length: 220
max_docs: 10000
# Verification defaults
verification:
enabled: true
timeout: 90s
# Example sources # Example sources
sources: sources:
# Web documentation # Web documentation
@@ -67,17 +81,14 @@ sources:
url: https://api.example.com/openapi.json url: https://api.example.com/openapi.json
schedule: 168h # Weekly schedule: 168h # Weekly
# GitHub repository # GitHub repository docs
- name: github-repo - name: github-repo
type: github type: github
repo: org/repository repo: org/repository
branch: main branch: main
include: include:
- "docs/.*" - "(?i)(^|/)README\\.md$"
- "README.md" - "(?i)(^|/)docs?/"
exclude:
- "docs/internal/.*"
# auth_token: ${GITHUB_TOKEN} # Optional for private repos
# Local files # Local files
- name: local-docs - name: local-docs
@@ -86,3 +97,22 @@ sources:
include: include:
- ".*\\.md" - ".*\\.md"
- ".*\\.txt" - ".*\\.txt"
# Self-hosted search API (e.g. SearxNG) with no API key
- name: local-searxng-go
type: localsearch
url: http://127.0.0.1:8080/search
query: golang http client
result_limit: 8
domains:
- pkg.go.dev
- go.dev
# New framework examples
- name: nextjs-docs
type: url
url: https://nextjs.org/docs
- name: express-docs
type: url
url: https://expressjs.com/en/guide/routing.html
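The new GitHub `include` patterns above are case-insensitive regular expressions. The sketch below shows which repository paths they would select, assuming they are evaluated with Go's `regexp` package as unanchored matches against repo-relative paths; that evaluation strategy and the `included` helper are assumptions, not something this diff confirms.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns from the example config, after YAML unescaping.
var includes = []*regexp.Regexp{
	regexp.MustCompile(`(?i)(^|/)README\.md$`),
	regexp.MustCompile(`(?i)(^|/)docs?/`),
}

// included (hypothetical) reports whether any include pattern matches path.
func included(path string) bool {
	for _, re := range includes {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{"README.md", "pkg/readme.md", "docs/guide.md", "Doc/api.md", "src/main.go", "mydocs/x.md"}
	for _, p := range paths {
		fmt.Printf("%-14s %v\n", p, included(p))
	}
}
```

Under these assumptions the first four paths match and the last two do not, because `docs?/` must start at the beginning of the path or directly after a `/`. With the old `exclude` entry removed, a path such as `docs/internal/notes.md` would now be included as well.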
+11
@@ -40,9 +40,20 @@ scheduler:
# Server settings # Server settings
server: server:
mode: local mode: local
transport: stdio
port: 8080 port: 8080
host: localhost host: localhost
indexing:
enabled: true
auto_reindex: true
snippet_length: 220
max_docs: 10000
verification:
enabled: true
timeout: 90s
# Sources (add your own) # Sources (add your own)
sources: [] sources: []
# - name: example-docs # - name: example-docs
Binary file not shown.

Some files were not shown because too many files have changed in this diff.