{ "batch": "Abstractions & Dependencies", "batch_index": 2, "assessments": { "abstraction_fitness": 72.0 }, "dimension_notes": { "abstraction_fitness": { "evidence": [ "Four external scrapers (Astro, Docker, Cloudflare, Nuxt) each reimplement the same transport/change-detection skeleton (`fetchPage`, `DetectChanges`, `generateHash`, per-page `Document` assembly) with only parser/model differences.", "`cmd/serve.go` RPC path for `devour_scrape` mutates CLI globals (`scrapeFormat`, `scrapeOutput`, `scrapeAllowEmpty`) to call `scrapeOne` from `cmd/scrape.go`, showing a leaky abstraction where server flow depends on CLI stateful wiring.", "`vector.Store` advertises interchangeable backends, but `NewStore` can return `ChromemStore` whose methods are all unimplemented runtime errors, so the abstraction surface overstates usable implementations." ], "impact_scope": "subsystem", "fix_scope": "multi_file_refactor", "confidence": "high", "sub_axes": { "abstraction_leverage": 68.0, "indirection_cost": 73.0, "interface_honesty": 70.0 } } }, "findings": [ { "dimension": "abstraction_fitness", "identifier": "duplicated_external_scraper_skeleton", "summary": "External docs scrapers duplicate the same orchestration instead of sharing one adapter base", "related_files": [ "internal/scraper/external/astrodocs.go", "internal/scraper/external/cloudflaredocs.go", "internal/scraper/external/dockerdocs.go", "internal/scraper/external/nuxtdocs.go" ], "evidence": [ "Each scraper defines near-identical `fetchPage`, `DetectChanges`, and `generateHash` logic.", "Each `Scrape` method repeats the same flow: validate URL -> fetch HTML -> parser call -> append main doc + sub-doc loop(s).", "Differences are mostly parser/model-specific mapping, but transport/error/hash logic is copy-pasted." 
], "suggestion": "Introduce a shared docs-scraper base (or helper pipeline) for HTTP fetch + hashing + standard error handling, and keep only parser-specific mapping in per-provider adapters.", "confidence": "high", "impact_scope": "subsystem", "fix_scope": "multi_file_refactor" }, { "dimension": "abstraction_fitness", "identifier": "scrape_api_leaks_cli_state", "summary": "Server scrape RPC depends on mutable CLI globals to reuse scrape pipeline", "related_files": [ "cmd/serve.go", "cmd/scrape.go", "cmd/get.go" ], "evidence": [ "`handleServeMethod` temporarily rewrites `scrapeFormat`, `scrapeOutput`, and `scrapeAllowEmpty` before calling `scrapeOne`, then restores them.", "`scrapeOne` is not a pure service API; it is coupled to CLI-level shared state and output behavior.", "`cmd/get.go` also routes through `runScrape`, reinforcing that scraping orchestration is command-centric rather than a reusable application service." ], "suggestion": "Extract a stateless scrape service function (explicit request struct + options) used by both CLI commands and RPC handlers; keep CLI flags as translation-only at command boundary.", "confidence": "high", "impact_scope": "subsystem", "fix_scope": "architectural_change" }, { "dimension": "abstraction_fitness", "identifier": "vector_store_interface_overpromises", "summary": "Vector store abstraction exposes backends that are selectable but not actually implemented", "related_files": [ "internal/vector/store.go", "internal/indexer/indexer.go" ], "evidence": [ "`NewStore` can return `ChromemStore` when config type is `chromem`.", "All `ChromemStore` interface methods currently return `not implemented` errors.", "`Indexer` depends only on `vector.Store`, so backend failure appears at runtime after abstraction selection instead of at wiring/validation time." 
], "suggestion": "Make backend capabilities explicit: either remove or guard `chromem` selection until that backend is complete, or have store construction return an initialization error and verify backend readiness before indexing starts.", "confidence": "high", "impact_scope": "module", "fix_scope": "multi_file_refactor" } ] }