mirror of https://github.com/Dvorinka/Devour.git, synced 2026-06-03 12:03:06 +00:00
83 lines, 4.3 KiB, Plaintext
{
"batch": "Abstractions & Dependencies",
"batch_index": 2,
"assessments": {
"abstraction_fitness": 72.0
},
"dimension_notes": {
"abstraction_fitness": {
"evidence": [
"Four external scrapers (Astro, Docker, Cloudflare, Nuxt) each reimplement the same transport/change-detection skeleton (`fetchPage`, `DetectChanges`, `generateHash`, per-page `Document` assembly) with only parser/model differences.",
"The `devour_scrape` RPC path in `cmd/serve.go` mutates CLI globals (`scrapeFormat`, `scrapeOutput`, `scrapeAllowEmpty`) so that it can call `scrapeOne` from `cmd/scrape.go`. This is a leaky abstraction in which the server flow depends on stateful CLI wiring.",
"`vector.Store` advertises interchangeable backends, but `NewStore` can return a `ChromemStore` whose methods all fail at runtime with `not implemented` errors, so the abstraction overstates which implementations are actually usable."
],
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor",
"confidence": "high",
"sub_axes": {
"abstraction_leverage": 68.0,
"indirection_cost": 73.0,
"interface_honesty": 70.0
}
}
},
"findings": [
{
"dimension": "abstraction_fitness",
"identifier": "duplicated_external_scraper_skeleton",
"summary": "External docs scrapers duplicate the same orchestration instead of sharing one adapter base",
"related_files": [
"internal/scraper/external/astrodocs.go",
"internal/scraper/external/cloudflaredocs.go",
"internal/scraper/external/dockerdocs.go",
"internal/scraper/external/nuxtdocs.go"
],
"evidence": [
"Each scraper defines near-identical `fetchPage`, `DetectChanges`, and `generateHash` logic.",
"Each `Scrape` method repeats the same flow: validate URL -> fetch HTML -> call parser -> append the main doc, then append sub-docs in one or more loops.",
"The differences are mostly parser/model-specific mapping; the transport, error-handling, and hashing logic is copy-pasted."
],
"suggestion": "Introduce a shared docs-scraper base (or helper pipeline) for HTTP fetch + hashing + standard error handling, and keep only parser-specific mapping in per-provider adapters.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "multi_file_refactor"
},
{
"dimension": "abstraction_fitness",
"identifier": "scrape_api_leaks_cli_state",
"summary": "Server scrape RPC depends on mutable CLI globals to reuse scrape pipeline",
"related_files": [
"cmd/serve.go",
"cmd/scrape.go",
"cmd/get.go"
],
"evidence": [
"`handleServeMethod` temporarily rewrites `scrapeFormat`, `scrapeOutput`, and `scrapeAllowEmpty` before calling `scrapeOne`, then restores them.",
"`scrapeOne` is not a pure service API; it is coupled to CLI-level shared state and output behavior.",
"`cmd/get.go` also routes through `runScrape`, reinforcing that scraping orchestration is command-centric rather than a reusable application service."
],
"Extract a stateless scrape service function (explicit request struct + options) used by both CLI commands and RPC handlers, and confine CLI flags to translation at the command boundary.",
"confidence": "high",
"impact_scope": "subsystem",
"fix_scope": "architectural_change"
},
{
"dimension": "abstraction_fitness",
"identifier": "vector_store_interface_overpromises",
"summary": "Vector store abstraction exposes backends that are selectable but not actually implemented",
"related_files": [
"internal/vector/store.go",
"internal/indexer/indexer.go"
],
"evidence": [
"`NewStore` can return `ChromemStore` when the configured store type is `chromem`.",
"All `ChromemStore` interface methods currently return `not implemented` errors.",
"`Indexer` depends only on `vector.Store`, so a backend failure surfaces at runtime, after the backend has already been selected, rather than at wiring/validation time."
],
"Make backend capabilities explicit: either remove or guard `chromem` selection until the backend is complete, or have store construction return a typed initialization error and enforce backend readiness before indexing starts.",
"confidence": "high",
"impact_scope": "module",
"fix_scope": "multi_file_refactor"
}
]
}
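The `duplicated_external_scraper_skeleton` suggestion can be sketched in Go, the repository's language. This is a minimal illustration under stated assumptions, not Devour's actual code. `Document`, `DocsScraper`, `PageParser`, and `Fetcher` are hypothetical names. The transport is injected as a function, so the shared flow (validate URL, fetch, parse, hash, detect changes) lives in one place and each provider supplies only its parser.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// Document is a hypothetical, simplified stand-in for the per-page
// document each external scraper assembles.
type Document struct {
	URL     string
	Title   string
	Content string
	Hash    string
}

// PageParser is the only provider-specific step (hypothetical signature):
// raw HTML in, one main document plus any sub-documents out.
type PageParser func(pageURL, html string) (Document, []Document, error)

// Fetcher abstracts the HTTP transport so the shared base can be tested offline.
type Fetcher func(pageURL string) (string, error)

// DocsScraper holds the shared skeleton; providers differ only in Parse.
type DocsScraper struct {
	Fetch Fetcher
	Parse PageParser
}

// generateHash returns the hex-encoded SHA-256 of content for change detection.
func generateHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// Scrape runs the common flow once: validate URL, fetch, parse, hash every document.
func (s DocsScraper) Scrape(pageURL string) ([]Document, error) {
	u, err := url.Parse(pageURL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return nil, fmt.Errorf("scrape: invalid URL %q", pageURL)
	}
	html, err := s.Fetch(pageURL)
	if err != nil {
		return nil, fmt.Errorf("scrape %s: fetch: %w", pageURL, err)
	}
	mainDoc, subDocs, err := s.Parse(pageURL, html)
	if err != nil {
		return nil, fmt.Errorf("scrape %s: parse: %w", pageURL, err)
	}
	docs := append([]Document{mainDoc}, subDocs...)
	for i := range docs {
		docs[i].Hash = generateHash(docs[i].Content)
	}
	return docs, nil
}

// DetectChanges returns the URLs whose hash differs from the previous run;
// URLs missing from previous count as changed.
func DetectChanges(previous map[string]string, docs []Document) []string {
	var changed []string
	for _, d := range docs {
		if previous[d.URL] != d.Hash {
			changed = append(changed, d.URL)
		}
	}
	return changed
}

func main() {
	// Hypothetical Astro adapter: only the parser is provider-specific.
	astro := DocsScraper{
		Fetch: func(string) (string, error) { return "<h1>Astro docs</h1>", nil },
		Parse: func(pageURL, html string) (Document, []Document, error) {
			mainDoc := Document{URL: pageURL, Title: "Astro docs", Content: html}
			sub := Document{URL: pageURL + "install/", Title: "Install", Content: "npm create astro@latest"}
			return mainDoc, []Document{sub}, nil
		},
	}
	docs, err := astro.Scrape("https://docs.astro.build/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(docs), "documents; changed:", DetectChanges(map[string]string{}, docs))
}
```

A real adapter would plug a `net/http`-backed fetcher into `Fetch`. Injecting the fetcher keeps the shared base testable without network access, and the Astro, Docker, Cloudflare, and Nuxt adapters would then differ only in their `Parse` functions.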