Roadmap¶
This roadmap is intentionally narrow. Each milestone ships only what is already designed and testable offline. Anything that would require a live API, vendor SDK, or new runtime dependency stays in 0.4 or later, behind an explicit opt-in.
The roadmap is a direction document, not a release schedule. Items will land when they are ready and reviewed.
0.1 — current¶
Initial OSS extraction with the offline core. Shipped:
- Local site inventory — read URLs from a CSV or JSON, normalise,
classify each page (
home,service,blog,category,landing,other), record the rule that fired. - Internal link graph — join an edge list with the inventory and flag commercial pages that receive no inlinks from blog posts and blog posts that receive at most one inlink.
- Agent context pack — JSON + Markdown digest aggregating inventory, link graph, project notes, and any provider artifacts.
- Local CSV keyword provider (
local-csv) — import keyword metrics from any CSV withquery/keywordplus optional volume, competition, geo, language, source URL columns. Tolerant of header variations (Search Volume≡search_volume) and number formats ("1,234"→1234,"12.3%"→0.123). - Local Search-Console-style CSV provider (
local-gsc-csv) — import per-query performance from a Google Search Console Performance export. - Provider registry —
KEYWORD_PROVIDERSandSEARCH_PERFORMANCE_PROVIDERSmaps, plus stable error types (ProviderError,ProviderConfigurationError,ProviderNotConfiguredError). - CLI commands —
init,build-inventory,build-link-graph,import-keywords,import-search-performance,list-providers,build-context-pack,inspect. - Tests and CI — 50 tests on Python 3.11/3.12 via GitHub Actions (ruff + pytest). Zero network access required.
0.2 — input adapters and configurable classification¶
Goal: make it easy to feed data in without hand-rolling CSVs.
- Sitemap XML importer ✅ (merged in
Unreleased) — read one or moresitemap.xml(and sitemap-index) files into the inventory CSV format. Offline; no HTTP fetching. - Screaming Frog CSV importer ✅ (merged in
Unreleased) — accept the canonical Screaming Froginternal_*.csvand*_inlinks/*_outlinks.csvexports directly, without a manual reshape step. Tolerates the legacy column layout (SF v15-17). - Stronger configurable page classification ✅ (merged in
Unreleased) —config/classifier.jsonnow supports priorities, negation patterns, and explicit URL allow-lists per page type. Seedocs/classifier.md. - Improved context-pack templates — make the Markdown sections
customisable via simple template strings in
config/context_pack.json. Defaults remain offline-safe and vendor-neutral.
0.3 — search evidence and offline QA¶
Goal: capture what the SERP looks like (without scraping it) and provide deterministic checks over Markdown drafts.
- Search evidence provider interface ✅ (merged in
Unreleased) — finalised theSearchEvidenceProviderabstract base. - Local SERP evidence CSV importer ✅ (merged in
Unreleased) — read top-N organic rows for a query from a hand-curated CSV. No live SERP scraping. - Deterministic content QA module ✅ (merged in
Unreleased) — offline checks over a Markdown draft (keyphrase distribution, internal-link sanity, slug shape, heading hierarchy, missing alt text). No LLM involvement.
0.4 — optional live adapters¶
Goal: bridge to real APIs without making them required. Each item
ships behind an optional extra (pip install
site-context-pipeline[<extra>]); credentials live in env or local
.env files and never in artifacts.
- Optional live Google Ads Keyword Planner adapter ✅ (landed) —
replaces the
google-adsstub with a realKeywordPlanIdeaServicecall behind the[google-ads]extra. Returnsnot_configuredwhen the extra or credentials are missing; never imports the SDK at module load. - Optional live Google Search Console adapter ✅ (landed) —
replaces the
google-search-consolestub with a real Search Analytics API call behind the[gsc]extra. Returnsnot_configuredwhen the extra or credentials are missing; never imports the client libraries at module load. - Optional third-party / regional keyword providers — community
adapters for the data sources their authors use. The toolkit will
accept a Yandex Wordstat adapter, a DataForSEO adapter, an Ahrefs
adapter, a Semrush adapter, a SerpApi adapter, etc., on the same
terms as Google: optional extra, structured
not_configuredwhen unconfigured, no impact on the base install. - Provider config docs — per-provider configuration schemas, credential setup walk-throughs, rate-limit guidance.
Items intentionally not on the roadmap¶
- A built-in "write me an article" command.
- Bulk content generation across many sites in one run.
- Anything that touches a live site without an explicit opt-in flag.
- Hardcoded support for any single search vendor in the core. Vendors are providers; providers are optional.
- Background daemons, schedulers, or always-on services. The toolkit is and will remain a CLI you invoke from a script or a CI job.