Skip to content

Roadmap

This roadmap is intentionally narrow. Each milestone ships only what is already designed and testable offline. Anything that would require a live API, vendor SDK, or new runtime dependency stays in 0.4 or later, behind an explicit opt-in.

The roadmap is a direction document, not a release schedule. Items will land when they are ready and reviewed.

0.1 — current

Initial OSS extraction with the offline core. Shipped:

  • Local site inventory — read URLs from a CSV or JSON, normalise, classify each page (home, service, blog, category, landing, other), record the rule that fired.
  • Internal link graph — join an edge list with the inventory and flag commercial pages that receive no inlinks from blog posts and blog posts that receive at most one inlink.
  • Agent context pack — JSON + Markdown digest aggregating inventory, link graph, project notes, and any provider artifacts.
  • Local CSV keyword provider (local-csv) — import keyword metrics from any CSV with query/keyword plus optional volume, competition, geo, language, source URL columns. Tolerant of header variations (Search Volumesearch_volume) and number formats ("1,234"1234, "12.3%"0.123).
  • Local Search-Console-style CSV provider (local-gsc-csv) — import per-query performance from a Google Search Console Performance export.
  • Provider registryKEYWORD_PROVIDERS and SEARCH_PERFORMANCE_PROVIDERS maps, plus stable error types (ProviderError, ProviderConfigurationError, ProviderNotConfiguredError).
  • CLI commandsinit, build-inventory, build-link-graph, import-keywords, import-search-performance, list-providers, build-context-pack, inspect.
  • Tests and CI — 50 tests on Python 3.11/3.12 via GitHub Actions (ruff + pytest). Zero network access required.

0.2 — input adapters and configurable classification

Goal: make it easy to feed data in without hand-rolling CSVs.

  • Sitemap XML importer(merged in Unreleased) — read one or more sitemap.xml (and sitemap-index) files into the inventory CSV format. Offline; no HTTP fetching.
  • Screaming Frog CSV importer(merged in Unreleased) — accept the canonical Screaming Frog internal_*.csv and *_inlinks/*_outlinks.csv exports directly, without a manual reshape step. Tolerates the legacy column layout (SF v15-17).
  • Stronger configurable page classification(merged in Unreleased)config/classifier.json now supports priorities, negation patterns, and explicit URL allow-lists per page type. See docs/classifier.md.
  • Improved context-pack templates — make the Markdown sections customisable via simple template strings in config/context_pack.json. Defaults remain offline-safe and vendor-neutral.

0.3 — search evidence and offline QA

Goal: capture what the SERP looks like (without scraping it) and provide deterministic checks over Markdown drafts.

  • Search evidence provider interface(merged in Unreleased) — finalised the SearchEvidenceProvider abstract base.
  • Local SERP evidence CSV importer(merged in Unreleased) — read top-N organic rows for a query from a hand-curated CSV. No live SERP scraping.
  • Deterministic content QA module(merged in Unreleased) — offline checks over a Markdown draft (keyphrase distribution, internal-link sanity, slug shape, heading hierarchy, missing alt text). No LLM involvement.

0.4 — optional live adapters

Goal: bridge to real APIs without making them required. Each item ships behind an optional extra (pip install site-context-pipeline[<extra>]); credentials live in env or local .env files and never in artifacts.

  • Optional live Google Ads Keyword Planner adapter(landed) — replaces the google-ads stub with a real KeywordPlanIdeaService call behind the [google-ads] extra. Returns not_configured when the extra or credentials are missing; never imports the SDK at module load.
  • Optional live Google Search Console adapter(landed) — replaces the google-search-console stub with a real Search Analytics API call behind the [gsc] extra. Returns not_configured when the extra or credentials are missing; never imports the client libraries at module load.
  • Optional third-party / regional keyword providers — community adapters for the data sources their authors use. The toolkit will accept a Yandex Wordstat adapter, a DataForSEO adapter, an Ahrefs adapter, a Semrush adapter, a SerpApi adapter, etc., on the same terms as Google: optional extra, structured not_configured when unconfigured, no impact on the base install.
  • Provider config docs — per-provider configuration schemas, credential setup walk-throughs, rate-limit guidance.

Items intentionally not on the roadmap

  • A built-in "write me an article" command.
  • Bulk content generation across many sites in one run.
  • Anything that touches a live site without an explicit opt-in flag.
  • Hardcoded support for any single search vendor in the core. Vendors are providers; providers are optional.
  • Background daemons, schedulers, or always-on services. The toolkit is and will remain a CLI you invoke from a script or a CI job.