boilerplate-free-markdown-conversion

datatransform

RenderedHtml -> CleanMarkdown

Parse the rendered HTML of a page, strip the site-wide navigation shell (headers, sidebars, footers), and emit the core documentation article as Markdown.
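
As a rough illustration of the transform described above, the sketch below uses only Python's standard-library `html.parser`. It is not taken from the attributed source. The names `ArticleToMarkdown`, `html_to_clean_markdown`, and the `SKIP_TAGS` list are hypothetical, and it assumes the shell is marked up with semantic tags (`nav`, `header`, `footer`, `aside`) and that the HTML is well formed. Real sites often need class- or id-based selectors instead.

```python
from html.parser import HTMLParser

# Tags treated as site shell or non-content. This list is an assumption;
# an article that keeps its title inside <header> would need a narrower rule.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### "}
BLOCK_TAGS = set(HEADINGS) | {"p", "li"}


class ArticleToMarkdown(HTMLParser):
    """Collect headings, paragraphs, and list items outside skipped subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a skipped subtree
        self.blocks = []      # finished Markdown blocks
        self.current = None   # text pieces of the open block, or None
        self.prefix = ""      # Markdown prefix for the open block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.prefix = HEADINGS.get(tag, "- " if tag == "li" else "")
            self.current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS and self.current is not None:
            # Collapse runs of whitespace left over from HTML indentation.
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.current = None

    def handle_data(self, data):
        # Inline tags such as <code> or <b> are flattened to plain text here.
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)


def html_to_clean_markdown(rendered_html: str) -> str:
    parser = ArticleToMarkdown()
    parser.feed(rendered_html)
    parser.close()
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    sample = """
    <header><a href="/">Docs Home</a></header>
    <nav><ul><li>Install</li><li>API</li></ul></nav>
    <main>
      <h1>Quick start</h1>
      <p>Install the package, then run <code>init</code>.</p>
    </main>
    <footer><p>Copyright notice</p></footer>
    """
    print(html_to_clean_markdown(sample))
```

The depth counter is what keeps text nested anywhere inside a shell element out of the output. A production version would more likely pair a readability-style content extractor with a dedicated HTML-to-Markdown converter, since this sketch drops links, inline formatting, tables, and code blocks.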

Problem it solves

Scraping full pages retains noisy sidebars, headers, and footers, which pollute the text fed to embedding models and lower retrieval accuracy.

Consumes

RenderedHtml

Emits

CleanMarkdown

Distilled from 1 source

The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.