Clean, LLM-ready web content from any URL.

webextract is an API that turns messy web pages into clean markdown — navigation, ads, cookie banners, and boilerplate stripped — plus structured metadata and resolved links. One call. Built for RAG pipelines and AI agents.


One request in, clean content out

Request

POST /extract
Content-Type: application/json

{
  "url": "https://example.com/article",
  "formats": ["markdown"],
  "includeMetadata": true
}

Response

{
  "title": "The Real Headline",
  "byline": "Jane Doe",
  "wordCount": 1009,
  "markdown": "Clean article text as\nmarkdown, ready for your model…",
  "metadata": {
    "siteName": "Example News",
    "lang": "en",
    "canonical": "https://example.com/article"
  }
}
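Here is a minimal standard-library sketch for calling the endpoint from Python. It assumes a placeholder host (`https://api.webextract.example`) because this page does not give the API's base URL. Authentication is left to an `extra_headers` argument because the header your key goes in is not documented here. `build_extract_request` and `extract` are hypothetical helper names.

```python
from __future__ import annotations

import json
import urllib.request

# HYPOTHETICAL base URL: the real API host is not stated on this page.
BASE_URL = "https://api.webextract.example"


def build_extract_request(
    url: str,
    *,
    formats: list[str] | None = None,
    include_metadata: bool = True,
    base_url: str = BASE_URL,
    extra_headers: dict[str, str] | None = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /extract request with the body shown above."""
    body = {
        "url": url,
        "formats": formats if formats is not None else ["markdown"],
        "includeMetadata": include_metadata,
    }
    headers = {"Content-Type": "application/json"}
    # Auth is undocumented here: supply whatever header your API key requires.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        base_url.rstrip("/") + "/extract",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def extract(url: str, timeout: float = 30.0, **kwargs) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    req = build_extract_request(url, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `extract("https://example.com/article", extra_headers={...})` should return a dict shaped like the response above. The header name and value depend on how your key was issued.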

What you get

Readability-grade markdown

Main-content extraction via Mozilla Readability, converted to clean markdown your model can actually use.

Structured metadata

Title, byline, description, OpenGraph image, language, canonical URL, and favicon — parsed for you.
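If you want these fields typed rather than held in a raw dict, a sketch like the following works against the example response above. It reads only the keys that appear in that example: `title`, `byline`, `wordCount`, `markdown`, and `metadata.siteName`, `lang`, `canonical`. The key names for description, OpenGraph image, and favicon are not shown on this page, so they are left out rather than guessed. `Extraction` and `parse_extraction` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Extraction:
    """Typed view of a POST /extract response (only keys shown in the example)."""
    title: str | None
    byline: str | None
    word_count: int | None
    markdown: str
    site_name: str | None
    lang: str | None
    canonical: str | None


def parse_extraction(payload: dict[str, Any]) -> Extraction:
    """Map camelCase JSON keys to snake_case fields.

    Missing optional keys become None; a missing markdown body becomes "".
    """
    meta = payload.get("metadata") or {}
    return Extraction(
        title=payload.get("title"),
        byline=payload.get("byline"),
        word_count=payload.get("wordCount"),
        markdown=payload.get("markdown", ""),
        site_name=meta.get("siteName"),
        lang=meta.get("lang"),
        canonical=meta.get("canonical"),
    )
```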

Batch mode

Extract up to 20 URLs in a single call. One bad URL never fails the whole batch.
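Because each call accepts at most 20 URLs, larger jobs have to be split on the client side. The sketch below chunks a URL list into batches of at most 20 and separates successes from failures. The batch request and response format is not documented here, so `split_results` assumes, hypothetically, that a failed item carries an `"error"` key. Adjust it to the real shape.

```python
from __future__ import annotations

from typing import Any, Iterator, Sequence

MAX_BATCH_SIZE = 20  # per-call limit stated in the Batch mode description


def chunk_urls(urls: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive, order-preserving slices of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def split_results(items: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition batch items into (succeeded, failed).

    HYPOTHETICAL: assumes a failed item has an "error" key; the real batch
    response format is not documented on this page.
    """
    succeeded = [item for item in items if "error" not in item]
    failed = [item for item in items if "error" in item]
    return succeeded, failed
```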

Selector targeting

Pass a CSS selector to narrow extraction to exactly the region you care about.

SSRF-hardened

Private, loopback, and cloud-metadata addresses are refused — re-checked on every redirect hop.
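The check above can be sketched with Python's `ipaddress` module. This sketch shows the general technique and is not webextract's actual implementation. Every resolved address must be public, IPv4-mapped IPv6 addresses are unwrapped first, and each redirect hop is validated again rather than only the first URL. The helper names are hypothetical. `169.254.169.254` is the link-local address commonly used for cloud instance metadata.

```python
from __future__ import annotations

import ipaddress
import socket


def is_blocked_address(addr: str) -> bool:
    """True if `addr` is private, loopback, link-local, or otherwise non-public."""
    ip = ipaddress.ip_address(addr)
    # Judge IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) as the IPv4 address it wraps.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        not ip.is_global
        or ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # includes 169.254.169.254
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def host_is_allowed(host: str) -> bool:
    """Resolve `host`; allow it only if every resolved address is public."""
    infos = socket.getaddrinfo(host, None)
    # sockaddr[0] is the IP string; drop any IPv6 scope suffix like "%eth0".
    addrs = {info[4][0].split("%", 1)[0] for info in infos}
    return bool(addrs) and not any(is_blocked_address(a) for a in addrs)


def first_blocked_hop(hop_addresses: list[str]) -> int | None:
    """Re-check every redirect hop; return the index of the first bad one, if any."""
    for i, addr in enumerate(hop_addresses):
        if is_blocked_address(addr):
            return i
    return None
```

A real fetcher should also connect to the exact address it checked. Otherwise a second DNS lookup could return a different, private IP (DNS rebinding).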

Predictable & fast

Timeouts, size caps, and clean JSON errors. Priced for high-volume RAG and agent workloads.
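The sketch below illustrates how a size cap can work. It reads a response body in chunks and stops as soon as the body passes a byte limit, instead of buffering the whole page first. This is a generic technique, not webextract's code, and `read_capped` and `BodyTooLarge` are hypothetical names. Pair it with a socket timeout, for example `urllib.request.urlopen(..., timeout=...)`, to bound time as well as size.

```python
from __future__ import annotations

import io
from typing import BinaryIO


class BodyTooLarge(Exception):
    """Raised when a body grows past the configured byte cap."""


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read `stream` to EOF, raising BodyTooLarge once more than `max_bytes` arrive."""
    buf = bytearray()
    while True:
        # Never request more than one byte past the cap, so memory stays bounded.
        chunk = stream.read(min(chunk_size, max_bytes + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf += chunk
        if len(buf) > max_bytes:
            raise BodyTooLarge(f"body exceeds {max_bytes} bytes")


if __name__ == "__main__":
    # In-memory stand-in for an HTTP response body.
    print(len(read_capped(io.BytesIO(b"x" * 1000), max_bytes=2048)))
```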

Pricing

Basic: $0/mo, 100 requests/mo
Ultra: $29/mo, 50,000 requests/mo
Mega: $99/mo, 250,000 requests/mo

Requests beyond a plan's monthly quota are billed per request as overage. Cancel anytime.
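To compare the paid plans, divide the price by the number of included requests. The sketch below uses the figures from the pricing list and assumes you use the full monthly quota. The overage rate is not published on this page, so `cheapest_plan_covering` (a hypothetical helper) considers only each plan's included requests.

```python
from __future__ import annotations

# Figures from the pricing list: plan -> (monthly price in USD, included requests).
PLANS = {
    "Basic": (0, 100),
    "Ultra": (29, 50_000),
    "Mega": (99, 250_000),
}


def cost_per_1k(price_usd: float, included_requests: int) -> float:
    """Effective USD per 1,000 requests when the whole monthly quota is used."""
    return price_usd * 1000 / included_requests


def cheapest_plan_covering(monthly_requests: int) -> str | None:
    """Cheapest plan whose included quota covers the volume (overage ignored)."""
    fits = [(price, name) for name, (price, quota) in PLANS.items() if quota >= monthly_requests]
    return min(fits)[1] if fits else None
```

At full use, Ultra works out to $0.58 per 1,000 requests and Mega to $0.396 per 1,000.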

FAQ

What does webextract sell?
A metered HTTP API. You send a URL; we return clean markdown and structured metadata for that page. Billing is based on a monthly request quota.
Who is it for?
Developers building retrieval-augmented generation (RAG) systems, AI agents, research tools, and content pipelines that need clean text from arbitrary web pages.
How do I get access?
Email vere@kaylie.ai for a direct API key, or subscribe via our API marketplace listing.
Do you store the pages I extract?
No. Pages are fetched, processed, and returned in the response. We do not retain page content.