
Start a crawl

Crawls are the unit of work in Ollagraph. They fetch one or many URLs, render them through a renderer of your choice, optionally extract structured data, and emit traceable events you can subscribe to.

POST https://api.ollagraph.com/v1/crawl

Tip: Crawls are asynchronous by default. Set async: false to wait inline for small jobs, or stream results for the returned job_id (see Streaming results).

Request body

The request body is a JSON object. Required fields are marked. Unknown fields are rejected with 400 invalid_request.

Field	Type	Description
url required	string · url	The starting URL. Must include scheme. http auto-upgrades to https when available.
depth optional	integer · 0–8	How many link-hops deep to follow from url. Default 1. Set to 0 for the page only.
renderer optional	"static" | "chromium" | "chromium-stealth"	Render backend. chromium-stealth uses our anti-bot pool; counts as 4 credits per page.
extract optional	ExtractSpec	Run structured extraction over each fetched page. See Run extractor.
observe optional	ObserveSpec	Emit traces & logs, with optional field redaction.
webhook optional	string · url	Signed webhook fired on every status change. See Verify webhook signatures.
idempotency_key optional	string · ≤ 64 chars	Repeats with the same key return the same job_id for 24h.
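
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: validate_crawl_request is a hypothetical helper, and it only covers the rules stated above (url with a scheme, depth from 0 to 8, the three renderer values, and the 64-character limit on idempotency_key).

```python
from urllib.parse import urlparse

# Renderer values listed in the request body table.
RENDERERS = ("static", "chromium", "chromium-stealth")


def validate_crawl_request(body: dict) -> list[str]:
    """Return the problems found in a crawl request body (empty if none).

    Hypothetical client-side helper; the API remains the source of truth.
    """
    problems = []

    # url is required and must include a scheme.
    url = body.get("url")
    if not isinstance(url, str):
        problems.append("url is required and must be a string")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must include an http(s) scheme and a host")

    # depth defaults to 1 and must be an integer from 0 to 8.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    depth = body.get("depth", 1)
    if isinstance(depth, bool) or not isinstance(depth, int) or not 0 <= depth <= 8:
        problems.append("depth must be an integer from 0 to 8")

    if "renderer" in body and body["renderer"] not in RENDERERS:
        problems.append("renderer must be one of: " + ", ".join(RENDERERS))

    key = body.get("idempotency_key")
    if key is not None and (not isinstance(key, str) or len(key) > 64):
        problems.append("idempotency_key must be a string of at most 64 characters")

    return problems
```

Calling validate_crawl_request({"url": "news.ycombinator.com"}) reports the missing scheme locally instead of spending a round trip on a rejected request.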

Example

This example crawls the front page of Hacker News with depth 3, renders through our stealth Chromium pool, extracts each story into a typed record, and emits a trace. The grader checks each extracted item for schema validity and confidence.

Response

Returns a job_id and a streamable URL. Stream events include queued, started, page, extracted, and completed.

Errors

402 quota_exceeded when you've used your credits. 429 rate_limited when you exceed your plan's RPS. 422 robots_disallow when the target's robots.txt forbids the path (override with respect_robots: false on enterprise plans only).
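
One way to act on these codes in client code is sketched below. The retry decisions are an assumption, not documented behavior: only 429 rate_limited is treated as retryable, and the backoff schedule is illustrative. The names ErrorAdvice, advise, and backoff_delays are hypothetical, and how the error code is read out of the response is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorAdvice:
    retry: bool  # whether resending the same request may succeed later
    hint: str    # short note for logs or error messages


# Keyed by (HTTP status, error code) as listed in the Errors section.
# The retry flags are this sketch's assumption.
_ADVICE = {
    (400, "invalid_request"): ErrorAdvice(False, "fix the request body"),
    (402, "quota_exceeded"): ErrorAdvice(False, "credits are used up"),
    (422, "robots_disallow"): ErrorAdvice(False, "robots.txt forbids this path"),
    (429, "rate_limited"): ErrorAdvice(True, "over the plan's RPS; back off and retry"),
}


def advise(status: int, code: str) -> ErrorAdvice:
    """Look up advice for an error; unknown errors are not retried."""
    return _ADVICE.get((status, code), ErrorAdvice(False, "unrecognized error"))


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A caller would sleep for each value in backoff_delays(n) between attempts while advise(...).retry is true, and surface the hint otherwise.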

REQUEST

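import { Ollagraph } from "@ollagraph/sdk";

// Read the API key from the environment, as the curl example does.
const OG_KEY = process.env.OG_KEY ?? "";
const og = new Ollagraph({ apiKey: OG_KEY });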

const job = await og.crawl.start({
  url:      "https://news.ycombinator.com",
  depth:    3,
  renderer: "chromium-stealth",
  extract: {
    schema: {
      title:  "string",
      url:    "url",
      points: "int",
    },
    grader: "claude-haiku-4.5",
  },
  observe: { trace: true },
});

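import os

from ollagraph import Ollagraph

# Read the API key from the environment, as the curl example does.
OG_KEY = os.environ["OG_KEY"]
og = Ollagraph(api_key=OG_KEY)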

job = og.crawl.start(
    url="https://news.ycombinator.com",
    depth=3,
    renderer="chromium-stealth",
    extract={
        "schema": {
            "title":  "string",
            "url":    "url",
            "points": "int",
        },
        "grader": "claude-haiku-4.5",
    },
    observe={"trace": True},
)

curl https://api.ollagraph.com/v1/crawl \
  -H "Authorization: Bearer $OG_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "depth": 3,
    "renderer": "chromium-stealth",
    "extract": {
      "schema": {
        "title":  "string",
        "url":    "url",
        "points": "int"
      },
      "grader": "claude-haiku-4.5"
    },
    "observe": { "trace": true }
  }'

200 OK response · application/json 217 ms

{
  "job_id":   "job_8431f4a9",
  "status":   "queued",
  "stream":   "wss://api.ollagraph.com/v1/jobs/job_8431f4a9",
  "trace":    "tr_a14b2c",
  "credits":  4,
  "created":  "2026-05-24T14:02:17Z"
}

Streaming results

For long crawls, subscribe to the WebSocket URL returned in stream. Each event is a JSON line. The SDKs expose this as an async iterator.

stream events application/x-ndjson

{"phase":"page","url":"https://news.ycombinator.com/","status":200,"latencyMs":412}
{"phase":"extracted","records":28,"grader":"claude-haiku-4.5","costUsd":0.0014}
{"phase":"page","url":"https://news.ycombinator.com/item?id=39112842","status":200,"latencyMs":388}
{"phase":"completed","jobId":"job_8431f4a9","pages":68,"records":1812}
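
If you consume the stream without the SDK iterator, each line can be decoded on its own. The sketch below assumes only what the sample shows: one JSON object per line, a phase field on every event, and url and latencyMs on page events. parse_events and summarize are hypothetical helper names, and SAMPLE is the sample above copied verbatim.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Decode newline-delimited JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(events: Iterable[dict]) -> dict:
    """Count events per phase and find the slowest fetched page."""
    phases = Counter()
    slowest = None
    for event in events:
        phases[event["phase"]] += 1
        if event["phase"] == "page":
            if slowest is None or event["latencyMs"] > slowest["latencyMs"]:
                slowest = event
    return {
        "phases": dict(phases),
        "slowest_page": slowest["url"] if slowest else None,
    }


SAMPLE = """\
{"phase":"page","url":"https://news.ycombinator.com/","status":200,"latencyMs":412}
{"phase":"extracted","records":28,"grader":"claude-haiku-4.5","costUsd":0.0014}
{"phase":"page","url":"https://news.ycombinator.com/item?id=39112842","status":200,"latencyMs":388}
{"phase":"completed","jobId":"job_8431f4a9","pages":68,"records":1812}
"""

summary = summarize(parse_events(SAMPLE.splitlines()))
```

Because parse_events is a generator, the same code works on a live stream: events are handled as lines arrive rather than after the crawl completes.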

Idempotency & retries

Pass an Idempotency-Key header (or idempotency_key in the body) to make crawl creation safe to retry. We store the first response for 24 hours, so repeats with the same key return the same job_id.
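
The important detail when retrying is that every attempt reuses the same key. A minimal sketch follows, assuming the key travels in the body as idempotency_key. The send callable is a stand-in for whatever actually issues the request, retrying only on ConnectionError is this sketch's choice, and both helper names are hypothetical.

```python
import uuid
from typing import Callable


def new_idempotency_key() -> str:
    """Random key; uuid4().hex is 32 characters, within the 64-character limit."""
    return uuid.uuid4().hex


def start_with_retries(send: Callable[[dict], dict], body: dict, attempts: int = 3) -> dict:
    """Call send(request) up to `attempts` times with one shared idempotency key.

    `send` is a caller-supplied function (hypothetical here) that posts the
    body and returns the decoded response, raising ConnectionError on a
    transport failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # Generate the key once, outside the loop, so all attempts share it.
    key = body.get("idempotency_key") or new_idempotency_key()
    request = {**body, "idempotency_key": key}

    last_error = None
    for _ in range(attempts):
        try:
            return send(request)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Generating the key inside the loop would defeat the purpose: each attempt would look like a new crawl, and a request that succeeded but whose response was lost could be started twice.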

Webhooks

If you set webhook, we sign every delivery with Ollagraph-Signature: t=<ts>,v1=<hmac>. Verify the signature before acting on the payload; see Verify webhook signatures.
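
The sketch below shows the general shape of verifying a header in the t=<ts>,v1=<hmac> form. Several details are assumptions that this page does not state and that must be confirmed against Verify webhook signatures: that the HMAC is SHA-256 and hex-encoded, that the signed message is the timestamp, a dot, and the raw request body, that t is a Unix timestamp in seconds, and the 300-second replay window. The function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def parse_signature_header(header: str) -> dict:
    """Split 't=<ts>,v1=<hmac>' into {'t': ..., 'v1': ...}."""
    parts = {}
    for item in header.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            parts[key] = value
    return parts


def verify_webhook(
    secret: bytes,
    header: str,
    body: bytes,
    now: Optional[float] = None,
    tolerance_s: int = 300,
) -> bool:
    """Return True if the signature matches and the timestamp is recent.

    ASSUMPTIONS (not confirmed by this page): HMAC-SHA256, hex digest,
    signed message = timestamp + b"." + raw body, timestamp in seconds.
    """
    parts = parse_signature_header(header)
    ts, sig = parts.get("t"), parts.get("v1")
    if not ts or not sig:
        return False
    try:
        sent_at = int(ts)
    except ValueError:
        return False

    # Reject stale deliveries to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_s:
        return False

    expected = hmac.new(secret, ts.encode() + b"." + body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

Whatever the exact scheme turns out to be, verify against the raw request bytes rather than a re-serialized JSON object, since re-encoding can change whitespace or key order and break the comparison.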
