Start a crawl
Crawls are the unit of work in Ollagraph. They fetch one or many URLs, render them through a renderer of your choice, optionally extract structured data, and emit traceable events you can subscribe to.
Pass async: false to wait inline for small jobs, or stream results for the returned job_id over WebSocket.
Request body
The request body is a JSON object. Required fields are marked. Unknown fields are rejected with 400 invalid_request.
| Field | Type | Description |
|---|---|---|
| url (required) | string · url | The starting URL. Must include scheme. http auto-upgrades to https when available. |
| depth (optional) | integer · 0–8 | How many link-hops deep to follow from url. Default 1. Set to 0 for the page only. |
| renderer (optional) | "static" \| "chromium" \| "chromium-stealth" | Render backend. chromium-stealth uses our anti-bot pool; counts as 4 credits per page. |
| extract (optional) | ExtractSpec | Run structured extraction over each fetched page. See Run extractor. |
| observe (optional) | ObserveSpec | Emit traces & logs, with optional field redaction. |
| webhook (optional) | string · url | Signed webhook fired on every status change. Verify signatures. |
| idempotency_key (optional) | string · ≤ 64 chars | Repeats with the same key return the same job_id for 24h. |
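The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Ollagraph SDK: the function name validate_crawl_request is hypothetical, and treating only http and https as valid schemes is an assumption. It checks just the fields whose limits the table states.

```python
from urllib.parse import urlparse

# Renderer values listed in the request body table.
RENDERERS = {"static", "chromium", "chromium-stealth"}


def validate_crawl_request(body: dict) -> list[str]:
    """Return a list of problems found in a crawl request body (empty if none).

    Hypothetical helper for illustration; the API remains the source of truth.
    """
    problems = []

    # url is required and must include a scheme (http/https assumed here).
    url = body.get("url")
    parsed = urlparse(url) if isinstance(url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url is required and must include a scheme")

    # depth is optional, defaults to 1, and must be an integer from 0 to 8.
    depth = body.get("depth", 1)
    if isinstance(depth, bool) or not isinstance(depth, int) or not 0 <= depth <= 8:
        problems.append("depth must be an integer from 0 to 8")

    # renderer is optional but must be one of the listed backends.
    renderer = body.get("renderer")
    if renderer is not None and not (isinstance(renderer, str) and renderer in RENDERERS):
        problems.append("renderer must be static, chromium, or chromium-stealth")

    # idempotency_key is optional and limited to 64 characters.
    key = body.get("idempotency_key")
    if key is not None and (not isinstance(key, str) or len(key) > 64):
        problems.append("idempotency_key must be a string of at most 64 chars")

    return problems
```

An empty list means the body passes these local checks; the server may still reject it for reasons this sketch does not cover.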
Example
This example crawls the front page of Hacker News with depth 3, renders through our stealth Chromium pool, extracts each story into a typed record, and emits a trace. The grader checks each extracted item for schema validity and confidence.
Response
Returns a job_id and a streamable URL. Stream events include queued, started, page, extracted, and completed.
Errors
The API returns 402 quota_exceeded when you have used all your credits, 429 rate_limited when you exceed your plan's RPS, and 422 robots_disallow when the target's robots.txt forbids the path. Enterprise plans can override the robots check with respect_robots: false.
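One way to act on these codes is to retry only rate limiting and fail fast on everything else. The sketch below assumes that policy; the function name next_action and the exponential backoff schedule (1 s doubling up to 30 s) are illustrative choices, not documented Ollagraph behavior.

```python
def next_action(status: int, code: str, attempt: int,
                max_attempts: int = 5) -> tuple[str, float]:
    """Decide what to do after an error response.

    attempt is zero-based. Returns (action, delay_seconds), where action is
    "retry", "give_up", or "fail". The backoff schedule is an assumption.
    """
    if (status, code) == (429, "rate_limited"):
        # Rate limiting is transient: back off and try again, up to a limit.
        if attempt + 1 >= max_attempts:
            return ("give_up", 0.0)
        return ("retry", min(2.0 ** attempt, 30.0))

    # 402 quota_exceeded, 422 robots_disallow, and 400 invalid_request will
    # not succeed on a plain retry, so surface them to the caller.
    return ("fail", 0.0)
```

Retrying 402 or 422 without changing the plan or the request would only repeat the same failure, which is why they are not retried here.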
```typescript
import { Ollagraph } from "@ollagraph/sdk";

// Read the API key from the environment rather than hard-coding it.
const OG_KEY = process.env.OG_KEY as string;
const og = new Ollagraph({ apiKey: OG_KEY });

const job = await og.crawl.start({
  url: "https://news.ycombinator.com",
  depth: 3,
  renderer: "chromium-stealth",
  extract: {
    schema: {
      title: "string",
      url: "url",
      points: "int",
    },
    grader: "claude-haiku-4.5",
  },
  observe: { trace: true },
});
```
```python
import os

from ollagraph import Ollagraph

# Read the API key from the environment rather than hard-coding it.
OG_KEY = os.environ["OG_KEY"]
og = Ollagraph(api_key=OG_KEY)

job = og.crawl.start(
    url="https://news.ycombinator.com",
    depth=3,
    renderer="chromium-stealth",
    extract={
        "schema": {
            "title": "string",
            "url": "url",
            "points": "int",
        },
        "grader": "claude-haiku-4.5",
    },
    observe={"trace": True},
)
```
```shell
curl https://api.ollagraph.com/v1/crawl \
  -H "Authorization: Bearer $OG_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "depth": 3,
    "renderer": "chromium-stealth",
    "extract": {
      "schema": {
        "title": "string",
        "url": "url",
        "points": "int"
      },
      "grader": "claude-haiku-4.5"
    },
    "observe": { "trace": true }
  }'
```
```json
{
  "job_id": "job_8431f4a9",
  "status": "queued",
  "stream": "wss://api.ollagraph.com/v1/jobs/job_8431f4a9",
  "trace": "tr_a14b2c",
  "credits": 4,
  "created": "2026-05-24T14:02:17Z"
}
```
Streaming results
For long crawls, subscribe to the WebSocket URL returned in stream. Each event is a JSON line. The SDKs expose this as an async iterator.
```json
{"phase":"page","url":"https://news.ycombinator.com/","status":200,"latencyMs":412}
{"phase":"extracted","records":28,"grader":"claude-haiku-4.5","costUsd":0.0014}
{"phase":"page","url":"https://news.ycombinator.com/item?id=39112842","status":200,"latencyMs":388}
{"phase":"completed","jobId":"job_8431f4a9","pages":68,"records":1812}
```
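If you read the stream without the SDK iterator, each line can be decoded independently. The sketch below shows one way to tally events from an iterable of JSON lines. The function name summarize_events is hypothetical, and the field names (phase, records) are taken from the sample events above rather than from a formal schema.

```python
import json
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> dict:
    """Tally stream events given one JSON document per line.

    Counts page events, sums records from extracted events, and notes
    whether a completed event was seen. Unrecognized phases are ignored.
    """
    summary = {"pages": 0, "records": 0, "completed": False}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        event = json.loads(line)
        phase = event.get("phase")
        if phase == "page":
            summary["pages"] += 1
        elif phase == "extracted":
            summary["records"] += event.get("records", 0)
        elif phase == "completed":
            summary["completed"] = True
    return summary
```

Because the function accepts any iterable of strings, the same logic works on a live connection, a log file, or a list of captured lines in a test.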
Idempotency & retries
Pass an Idempotency-Key header (or idempotency_key in the body) to make crawl creation safe to retry. We store the first response for 24 hours, and repeats with the same key return the same job_id.
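A retry is only safe if every attempt carries the same key, so the key should be generated once, outside the retry loop. The sketch below illustrates that pattern. The send callable is a hypothetical stand-in for whatever performs the POST, and the choice of a UUID as the key is an assumption; any unique string of at most 64 characters would do.

```python
import uuid
from typing import Callable


def new_idempotency_key() -> str:
    """Return a random 32-character hex key, within the 64-char limit."""
    return uuid.uuid4().hex


def start_with_retry(send: Callable[[dict], dict], body: dict,
                     attempts: int = 3) -> dict:
    """Call send(body) up to `attempts` times, reusing one idempotency key.

    `send` is a hypothetical function that posts the body and returns the
    parsed response, raising ConnectionError or TimeoutError on network
    failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # Generate the key once so every retry is recognized as the same request.
    payload = dict(body)
    payload.setdefault("idempotency_key", new_idempotency_key())

    last_error: Exception | None = None
    for _ in range(attempts):
        try:
            return send(payload)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
    assert last_error is not None
    raise last_error
```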
Webhooks
If you set webhook, we sign every delivery with Ollagraph-Signature: t=<ts>,v1=<hmac>. Verify the signature before acting on the payload. See Verify webhook signatures.