Start a crawl

Crawls are the unit of work in Ollagraph. They fetch one or many URLs, render them through a renderer of your choice, optionally extract structured data, and emit traceable events you can subscribe to.

POST https://api.ollagraph.com/v1/crawl
Tip: Crawls are asynchronous by default. Set async: false to wait inline for small jobs, or stream results from the stream URL returned alongside the job_id.

Request body

All fields are JSON. Required fields are marked. Unknown fields are rejected with 400 invalid_request.

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| url | required | string · url | The starting URL. Must include scheme. http auto-upgrades to https when available. |
| depth | optional | integer · 0–8 | How many link-hops deep to follow from url. Default 1. Set to 0 for the page only. |
| renderer | optional | "static" \| "chromium" \| "chromium-stealth" | Render backend. chromium-stealth uses our anti-bot pool; counts as 4 credits per page. |
| extract | optional | ExtractSpec | Run structured extraction over each fetched page. See Run extractor. |
| observe | optional | ObserveSpec | Emit traces & logs, with optional field redaction. |
| webhook | optional | string · url | Signed webhook fired on every status change. Verify signatures. |
| idempotency_key | optional | string · ≤ 64 chars | Repeats with the same key return the same job_id for 24h. |
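Because unknown or malformed fields are rejected with 400 invalid_request, it can help to check a body locally before sending it. The following is a minimal sketch, not part of the SDK: the CrawlRequest type and validateCrawlRequest helper are hypothetical names, the scheme check assumes only http and https are accepted, and extract and observe are left out because ExtractSpec and ObserveSpec are not defined on this page.

```typescript
// Hypothetical client-side check; the API remains the source of truth.
type Renderer = "static" | "chromium" | "chromium-stealth";

interface CrawlRequest {
  url: string;
  depth?: number;
  renderer?: Renderer;
  webhook?: string;
  idempotency_key?: string;
}

const RENDERERS: readonly string[] = ["static", "chromium", "chromium-stealth"];

// Assumption: "must include scheme" means an explicit http:// or https:// prefix.
const HAS_SCHEME = /^https?:\/\//i;

// Returns a list of problems; an empty list means the body satisfies
// the constraints listed in the field table.
function validateCrawlRequest(req: CrawlRequest): string[] {
  const problems: string[] = [];

  if (!HAS_SCHEME.test(req.url)) {
    problems.push("url must include a scheme");
  }
  if (req.depth !== undefined) {
    if (!Number.isInteger(req.depth) || req.depth < 0 || req.depth > 8) {
      problems.push("depth must be an integer from 0 to 8");
    }
  }
  if (req.renderer !== undefined && !RENDERERS.includes(req.renderer)) {
    problems.push("renderer must be static, chromium, or chromium-stealth");
  }
  if (req.webhook !== undefined && !HAS_SCHEME.test(req.webhook)) {
    problems.push("webhook must be a URL with a scheme");
  }
  // Note: .length counts UTF-16 code units, which may differ from how
  // the server counts characters for non-ASCII keys.
  if (req.idempotency_key !== undefined && req.idempotency_key.length > 64) {
    problems.push("idempotency_key must be at most 64 characters");
  }
  return problems;
}
```

Calling validateCrawlRequest({ url: "news.ycombinator.com", depth: 9 }) reports two problems, one for the missing scheme and one for the out-of-range depth.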

Example

This example crawls the front page of Hacker News with depth 3, renders through our stealth Chromium pool, extracts each story into a typed record, and emits a trace. The grader checks each extracted item for schema validity and confidence.

Response

Returns a job_id and a streamable URL. Stream events include queued, started, page, extracted, and completed.

Errors

The endpoint returns 402 quota_exceeded when your credits are exhausted, 429 rate_limited when you exceed your plan's requests-per-second (RPS) limit, and 422 robots_disallow when the target's robots.txt forbids the path. Enterprise plans can override the robots check with respect_robots: false.
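One way to act on these codes is to map each to a next step. The sketch below is illustrative only: the ApiError shape (status plus code) and the Action names are assumptions, as is the choice to treat 5xx responses as retryable; none of these come from the API reference itself.

```typescript
// Hypothetical shape of a parsed error response.
interface ApiError {
  status: number;
  code: string;
}

type Action = "retry_later" | "top_up_credits" | "fix_request" | "skip_url" | "fail";

// Maps the documented error codes to a suggested next step.
function classifyError(err: ApiError): Action {
  switch (err.code) {
    case "rate_limited": // 429: back off, then retry
      return "retry_later";
    case "quota_exceeded": // 402: retrying will not help until credits are added
      return "top_up_credits";
    case "robots_disallow": // 422: the target forbids this path
      return "skip_url";
    case "invalid_request": // 400: the body has an unknown or malformed field
      return "fix_request";
    default:
      // Assumption: server-side failures are transient and worth retrying.
      return err.status >= 500 ? "retry_later" : "fail";
  }
}
```

Only rate_limited (and, under the stated assumption, 5xx responses) leads to a retry; the other codes need a change on the caller's side first.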

REQUEST
import { Ollagraph } from "@ollagraph/sdk";

// OG_KEY is a placeholder for your API key (for example, loaded from an environment variable).
const og = new Ollagraph({ apiKey: OG_KEY });

const job = await og.crawl.start({
  url:      "https://news.ycombinator.com",
  depth:    3,
  renderer: "chromium-stealth",
  extract: {
    schema: {
      title:  "string",
      url:    "url",
      points: "int",
    },
    grader: "claude-haiku-4.5",
  },
  observe: { trace: true },
});
200 OK response · application/json 217 ms
{
  "job_id":   "job_8431f4a9",
  "status":   "queued",
  "stream":   "wss://api.ollagraph.com/v1/jobs/job_8431f4a9",
  "trace":    "tr_a14b2c",
  "credits":  4,
  "created":  "2026-05-24T14:02:17Z"
}

Streaming results

For long crawls, subscribe to the WebSocket URL returned in stream. Each event is a JSON line. The SDKs expose this as an async iterator.

stream events application/x-ndjson
{"phase":"page","url":"https://news.ycombinator.com/","status":200,"latencyMs":412}
{"phase":"extracted","records":28,"grader":"claude-haiku-4.5","costUsd":0.0014}
{"phase":"page","url":"https://news.ycombinator.com/item?id=39112842","status":200,"latencyMs":388}
{"phase":"completed","jobId":"job_8431f4a9","pages":68,"records":1812}
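If you consume the raw stream rather than the SDK iterator, each line can be parsed on its own. The sketch below assumes only what the sample events show (a phase field plus phase-specific fields); parseEvents and summarize are hypothetical helper names, and the transport is left out so the example runs on a plain string.

```typescript
// Loose event type: only `phase` is assumed to be present on every event.
interface StreamEvent {
  phase: string;
  [field: string]: unknown;
}

// Splits newline-delimited JSON into events, skipping blank lines.
function parseEvents(ndjson: string): StreamEvent[] {
  return ndjson
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as StreamEvent);
}

interface Summary {
  pagesSeen: number;
  recordsExtracted: number;
  completed: boolean;
}

// Tallies page events and extracted records, and notes whether the job finished.
function summarize(events: StreamEvent[]): Summary {
  const summary: Summary = { pagesSeen: 0, recordsExtracted: 0, completed: false };
  for (const ev of events) {
    if (ev.phase === "page") {
      summary.pagesSeen += 1;
    } else if (ev.phase === "extracted") {
      const records = ev["records"];
      if (typeof records === "number") summary.recordsExtracted += records;
    } else if (ev.phase === "completed") {
      summary.completed = true;
    }
  }
  return summary;
}

// The four sample events shown above.
const sample = [
  '{"phase":"page","url":"https://news.ycombinator.com/","status":200,"latencyMs":412}',
  '{"phase":"extracted","records":28,"grader":"claude-haiku-4.5","costUsd":0.0014}',
  '{"phase":"page","url":"https://news.ycombinator.com/item?id=39112842","status":200,"latencyMs":388}',
  '{"phase":"completed","jobId":"job_8431f4a9","pages":68,"records":1812}',
].join("\n");

const summary = summarize(parseEvents(sample));
// summary is { pagesSeen: 2, recordsExtracted: 28, completed: true }
```

The summary counts only the events in the excerpt, so it differs from the job-wide totals carried by the completed event.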

Idempotency & retries

Pass an Idempotency-Key header (or idempotency_key in the body) to make a crawl creation safe to retry. We store the first response for 24 hours.
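A retry is only safe if every attempt reuses the same key. The sketch below illustrates that pattern under stated assumptions: startWithRetry and the Send callback are hypothetical, the callback is responsible for attaching the key (as the header or the body field), and the exponential backoff values are arbitrary.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport: sends the crawl request with the given key attached.
type Send = (idempotencyKey: string) => Promise<{ job_id: string }>;

// Generates one key up front and reuses it on every attempt, so a retry
// after a lost response returns the original job instead of starting a new one.
async function startWithRetry(
  send: Send,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<{ job_id: string }> {
  const key = randomUUID(); // 36 characters, within the 64-character limit
  let lastError: unknown;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send(key);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Retries should finish well inside the 24-hour window, after which the same key would no longer map to the stored response.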

Webhooks

If you set webhook, we sign every delivery with an Ollagraph-Signature: t=<ts>,v1=<hmac> header. Verify the signature before acting on the payload; see Verify webhook signatures.
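As a rough illustration of what verification might look like, the sketch below parses the header and recomputes the HMAC. Only the header format comes from this page. The rest is assumed and should be checked against Verify webhook signatures: HMAC-SHA256 with hex encoding, a signed string of the form `<ts>.<raw body>`, a timestamp in Unix seconds, and a 5-minute tolerance. The function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Parses "t=<ts>,v1=<hmac>" into its two parts; returns null if either is missing.
function parseSignatureHeader(header: string): { t: string; v1: string } | null {
  const parts = new Map<string, string>();
  for (const piece of header.split(",")) {
    const idx = piece.indexOf("=");
    if (idx > 0) parts.set(piece.slice(0, idx).trim(), piece.slice(idx + 1).trim());
  }
  const t = parts.get("t");
  const v1 = parts.get("v1");
  return t !== undefined && v1 !== undefined ? { t, v1 } : null;
}

// ASSUMPTION: the signed string is "<t>.<rawBody>", hashed with HMAC-SHA256, hex-encoded.
function sign(secret: string, t: string, rawBody: string): string {
  return createHmac("sha256", secret).update(`${t}.${rawBody}`).digest("hex");
}

function verifyWebhook(
  secret: string,
  header: string,
  rawBody: string,
  nowSeconds: number,
  toleranceSeconds = 300, // ASSUMPTION: 5-minute replay window
): boolean {
  const parsed = parseSignatureHeader(header);
  if (parsed === null) return false;

  // ASSUMPTION: t is a Unix timestamp in seconds.
  const ts = Number(parsed.t);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > toleranceSeconds) return false;

  const expected = Buffer.from(sign(secret, parsed.t, rawBody), "hex");
  const received = Buffer.from(parsed.v1, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Whatever the real scheme turns out to be, the comparison should run over the raw request body, before any JSON parsing, and use a constant-time comparison such as timingSafeEqual.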
