POST /crawler/jobs
curl --request POST \
  --url https://api.usescraper.com/crawler/jobs \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "exclude_elements": "<string>",
  "exclude_globs": [
    "<string>"
  ],
  "min_length": 123,
  "output_expiry": 123,
  "output_format": "text",
  "page_limit": 123,
  "urls": [
    "<string>"
  ],
  "webhook_url": "<string>"
}'

Response
{
    "id": "7YEGS3M8Q2JD6TNMEJB8B6EKVS",
    "urls": [
        "https://example.com"
    ],
    "createdAt": 1699964378397,
    "status": "starting",
    "sitemapPageCount": 0,
    "progress": {
        "scraped": 0,
        "discarded": 0,
        "failed": 0
    },
    "costCents": 0,
    "webhookFails": []
}

Crawler jobs may take several minutes to complete. Use the Get job endpoint to check the status of a job, and fetch the results from the Get job data endpoint when the job is complete.
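The create-then-poll flow described above can be sketched in Python with the standard library. The request path and headers come from the curl example; the Get job path (`/crawler/jobs/{id}`) and the set of non-terminal status values beyond `starting` are assumptions, not confirmed by this page:

```python
import json
import time
import urllib.request

API_BASE = "https://api.usescraper.com"

def build_job_payload(urls, output_format="text", page_limit=10000):
    """Assemble the JSON body for POST /crawler/jobs using the
    documented defaults for output_format and page_limit."""
    return {"urls": urls, "output_format": output_format, "page_limit": page_limit}

def create_and_wait(token, urls, poll_seconds=10):
    # Create the crawler job.
    req = urllib.request.Request(
        f"{API_BASE}/crawler/jobs",
        data=json.dumps(build_job_payload(urls)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    job = json.load(urllib.request.urlopen(req))
    # Poll the Get job endpoint (path and non-terminal statuses assumed)
    # until the job leaves its in-progress states.
    while job["status"] in ("starting", "running"):
        time.sleep(poll_seconds)
        poll = urllib.request.Request(
            f"{API_BASE}/crawler/jobs/{job['id']}",
            headers={"Authorization": f"Bearer {token}"},
        )
        job = json.load(urllib.request.urlopen(poll))
    return job
```

Once the returned status indicates completion, fetch the results from the Get job data endpoint.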

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
exclude_elements
string
default: nav, header, footer, script, style, noscript, svg, [role="alert"], [role="banner"], [role="dialog"], [role="alertdialog"], [role="region"][aria-label*="skip" i], [aria-modal="true"]

CSS selectors of content to exclude from the page HTML before it is converted to the output format (separate multiple selectors with commas)

exclude_globs
string[]

Glob patterns (see the URL Pattern API: https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API) that exclude matching page URLs from being crawled
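
The exclusion check amounts to skipping any discovered URL that matches one of the patterns. A minimal sketch of that logic, using Python's `fnmatch` as a stand-in (the service itself uses URL Pattern API glob semantics, which differ in details; the example patterns are hypothetical):

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns, as they would appear in exclude_globs.
exclude_globs = ["https://example.com/blog/*", "*/login"]

def is_excluded(url, globs):
    # A URL is skipped if it matches any exclusion pattern.
    return any(fnmatch(url, g) for g in globs)
```

With the patterns above, `https://example.com/blog/post-1` would be skipped while `https://example.com/docs` would still be crawled.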

min_length
number
default: 50

Skip any page whose output contains fewer than this number of characters (default 50)

output_expiry
number
default: 604800

Time in seconds to store crawler output for, after which it is automatically deleted (the default and maximum value is 604800, i.e. 7 days)

output_format
enum<string>
default: text

Format in which to save all crawled page content

Available options:
text,
html,
markdown
page_limit
number
default: 10000

Maximum number of pages to crawl (limited to 10,000 pages on the free plan)

urls
string[]
required

Array of one or more website URLs to crawl

webhook_url
string

Webhook URL to call with updates about job progress
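
To consume these callbacks you need an HTTP endpoint that accepts POSTed updates. A minimal stdlib sketch follows; the payload shape (mirroring the job object's `id`, `status`, and `progress` fields) and the use of POST are assumptions, since this page does not document the webhook body:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_update(body: bytes) -> dict:
    """Decode a webhook body; assumed to be JSON mirroring the job object."""
    return json.loads(body)

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        update = parse_update(self.rfile.read(length))
        # Log the (assumed) job id and status, then acknowledge receipt.
        print(update.get("id"), update.get("status"))
        self.send_response(200)
        self.end_headers()

# To receive callbacks, run:
#   HTTPServer(("", 8000), WebhookHandler).serve_forever()
# and pass this server's public URL as webhook_url when creating the job.
```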