Create job
Create and start a web crawler job.
Crawler jobs may take several minutes to complete. Use the Get job endpoint to check the status of a job, and fetch the results from the Get job data endpoint when the job is complete.
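The check-then-fetch flow described above can be sketched as a small polling helper. This is a minimal sketch, not the service's client: `get_status` is a hypothetical callable that would wrap a call to the Get job endpoint, and the state names passed as `done_states` are assumptions, since this page does not list the actual status values.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[], str],
    done_states: Iterable[str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a state in done_states or time runs out.

    get_status is a hypothetical wrapper around the Get job endpoint;
    done_states holds whatever terminal state names that endpoint reports.
    """
    done = set(done_states)
    waited = 0.0
    while True:
        status = get_status()
        if status in done:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        # Pause between checks so the API is not hammered while the crawl runs.
        sleep(interval)
        waited += interval
```

Once the helper returns a state that means the job is complete, the results can be fetched from the Get job data endpoint. The `sleep` parameter is injectable so the loop can be exercised without real delays.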
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
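As an illustration of the header format above, the following sketch builds the request headers for a given token. The `Content-Type` entry is an assumption (it presumes the request body is sent as JSON, which this page does not state), and `auth_headers` is a hypothetical helper name.

```python
def auth_headers(token: str) -> dict:
    """Build request headers carrying the bearer token."""
    if not token:
        raise ValueError("auth token must not be empty")
    return {
        # Documented form: "Bearer <token>"
        "Authorization": f"Bearer {token}",
        # Assumption: the body is JSON; adjust if the API expects otherwise.
        "Content-Type": "application/json",
    }
```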
Body
Array of one or more website URLs to crawl
Glob patterns (https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API) matching page URLs to exclude from the crawl
CSS selectors of content to remove from the page HTML before it is converted to the output format (separate multiple selectors with commas)
Format in which to save all crawled page content: text, html, or markdown
Time in seconds to store the crawler output, after which it is automatically deleted (default and maximum: 604800, i.e. 7 days)
x <= 604800
Skip any page whose output contains fewer than this minimum number of characters (default: 50)
Maximum number of pages to crawl (limited to 10,000 pages on the free plan)
0 < x < 500000
Block loading of images, stylesheets, and scripts to speed up crawling
Include linked files (e.g. PDFs, images) in the output as URLs
Webhook to call with updates about job progress
Force the crawler to use a specific crawling mode: sitemap or link
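The body parameters above can be assembled and checked before sending, as in the sketch below. This page does not show the parameter names, so every dictionary key used here (`urls`, `output_format`, `exclude_globs`, `output_expiry`, `page_limit`, `crawl_mode`) is a hypothetical placeholder to be replaced with the names from the API schema. Only the allowed values and numeric limits are taken from the descriptions above.

```python
import json

OUTPUT_FORMATS = ("text", "html", "markdown")
CRAWL_MODES = ("sitemap", "link")
MAX_EXPIRY_SECONDS = 604800  # 7 days, the documented default and maximum
PAGE_LIMIT_BOUND = 500000    # documented range is 0 < x < 500000


def build_job_body(urls, output_format, *, exclude_globs=(),
                   expiry_seconds=None, page_limit=None, crawl_mode=None):
    """Validate the arguments and return a JSON-serializable request body.

    All dictionary keys are hypothetical placeholders, not the real field names.
    """
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output format must be one of {OUTPUT_FORMATS}")

    body = {"urls": urls, "output_format": output_format}
    if exclude_globs:
        body["exclude_globs"] = list(exclude_globs)
    if expiry_seconds is not None:
        if expiry_seconds > MAX_EXPIRY_SECONDS:
            raise ValueError("expiry exceeds the 7-day maximum")
        body["output_expiry"] = expiry_seconds
    if page_limit is not None:
        if not 0 < page_limit < PAGE_LIMIT_BOUND:
            raise ValueError("page limit is outside the documented range")
        body["page_limit"] = page_limit
    if crawl_mode is not None:
        if crawl_mode not in CRAWL_MODES:
            raise ValueError(f"crawl mode must be one of {CRAWL_MODES}")
        body["crawl_mode"] = crawl_mode
    return body


# Example: crawl one site as markdown, skip blog pages, keep output for a day.
example = build_job_body(
    ["https://example.com"],
    "markdown",
    exclude_globs=["https://example.com/blog/*"],
    expiry_seconds=86400,
    page_limit=100,
)
print(json.dumps(example, indent=2))
```

Omitted optional arguments are left out of the body entirely, so the service's own defaults apply.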