Let’s get started with a quick example of how to crawl all the pages on a website and save them as markdown. We’ll create this crawling job in the UseScraper dashboard. If you prefer to start directly with the API, check out the API Overview to see how to create jobs via the API.
Head to the UseScraper dashboard now (sign up for a free account if you haven’t done so already) and click “New Crawl Job”.
You can now enter the URLs of one or more websites whose pages you want to crawl. We’ll crawl the UseScraper homepage today. Enter “https://usescraper.com” in the URLs field.
You can leave the rest of the fields as they are. If you want to learn more about all the options available, visit the Create job docs.
Click “Start Job” to start the web crawling job.
Our web crawler will now crawl all the pages on the website and save the page content as markdown. You can watch the progress on the job view page, which will automatically update every few seconds.
Once the crawling job is complete, you can view the results for every page crawled. Click “View” next to any page to preview the markdown content.
There are several ways to download the results:
- Click “Download JSON” to get all the results, including page metadata and markdown content, as a single JSON file.
- Click “Download Markdown” to get all results as a single concatenated markdown file. This is useful for uploading into OpenAI GPT Knowledge or similar.
- Use the /jobs/:id/data API endpoint to get the JSON results programmatically.
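If you go the programmatic route, the request might look something like the minimal Python sketch below. Only the /jobs/:id/data path comes from this guide. The base URL, the Bearer-token Authorization header, and the helper names (job_data_url, build_request, fetch_job_data) are assumptions made for illustration, so check the API Overview for the actual values before relying on it.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder base URL. Confirm the real one in the API Overview.
API_BASE = "https://api.usescraper.com"


def job_data_url(job_id: str, base: str = API_BASE) -> str:
    """Fill the :id placeholder in the /jobs/:id/data path."""
    return f"{base.rstrip('/')}/jobs/{quote(job_id, safe='')}/data"


def build_request(job_id: str, api_key: str, base: str = API_BASE) -> urllib.request.Request:
    """Build a GET request for a job's results.

    Assumption: the API key is sent as a Bearer token.
    """
    return urllib.request.Request(
        job_data_url(job_id, base),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


def fetch_job_data(job_id: str, api_key: str, base: str = API_BASE):
    """Send the request and parse the JSON response body."""
    request = build_request(job_id, api_key, base)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling fetch_job_data("YOUR_JOB_ID", "YOUR_API_KEY") would return the parsed JSON results for that job. Building the request is kept separate from sending it so the URL and headers can be inspected without making a network call.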
Congratulations, you’ve just crawled your first website! UseScraper offers much more functionality, including the ability to crawl websites with thousands or even millions of pages. You can keep using UseScraper via the dashboard UI or, if you want to integrate UseScraper into your own application, continue to the API Overview to see how you can create and manage jobs via the API.