Crawlbot is smart spidering

Crawlbot uses any Diffbot API to extract data from entire sites. Whether for product prices, historical weather, content migration or even three years of Hacker News archives, Crawlbot creates a structured index of practically any site's data.

Features

Smart Processing

Use our Analyze API to automatically find and extract all article, product or other supported pages.

Structured Search

All crawls are instantly searchable using our Search API, allowing you to slice and dice your data by searching the Diffbot-structured fields. Sort by article date, filter by product price, search across your custom fields: it's all in there.

Wolf-Spider Fast

Our distributed, world-class crawling infrastructure processes millions of pages daily.

Diversified IP Options

Subscribers to our Plus or Professional plans can use our reserved fleet of proxy IPs, and can optionally upgrade to gain access to tens of thousands of unique IPs for truly diversified crawling or region- or country-specific extraction.


Benefits

Fine-Tune Your Crawl

If you prefer knobs and dials, carefully control which pages you crawl and extract.

Yup, it's got an API

Programmatically start crawls, check status and retrieve output using the Crawlbot API.
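
As a rough illustration of what "programmatically" can look like, here is a minimal Python sketch that only assembles the requests for starting a crawl and checking its status; it does not send anything. The endpoint URL, the parameter names (token, name, seeds, apiUrl) and the whitespace-separated seed format are assumptions made for this sketch, and the helper functions are hypothetical, so confirm the details against the current Crawlbot API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for this sketch; confirm against the Crawlbot API docs.
CRAWL_ENDPOINT = "https://api.diffbot.com/v3/crawl"


def build_crawl_request(token, name, seeds, api_url):
    """Return (url, form_fields) describing a request to start a crawl.

    Hypothetical helper: the field names below are assumptions.
    """
    fields = {
        "token": token,            # your developer token
        "name": name,              # a name for the crawl job
        "seeds": " ".join(seeds),  # assumed: seed URLs separated by spaces
        "apiUrl": api_url,         # the Diffbot API applied to crawled pages
    }
    return CRAWL_ENDPOINT, fields


def build_status_url(token, name):
    """Return a URL for checking on a named crawl (assumed query format)."""
    return CRAWL_ENDPOINT + "?" + urlencode({"token": token, "name": name})


if __name__ == "__main__":
    url, fields = build_crawl_request(
        "YOUR_TOKEN",
        "shop-products",
        ["https://example.com"],
        "https://api.diffbot.com/v3/analyze",
    )
    print(url)
    print(fields)
    print(build_status_url("YOUR_TOKEN", "shop-products"))
```

To actually start the crawl you would send the fields to the URL with the HTTP client of your choice, then request the status URL periodically until the job reports that it has finished.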

Works with Custom APIs

Pair a Custom API with Crawlbot to extract nearly anything from any site.

Crawl History

Re-run, copy, re-download or simply review your crawl history at any time.

Tutorials

Crawlbot Basics (3:29)

A quick overview of Crawlbot using the Analyze API to automatically identify and extract all products from an e-commerce site.

Advanced Tutorial (4:37)

Review some of the ways to narrow your crawl within a site, and learn how to set up a recurring crawl.