Crawlbot is smart spidering

Crawlbot uses any Diffbot API to extract data from entire sites, building a structured index of practically any site's data. Turn unstructured, site-wide pages into a structured data store.



Smart Processing

Use our Analyze API to automatically identify article, product, and other supported page types and extract their data.
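As a rough illustration, a single Analyze API request can be assembled as below. This is a minimal sketch that assumes the v3 endpoint `https://api.diffbot.com/v3/analyze`, `token` and `url` query parameters, and a `type` field in the JSON response. Confirm all of these against Diffbot's current documentation. The helper names `analyze_request_url` and `detected_type` are hypothetical.

```python
from typing import Any
from urllib.parse import urlencode

# Assumed Analyze API endpoint (Diffbot v3); confirm against current documentation.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def analyze_request_url(token: str, page_url: str) -> str:
    """Build a GET URL asking the Analyze API to classify and extract one page."""
    # urlencode percent-escapes the target page URL so it is safe as a query value.
    return f"{ANALYZE_ENDPOINT}?{urlencode({'token': token, 'url': page_url})}"


def detected_type(response: dict[str, Any]) -> str:
    """Read the page type from a parsed Analyze response (the 'type' field is an assumption)."""
    return str(response.get("type", "unknown"))


if __name__ == "__main__":
    print(analyze_request_url("YOUR_TOKEN", "https://example.com/products/42"))
```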

Structured Search

All crawls are instantly searchable using our Search API, allowing you to slice and dice your data by searching the Diffbot-structured fields. Sort by article date, filter by product price, search across your custom fields: it's all in there.
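The Search API does this slicing server-side, and its query syntax is not reproduced here. As a local illustration of why structured fields make sorting and filtering straightforward, here is a sketch that runs over records you have already downloaded. The field names `type`, `date`, `price`, and `title`, and the sample data, are hypothetical.

```python
from datetime import date
from typing import Any

# Hypothetical records shaped like structured crawl output; field names are illustrative only.
Record = dict[str, Any]


def articles_newest_first(records: list[Record]) -> list[Record]:
    """Keep article records that have an ISO-format 'date' and sort them newest first."""
    articles = [r for r in records if r.get("type") == "article" and "date" in r]
    return sorted(articles, key=lambda r: date.fromisoformat(r["date"]), reverse=True)


def products_under(records: list[Record], max_price: float) -> list[Record]:
    """Keep product records whose numeric 'price' is at or below max_price."""
    return [
        r for r in records
        if r.get("type") == "product"
        and isinstance(r.get("price"), (int, float))
        and r["price"] <= max_price
    ]


if __name__ == "__main__":
    sample = [
        {"type": "article", "title": "Launch recap", "date": "2023-05-01"},
        {"type": "article", "title": "Roadmap", "date": "2024-01-15"},
        {"type": "product", "title": "Widget", "price": 19.99},
        {"type": "product", "title": "Gadget", "price": 249.0},
    ]
    print([r["title"] for r in articles_newest_first(sample)])  # ['Roadmap', 'Launch recap']
    print([r["title"] for r in products_under(sample, 50)])     # ['Widget']
```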

Wolf-Spider Fast

Distributed, world-class crawling infrastructure processing millions of pages daily.

Diversified IP Options

Use our reserved fleet of proxy IPs, or upgrade to access tens of thousands of unique IPs for truly diversified crawling or region- and country-specific extraction.

Automatic Or Custom APIs

Extract data using one of Diffbot’s rule-less Automatic Extraction APIs or control exactly what is returned from each crawled page with a Custom API for extraction.


Fine Tune Your Crawl

If you prefer knobs and dials, you can control precisely which pages are crawled and which are extracted.
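One way to picture this control is as two filters: one decides which URLs the crawler follows, and the other decides which of those pages are sent for extraction. The sketch below is a hypothetical, substring-based model of that idea. The `UrlRule` class and `classify` function are illustrative only and are not Crawlbot's own pattern syntax or matcher.

```python
from dataclasses import dataclass, field


@dataclass
class UrlRule:
    """Hypothetical URL filter: allow if any 'include' substring matches and no 'exclude' substring does."""
    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)

    def allows(self, url: str) -> bool:
        # An empty include list means "allow everything that is not excluded".
        included = not self.include or any(s in url for s in self.include)
        excluded = any(s in url for s in self.exclude)
        return included and not excluded


def classify(url: str, crawl: UrlRule, process: UrlRule) -> str:
    """Decide whether a URL is skipped, only followed for links, or also sent for extraction."""
    if not crawl.allows(url):
        return "skip"
    return "crawl+extract" if process.allows(url) else "crawl-only"


if __name__ == "__main__":
    crawl = UrlRule(include=["example.com/shop"], exclude=["/cart", "?sort="])
    process = UrlRule(include=["/product/"])
    for u in ("https://example.com/shop/product/42",
              "https://example.com/shop/category/shoes",
              "https://example.com/shop/cart"):
        print(u, "->", classify(u, crawl, process))
```

Separating "follow" from "extract" lets a crawl traverse category and listing pages for their links while spending extraction only on the pages that actually hold the data.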

Yup, it's got an API

Programmatically start crawls, check crawl statuses, and retrieve output using the Crawlbot API.
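Here is a minimal sketch of those three calls in Python using only the standard library. It assumes the v3 endpoints `https://api.diffbot.com/v3/crawl` and `https://api.diffbot.com/v3/crawl/data`, the parameter names `token`, `name`, `seeds`, `apiUrl`, and `format`, and space-separated seed URLs. Verify all of these in Diffbot's Crawlbot API documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed Crawlbot v3 endpoints; verify against Diffbot's current Crawlbot API docs.
CRAWL_ENDPOINT = "https://api.diffbot.com/v3/crawl"
DATA_ENDPOINT = "https://api.diffbot.com/v3/crawl/data"


def start_crawl_params(token: str, name: str, seeds: list[str], api_url: str) -> dict[str, str]:
    """Form fields for creating a crawl job; the parameter names are assumptions."""
    return {
        "token": token,
        "name": name,              # your label for the crawl job
        "seeds": " ".join(seeds),  # assumed: multiple seed URLs are space-separated
        "apiUrl": api_url,         # extraction API applied to each processed page
    }


def start_crawl(token: str, name: str, seeds: list[str], api_url: str) -> dict:
    """POST the form fields to the crawl endpoint and return the parsed JSON reply."""
    body = urlencode(start_crawl_params(token, name, seeds, api_url)).encode()
    # Passing a bytes body makes urllib send it form-encoded in a POST request.
    request = urllib.request.Request(CRAWL_ENDPOINT, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def status_url(token: str, name: str) -> str:
    """GET URL for checking an existing crawl job's status."""
    return f"{CRAWL_ENDPOINT}?{urlencode({'token': token, 'name': name})}"


def data_url(token: str, name: str, fmt: str = "json") -> str:
    """GET URL for downloading a crawl job's extracted output in the given format."""
    return f"{DATA_ENDPOINT}?{urlencode({'token': token, 'name': name, 'format': fmt})}"


if __name__ == "__main__":
    print(status_url("YOUR_TOKEN", "my-shop-crawl"))
    print(data_url("YOUR_TOKEN", "my-shop-crawl"))
```

In this sketch, `api_url` is where you would choose between an Automatic Extraction API and a Custom API for the crawl.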

Works with Custom APIs

Pair a Custom API with Crawlbot to extract nearly anything from any site.

Crawl History

Re-run, copy, re-download or simply review your crawl history at any time.


Crawlbot Basics (4:56)

In this introductory Crawlbot video, we walk through setting up a basic crawl that extracts product data from across an e-commerce site.

Advanced Crawlbot (5:00)

In this video we look at some of the more advanced techniques available using Crawlbot, including crawling pages that are behind logins.

Let’s talk about your data needs: schedule a demo or a session, or learn more about Diffbot!
