Get data from web pages automatically

Diffbot's computer vision APIs turn the web into your database.



Extract Automatically

Get content from articles, products, images and other common page types.

The Article API extracts clean article text and related data from news articles and blog posts. It automatically retrieves the complete text, normalized HTML, related images and videos, author, date and tags from any article on any site.
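Here is a minimal Python sketch of calling the Article API. It assumes the `https://api.diffbot.com/v3/article` endpoint with `token` and `url` query parameters, and a JSON response that lists extracted items under `objects`. The helper names `build_request_url` and `summarize_article` are hypothetical, as are the exact field names, so check the Article API documentation for the current request format and response schema. The sketch only builds the request URL and reads a response dictionary, so it makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint base; the version segment may differ from the current API.
API_BASE = "https://api.diffbot.com/v3"


def build_request_url(api: str, token: str, page_url: str) -> str:
    """Return a GET URL for an extraction API such as "article" or "product"."""
    # urlencode percent-escapes the target page URL so it survives as a query value.
    query = urlencode({"token": token, "url": page_url})
    return f"{API_BASE}/{api}?{query}"


def summarize_article(response: dict) -> dict:
    """Pick a few fields from the first extracted object (field names assumed)."""
    objects = response.get("objects", [])
    if not objects:
        return {}
    article = objects[0]
    # Tags may be plain strings or objects with a "label" key; accept both.
    tags = [t.get("label") if isinstance(t, dict) else t for t in article.get("tags", [])]
    return {
        "title": article.get("title"),
        "author": article.get("author"),
        "date": article.get("date"),
        "tags": tags,
        "word_count": len(article.get("text", "").split()),
    }


if __name__ == "__main__":
    print(build_request_url("article", "YOUR_TOKEN", "https://example.com/blog/post"))
    # A hand-written sample response, not real API output.
    sample = {"objects": [{"title": "Hello", "author": "A. Writer",
                           "tags": [{"label": "Python"}], "text": "one two three"}]}
    print(summarize_article(sample))
```

To try it against the live service, you could fetch the built URL with `urllib.request.urlopen` and pass the result of `json.load` on that response to `summarize_article`.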


The Image API identifies the primary image(s) of a submitted web page and returns comprehensive image information.


The Product API automatically extracts complete data from any shopping or e-commerce product page. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand and more.


Not sure what kind of page you're dealing with? The Analyze API determines the page type of any submitted URL and automatically structures the data of supported page types. During a crawl, Analyze detects product, article and other pages on the fly, so you can crawl without writing rules or regular expressions.
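Because Analyze reports a page type for each result, a crawler can branch on that type instead of on URL patterns. The sketch below assumes each item in the response's `objects` list carries a `type` field such as `"article"` or `"product"` (falling back to a top-level `type`), and that product objects may include an `offerPrice`. These field names, the `route_by_page_type` helper and the handlers are illustrative assumptions, not the documented API.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


def route_by_page_type(response: dict, handlers: Dict[str, Handler],
                       default: Handler) -> List[str]:
    """Pass each extracted object to the handler registered for its page type."""
    results = []
    for obj in response.get("objects", []):
        # Prefer the per-object type; fall back to a top-level type, then "other".
        page_type = obj.get("type", response.get("type", "other"))
        results.append(handlers.get(page_type, default)(obj))
    return results


# Hypothetical handlers; a real crawler might store or index each object instead.
handlers: Dict[str, Handler] = {
    "article": lambda obj: f"article: {obj.get('title', '')}",
    "product": lambda obj: f"product: {obj.get('title', '')} at {obj.get('offerPrice', 'n/a')}",
}

if __name__ == "__main__":
    sample = {"objects": [{"type": "product", "title": "Guitar", "offerPrice": "$499.00"},
                          {"type": "discussion", "title": "Forum thread"}]}
    print(route_by_page_type(sample, handlers, default=lambda obj: "skipped"))
```

Keeping the handlers in a dictionary means supporting a new page type is a one-line addition, and anything unrecognized falls through to the default handler.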



Crawl Entire Sites

Automatically spider entire websites and extract their data.

I was able to use Crawlbot's API to automate extraction from over 500 sites for my dissertation. Combined with the Analyze API it made the Herculean task of data collection infinitely easier and faster. I was able to process the text of over 250,000 blog posts in only a month!

Matt Burton, University of Michigan


Extract Anything

Manually extract any type of data using the Custom API Toolkit.

SoundGecko users want to listen to everything, and Diffbot's Custom API Toolkit helps make anything extractable, whether it's an article or not. Our forthcoming app Aloud will leverage Diffbot's custom API flexibility and responsiveness even more to meet the growing needs of our users.

Ed Hooper, SoundGecko

Powered by Diffbot


Adobe: Article API, Custom API

eBay: Product API, Bulk API

Reverb: Article API, Analyze API

Zola: Product API

Instapaper: Article API

Cisco: Crawlbot, Custom API, Discussion API

FindTheBest: Article API, Bulk API, Crawlbot, Custom API

Onswipe: Article API, Custom API