Get data from web pages automatically

Diffbot's computer vision APIs turn the web into your database.

AUTOMATIC APIs

Extract Automatically

Get structured content from articles, products, discussions and other familiar page types.

The Article API extracts clean article text and related data from news articles and blog posts. Retrieve complete text, normalized HTML, related images and videos, author, date, and tags, all automatically, from any article on any site.
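
As a rough illustration of what calling the Article API might look like, here is a minimal Python sketch. The endpoint path, the `token` and `url` query parameters, and the `objects` list in the response are assumptions not stated on this page, and the sample response is hand-written rather than real output; check the Documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the current Diffbot documentation.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token, page_url):
    """Return a GET URL for an Article API call (token and url as query parameters)."""
    return ARTICLE_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def summarize_article(response):
    """Pick a few fields out of a parsed JSON response, tolerating missing keys."""
    objects = response.get("objects") or [{}]
    first = objects[0]
    return {key: first.get(key) for key in ("title", "author", "date", "text")}


# Hand-written sample shaped like the assumed response; not real API output.
sample = {"objects": [{"title": "Example post", "author": "A. Writer", "text": "Body text."}]}

request_url = build_article_request("YOUR_TOKEN", "https://example.com/post?id=1")
summary = summarize_article(sample)
print(request_url)
print(summary)
```

The page URL is percent-encoded so that its own query string cannot be confused with the API's parameters, and fields missing from the response come back as `None` rather than raising an error.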

The Image API identifies the primary image(s) of a submitted web page and returns comprehensive image information.

The Product API automatically extracts complete data from any shopping or e-commerce product page. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand, and more.

The Discussion API (beta) automatically structures and extracts entire threads or lists of reviews/comments from most discussion pages.

The Video API (beta) automatically extracts detailed video information, including thumbnail, URL and embed code, from nearly any video platform on the web.

Not sure what you're dealing with? The Analyze API determines the page type of any submitted URL and automatically structures the data of supported page types. When crawling, Analyze detects product, article, or other pages on the fly, allowing you to crawl without rules or regexes.
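
To show how page-type detection could replace hand-written rules, the sketch below groups extracted objects by the page type reported for each URL. The top-level `type` field and the `objects` list are assumed response keys, and the sample responses are invented for illustration only.

```python
from collections import defaultdict


def group_by_page_type(responses):
    """Group extracted objects by the page type reported for each URL."""
    grouped = defaultdict(list)
    for response in responses:
        # "type" is an assumed field name; pages without one fall into "other".
        page_type = response.get("type", "other")
        grouped[page_type].extend(response.get("objects", []))
    return dict(grouped)


# Invented sample responses, one per analyzed URL.
samples = [
    {"type": "article", "objects": [{"title": "Post"}]},
    {"type": "product", "objects": [{"title": "Widget"}]},
    {"type": "other"},
]

grouped = group_by_page_type(samples)
print(sorted(grouped))
```

No URL patterns or regexes appear anywhere in the routing logic: the page type returned for each URL decides where its data goes.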

CRAWLBOT

Crawl Entire Sites

Automatically spider and extract the data from entire web sites.
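
A crawl job could plausibly be described by a handful of form parameters, as in this minimal sketch. The endpoint paths and the `token`, `name`, `seeds`, and `apiUrl` parameter names are assumptions not given on this page, and the job name and seed URLs are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoints; confirm against the current Diffbot documentation.
CRAWL_ENDPOINT = "https://api.diffbot.com/v3/crawl"
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_crawl_job(token, name, seeds):
    """Encode the form body for creating a crawl job that runs Analyze on each page."""
    params = {
        "token": token,
        "name": name,
        "seeds": " ".join(seeds),    # assumed: seed URLs separated by spaces
        "apiUrl": ANALYZE_ENDPOINT,  # assumed: API applied to every crawled page
    }
    return urlencode(params)


body = build_crawl_job(
    "YOUR_TOKEN",
    "blogCrawl",
    ["https://example.com", "https://example.org"],
)
print(body)
```

The encoded body would be sent as a POST to `CRAWL_ENDPOINT`. Pointing the job at Analyze rather than a single extraction API is what lets one crawl handle mixed page types.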

I was able to use Crawlbot's API to automate extraction from over 500 sites for my dissertation. Combined with the Analyze API it made the Herculean task of data collection infinitely easier and faster. I was able to process the text of over 250,000 blog posts in only a month!

Matt Burton, University of Michigan

CUSTOM DATA EXTRACTION

Extract Anything

Manually extract any type of data using the Custom API Toolkit.

SoundGecko users want to listen to everything, and Diffbot's Custom API Toolkit helps make anything extractable, whether it's an article or not. Our forthcoming app Aloud will leverage Diffbot's custom API flexibility and responsiveness even more to meet the growing needs of our users.

Ed Hooper, SoundGecko

Powered by Diffbot

Adobe
www.adobe.com: Article API, Custom API

eBay
www.ebay.com: Product API, Bulk API

Reverb
www.helloreverb.com: Article API, Analyze API

Zola
www.zola.com: Product API

Instapaper
www.instapaper.com: Article API

Cisco
www.cisco.com: Crawlbot, Custom API, Discussion API

FindTheBest
www.findthebest.com: Article API, Bulk API, Crawlbot, Custom API

Onswipe
www.onswipe.com: Article API, Custom API