Automatic APIs

Automatic data extraction from articles, products, discussions and more. Diffbot uses advanced AI technology to retrieve clean, structured data without the need for manual rules or site-specific training.

Meet the APIs


The Analyze API determines the "page-type" of any submitted URL and returns structured data from supported page-types. While crawling, Analyze automatically detects product, article and other pages on the fly, allowing you to crawl without rules or regexes.

Learn more about the Analyze API
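
As a rough illustration of the Analyze workflow described above, the sketch below builds a request URL and then groups extracted objects by detected page-type. The endpoint URL, the `token` and `url` query parameters, and the `type` and `objects` response fields are assumptions made for this example, not details stated on this page; confirm them against the Analyze API documentation. The sample responses are hand-written, not real API output.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Analyze API documentation.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_analyze_url(token, page_url):
    """Build a GET URL asking Analyze to classify and extract page_url."""
    return ANALYZE_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def group_by_page_type(responses):
    """Group extracted objects by the page-type reported in each response.

    Assumes each response is a dict with a top-level "type" string and an
    "objects" list (hypothetical shape for this sketch).
    """
    grouped = {}
    for response in responses:
        page_type = response.get("type", "other")
        grouped.setdefault(page_type, []).extend(response.get("objects", []))
    return grouped


# Hand-written stand-ins for responses collected during a crawl.
sample_responses = [
    {"type": "article", "objects": [{"title": "Example post"}]},
    {"type": "product", "objects": [{"title": "Example widget"}]},
    {"type": "article", "objects": [{"title": "Another post"}]},
]

request_url = build_analyze_url("YOUR_TOKEN", "https://example.com/a?b=1")
grouped = group_by_page_type(sample_responses)
```

Because the page-type arrives with each response, a crawler built this way needs no per-site URL patterns to decide which pages are products and which are articles.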

The Article API extracts clean article text and related data from news articles and blog posts. Retrieve complete text, normalized HTML, related images and videos, author, date and tags, all automatically, from any article on any site.

Learn more about the Article API
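
To make the Article API description above more concrete, here is a minimal sketch that fetches an article and reduces the response to a few of the fields mentioned (text, author, date, tags). The endpoint, the query parameters, and the response field names (`objects`, `title`, `author`, `date`, `text`, `tags`, `label`) are assumptions for illustration and should be checked against the Article API documentation. `fetch_article` needs a valid token and network access, so the example only exercises the parsing step on a hand-written sample.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the Article API documentation.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def fetch_article(token, page_url, timeout=30):
    """Request extraction of page_url and return the decoded JSON response."""
    query = urlencode({"token": token, "url": page_url})
    with urlopen(f"{ARTICLE_ENDPOINT}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def summarize_article(response):
    """Pick a few fields from the first extracted object, or None if empty.

    Field names here are hypothetical and may differ from the real response.
    """
    objects = response.get("objects", [])
    if not objects:
        return None
    article = objects[0]
    return {
        "title": article.get("title"),
        "author": article.get("author"),
        "date": article.get("date"),
        "word_count": len(article.get("text", "").split()),
        "tags": [tag.get("label") for tag in article.get("tags", [])],
    }


# Hand-written sample, not real API output.
sample_response = {
    "objects": [
        {
            "title": "Example headline",
            "author": "A. Writer",
            "date": "Mon, 01 Jan 2024 00:00:00 GMT",
            "text": "First paragraph. Second paragraph.",
            "tags": [{"label": "Example"}],
        }
    ]
}

summary = summarize_article(sample_response)
```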

The Discussion API automatically structures and extracts entire threads or lists of reviews/comments from most discussion pages.

Learn more about the Discussion API

The Image API identifies the primary image(s) of a submitted web page and returns comprehensive image information.

Learn more about the Image API

The Product API extracts complete data from any shopping or e-commerce product page—automatically. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand and more.

Learn more about the Product API
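
The product identifiers and pricing mentioned above could be pulled out of an extracted product record roughly as follows. This is only a sketch: the field names (`sku`, `upc`, `mpn`, `offerPrice`, `brand`, `title`) are assumed for the example and may not match the actual Product API response, and the sample record is hand-written.

```python
# Identifier fields to look for; names are assumptions for this sketch.
ID_FIELDS = ("sku", "upc", "mpn")


def product_summary(product):
    """Collect title, brand, price and any non-empty product identifiers."""
    ids = {name.upper(): product[name] for name in ID_FIELDS if product.get(name)}
    return {
        "title": product.get("title"),
        "brand": product.get("brand"),
        "price": product.get("offerPrice"),
        "ids": ids,
    }


# Hand-written sample record, not real API output. The empty UPC and the
# missing MPN show that absent identifiers are simply skipped.
sample_product = {
    "title": "Example Widget",
    "brand": "Acme",
    "offerPrice": "$19.99",
    "sku": "W-100",
    "upc": "",
}

summary = product_summary(sample_product)
```

Skipping empty identifiers keeps downstream matching simple, since not every product page publishes a SKU, UPC and MPN together.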

The Video API (beta) automatically extracts detailed video information, including thumbnail, URL and embed code, from nearly any video platform on the web.

Learn more about the Video API