Automatic data extraction from articles, products, discussions and more. Diffbot uses advanced AI technology to retrieve clean, structured data without the need for manual rules or site-specific training.
Analyze determines the "page-type" of any submitted URL and returns structured data from supported page-types. While crawling, Analyze will automatically detect product, article and other pages on-the-fly, allowing you to crawl without rules or regexes.
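As a rough illustration of working with detected page-types, the sketch below groups extracted objects by type so each kind can be handled separately. The response shape (an `objects` list whose items carry a `type` field) is an assumption made for this example, and the sample data is hand-written rather than real API output.

```python
def group_by_type(response: dict) -> dict:
    """Group extracted objects by their reported page-type.

    Assumes (hypothetically) that the response holds an "objects" list
    and that each object has a "type" string such as "article".
    """
    grouped: dict = {}
    for obj in response.get("objects", []):
        page_type = obj.get("type", "other")
        grouped.setdefault(page_type, []).append(obj)
    return grouped


# Hand-written sample, not real API output.
sample_response = {
    "objects": [
        {"type": "article", "title": "Example post"},
        {"type": "product", "title": "Example widget"},
        {"type": "article", "title": "Another post"},
    ]
}

grouped = group_by_type(sample_response)
for page_type, objects in grouped.items():
    print(page_type, len(objects))
```

Grouping first keeps crawl-handling code free of per-site rules: each page-type gets one handler, whatever site the page came from.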
The Article API is used to extract clean article text and related data from news articles and blog posts. Retrieve complete text, normalized HTML, related images and videos, author, date, tags—all automatically, from any article on any site.
The Discussion API automatically structures and extracts entire threads or lists of reviews/comments from most discussion pages.
The Image API identifies the primary image(s) of a submitted web page and returns comprehensive image information.
The Product API extracts complete data from any shopping or e-commerce product page—automatically. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand and more.
The Video API (beta) automatically extracts detailed video information, including thumbnail, URL and embed code, from nearly any video platform on the web.
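A minimal sketch of how a request to any of the APIs above might be built. The base URL, the per-API path names, and the `token` and `url` query parameters are assumptions for illustration and should be checked against the current Diffbot documentation; `build_request_url` is a hypothetical helper, not part of any official client.

```python
from urllib.parse import urlencode

# Assumed base URL and path names, one per API described above.
BASE_URL = "https://api.diffbot.com/v3"
API_NAMES = {"analyze", "article", "discussion", "image", "product", "video"}


def build_request_url(api: str, token: str, page_url: str) -> str:
    """Return a GET URL asking the chosen API to process page_url."""
    if api not in API_NAMES:
        raise ValueError(f"unknown API: {api!r}")
    # urlencode percent-encodes the target URL so it survives as one parameter.
    query = urlencode({"token": token, "url": page_url})
    return f"{BASE_URL}/{api}?{query}"


print(build_request_url("product", "YOUR_TOKEN", "https://example.com/item/1"))
```

The resulting URL can be fetched with any HTTP client. Only the path segment changes between APIs, so switching from Article to Product extraction is a one-word change.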