Automatic APIs

Automatic data extraction from articles, products, discussions and more. Diffbot uses advanced AI technology to retrieve clean, structured data without the need for manual rules or site-specific training.

Analyze API

What's behind that URL? The Analyze API automatically determines the "page type" of unknown URLs, and returns complete structured data for supported page types (articles, products, images, discussions or videos).
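
As a rough sketch of what a call might look like, the snippet below builds a request URL for the Analyze API. The endpoint path (`/v3/analyze`), the `token` and `url` query parameter names, and the helper name `build_analyze_url` are assumptions made for illustration and are not stated on this page; check the current API reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; this page does not state the actual path.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_analyze_url(token: str, page_url: str) -> str:
    """Return a GET URL asking Analyze to classify and extract `page_url`.

    `token` and `url` are assumed parameter names. The target URL is
    percent-encoded so its own query string cannot leak into ours.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{ANALYZE_ENDPOINT}?{query}"
```

Calling `build_analyze_url("YOUR_TOKEN", "https://example.com/some/page")` yields a single URL that can be fetched with any HTTP client.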

When paired with Crawlbot, Analyze detects products, articles or other pages on the fly—so you can crawl and extract entire sites without a single manual rule.

Features

"Looks" Like a Human

Analyze evaluates a web page the way a human does: by looking at it in a regular web browser. Just as you can tell a shopping page from a news article, so can Diffbot.

One-Click Crawling

Use the Analyze API when crawling and all you need is a seed URL. Then let Diffbot find all of the products—or articles, or discussion threads, or videos or images—across the entire site.

Routes to All Automatic APIs

When the Analyze API identifies a matching page type, it automatically extracts page data using the appropriate extraction API, with no extra API call necessary.
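
To illustrate the routing idea, here is a minimal sketch of how a client might handle an Analyze response. The response shape used here (a top-level `type` string plus an `objects` list) and the function name `extracted_objects` are assumptions for illustration; the page itself does not document the response format.

```python
from typing import Any

# Page types this page lists as supported by the Automatic APIs.
SUPPORTED_TYPES = {"article", "product", "image", "discussion", "video"}


def extracted_objects(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the extracted objects if the page type is supported, else [].

    Assumes a hypothetical response with a top-level "type" and an
    "objects" list; unsupported or missing types yield an empty list.
    """
    if response.get("type") not in SUPPORTED_TYPES:
        return []
    return list(response.get("objects", []))


# Hypothetical response for a product page.
sample = {
    "type": "product",
    "objects": [{"type": "product", "title": "Example Widget"}],
}
```

With a response shaped like `sample`, `extracted_objects(sample)` returns the product data directly, without a second request to a type-specific API.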

Article API

The Article API automatically extracts clean text from news articles and blog posts—returning normalized HTML and plaintext, author and date information, related images/videos and more from any article on any site.

Features

Simply the Best

Diffbot's Article API has been the overwhelming winner in quality shootouts since anyone thought to start testing such things (in 2011). Compare text-extraction methods.

Fully Automatic

Like all of Diffbot's Automatic APIs, the Article API needs no rules or training. Send it any text-heavy page and let Diffbot do the rest.

Speaks Any Language

Thanks to its basis in computer vision, the Article API extracts clean text in any language.

Native Text Analysis

Topics/tags are automatically generated for each analyzed article, and built-in sentiment analysis automatically scores an article's (and all of its comments') positivity/negativity.

Comments, Too

Diffbot's Discussion API technology is built into the Article API to automatically extract comments alongside main article text.

Crawl 'Em All

Pair the Article API with Crawlbot to automatically identify and extract all articles across an entire site.

Discussion API

Diffbot's Discussion API structures the full content of forum threads, article comments, product reviews and more.

Features

Fully Automatic

Like all of Diffbot's Automatic APIs, the Discussion API needs no rules or training. Send it any page containing a discussion and let Diffbot do the rest.

Text Analysis

Topics/tags are automatically generated for each post, and built-in sentiment analysis rates each individual post's overall positivity/negativity.

Get All the Pages

Long forum thread spanning multiple pages? No problem. Use the &maxPages argument to automatically concatenate as many pages as you need.
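
As a hedged example, the helper below appends `maxPages` to a Discussion API request URL. Only the `maxPages` argument name comes from this page; the endpoint path, the `token` and `url` parameter names, the helper name `build_discussion_url`, and the choice to accept only positive integers are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; this page does not state the actual path.
DISCUSSION_ENDPOINT = "https://api.diffbot.com/v3/discussion"


def build_discussion_url(
    token: str, thread_url: str, max_pages: Optional[int] = None
) -> str:
    """Return a GET URL for extracting a discussion thread.

    When `max_pages` is given, it is sent as the `maxPages` argument so
    that up to that many thread pages are concatenated into one result.
    """
    params = {"token": token, "url": thread_url}
    if max_pages is not None:
        if max_pages < 1:
            raise ValueError("max_pages must be a positive integer")
        params["maxPages"] = max_pages
    return f"{DISCUSSION_ENDPOINT}?{urlencode(params)}"
```

For a long thread, `build_discussion_url("YOUR_TOKEN", "https://example.com/thread", max_pages=5)` would request the first five pages in a single call.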

Comments & Reviews

Discussion API technology is built into the Article and Product APIs to automatically extract comments and reviews from article and product pages.

Crawl 'Em All

Pair the Discussion API with Crawlbot to automatically identify and extract all discussion threads across an entire site.

Image API

Diffbot's Image API extracts and analyzes individual images and image-heavy pages.

Features

Fully Automatic

Like all of Diffbot's Automatic APIs, the Image API needs no rules or training. Send it any image-heavy page and let Diffbot do the rest.

Intra-image Analysis

The Image API automatically evaluates image content and generates tags based on its identified elements.

Image Mentions

Use the mentions field to see where else on the web an image (or its variants) has been seen.

Product API

The Product API extracts complete data from any shopping or e-commerce product page. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand and more.

Features

Fully Automatic

Like all of our Automatic APIs, the Product API needs no rules or training. Send it any product page and let Diffbot do the rest.

Complete Pricing Data

Retrieve all price data available: original price, sale/offer price, shipping cost and discount amount. If a product comes in price ranges or with quantity-based discounts, the Product API can return that too.
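
To make the relationship between these price fields concrete, the sketch below derives a discount amount from an original price and a sale/offer price. It is plain arithmetic on values a client already holds, not a description of how the Product API computes its own fields; the function name `discount_amount` and the use of decimal strings as input are assumptions for illustration.

```python
from decimal import Decimal


def discount_amount(original_price: str, offer_price: str) -> Decimal:
    """Return how much the offer price saves versus the original price.

    Prices are parsed as Decimal to avoid binary floating-point rounding
    errors that are common when subtracting currency amounts.
    """
    original = Decimal(original_price)
    offer = Decimal(offer_price)
    if offer > original:
        raise ValueError("offer price exceeds original price")
    return original - offer
```

For example, an original price of "49.99" and an offer price of "39.99" give a discount of exactly 10.00.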

Normalized Specs

The Product API automatically identifies and extracts any specification tables (or table-like elements).

Review Extraction

Diffbot's Discussion API technology is built into the Product API to automatically extract reviews from most product pages.

Country-Specific Pricing

Subscribers to our Plus or Professional plans can optionally upgrade for access to region- or country-specific proxy IP addresses, so product pages can be fetched as they appear to visitors in that region.

Crawl 'Em All

Pair the Product API with Crawlbot to automatically identify and extract all products across an entire shopping site.

Video API

Diffbot's Video API extracts detailed information from video-specific pages.

Features

Fully Automatic

Like all of Diffbot's Automatic APIs, the Video API works right out of the box, with no need for rules or training.

Get the Raw Bits

Where possible, Diffbot extracts the raw source content in addition to embeddable HTML.

Video Metadata

Author/uploader, duration, title, description, date uploaded, video views... Diffbot's visual page analysis returns everything we see on the page.