Our APIs use computer vision, machine learning and natural language processing to help developers extract and understand objects from any Web page.

An API for the Web

We've determined that the entire Web can be classified into approximately 18 structural page types. From this basic understanding of common page layouts, Diffbot then uses computer vision, natural language processing and other machine learning algorithms to identify and extract the important items from within these pages.



We use state-of-the-art computer vision and NLP algorithms, have the largest collection of tagged pages, and update our model several times per week.


Pass in a URL and we'll do the rest. Stop spending time building custom scrapers and -- even worse -- maintaining them.
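As a rough illustration of "pass in a URL", the sketch below builds a request to the Article API. The endpoint address and the `token` and `url` parameter names are assumptions made for the example and are not stated on this page; the helper `build_article_request` is hypothetical. Check the current API documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the current API documentation.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(page_url: str, token: str,
                          endpoint: str = ARTICLE_ENDPOINT) -> str:
    """Return a GET URL asking the (assumed) Article endpoint to process page_url.

    The parameter names `token` and `url` are assumptions for this sketch.
    """
    # urlencode percent-escapes the target page URL so it survives as a
    # single query parameter.
    query = urlencode({"token": token, "url": page_url})
    return f"{endpoint}?{query}"


# YOUR_TOKEN is a placeholder, not a real credential.
request_url = build_article_request("http://example.com/news/story.html",
                                    "YOUR_TOKEN")
print(request_url)
```

Sending this URL with any HTTP client would be the only integration step; there are no per-site parsing rules to write or maintain.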


Diffbot is built and run by Web veterans in a multi-tiered environment with redundancy, monitoring and scalability built in. Our scale lets us operate the service more cheaply than you could by running it yourself.


We use open standards (schema.org) and allow for endless configurability via our customization tool.

Core and Custom Solutions

On-Demand APIs

The Article API is used to extract clean article text from news article web pages. The Follow API allows you to subscribe to the changes of any web page. The Frontpage API takes in a multifaceted “homepage” and returns individual page elements. [Limited Alpha] The Page Classifier API takes any web link and automatically determines what type of page it is. Our Custom API Toolkit allows you to extract content using CSS or XPath selectors from any page on the Web.
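To give a feel for what selector-based extraction means, here is a minimal, generic sketch using only Python's standard library. It is not the Custom API Toolkit and does not reflect its interface; the sample page, the field names and the `extract` helper are all made up for illustration, and it relies on the limited XPath subset that `xml.etree.ElementTree` supports on well-formed markup.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample page (hypothetical content).
PAGE = """<html><body>
<div class="post"><h1>First headline</h1><p class="byline">A. Writer</p></div>
<div class="post"><h1>Second headline</h1><p class="byline">B. Author</p></div>
<div class="ad"><h1>Buy now</h1></div>
</body></html>"""

# Each output field is mapped to an XPath-style selector.
RULES = {
    "headline": ".//div[@class='post']/h1",
    "byline": ".//p[@class='byline']",
}


def extract(page: str, rules: dict) -> dict:
    """Apply each selector to the page and collect the matching text."""
    root = ET.fromstring(page)
    return {
        field: [(el.text or "").strip() for el in root.findall(path)]
        for field, path in rules.items()
    }


result = extract(PAGE, RULES)
print(result)
```

The selectors pick out only the elements inside the "post" blocks, so the headline in the "ad" block is skipped.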

Custom Solutions

Our powerful machine learning and computer vision systems provide solutions to many complex, custom problems.

Contact us about your specific needs.