Article API

The Article API automatically extracts clean text from news articles and blog posts—returning normalized HTML and plaintext, author and date information, related images/videos and more from any article on any site.


Simply the Best

Diffbot's Article API has been the overwhelming winner in quality shootouts since anyone thought to start testing such things (in 2011).

Fully Automatic

Like all of Diffbot's Automatic APIs, the Article API needs no rules or training. Send it any text-heavy page and let Diffbot do the rest.
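As a rough illustration of "send it any text-heavy page," the sketch below builds a request URL for the Article API in Python. The endpoint `https://api.diffbot.com/v3/article` and the `token` and `url` query parameters are assumptions based on Diffbot's public v3 documentation and should be checked against the current docs. `YOUR_TOKEN` and the example page URL are placeholders, and `build_article_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Article API documentation.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Return a GET URL asking the Article API to analyze page_url.

    The page URL is percent-encoded so it survives as a single
    query-string value.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Placeholder token and page; no network request is made here.
    print(build_article_request("YOUR_TOKEN", "https://example.com/news/story"))
```

The resulting URL can be fetched with any HTTP client (for example `urllib.request.urlopen`), and the JSON body decoded with `json.load`. No per-site rules or selectors appear anywhere in the request.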

Speaks Any Language

Thanks to its basis in computer vision, the Article API extracts clean text in any language.

Native Text Analysis

Topics/tags are generated automatically for each analyzed article, and built-in sentiment analysis scores the positivity or negativity of the article and all of its comments.
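A minimal sketch of reading those tags and the sentiment score from a decoded response might look like the following. The field names (`objects`, `tags`, `label`, `sentiment`) and the convention that a positive score means positive sentiment are assumptions about the response shape, not confirmed by this page. The sample response is invented for illustration, and `summarize_analysis` is a hypothetical helper.

```python
from typing import Any


def summarize_analysis(response: dict[str, Any]) -> dict[str, Any]:
    """Pull tag labels and a coarse sentiment reading from a response dict."""
    # Assumed shape: {"objects": [{"tags": [{"label": ...}], "sentiment": float}]}
    objects = response.get("objects") or [{}]
    article = objects[0]

    tags = [tag["label"] for tag in article.get("tags", []) if "label" in tag]

    score = article.get("sentiment")
    if score is None:
        mood = "unknown"
    elif score > 0:
        mood = "positive"
    elif score < 0:
        mood = "negative"
    else:
        mood = "neutral"

    return {"tags": tags, "sentiment": score, "mood": mood}


if __name__ == "__main__":
    # Invented sample data; real responses carry many more fields.
    sample = {
        "objects": [
            {"tags": [{"label": "Robotics"}, {"label": "Startups"}], "sentiment": 0.4}
        ]
    }
    print(summarize_analysis(sample))
```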

Comments, Too

Diffbot's Discussion API technology is built into the Article API to automatically extract comments alongside main article text.
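To show how comments might be read alongside the article, here is a hedged sketch that walks a decoded response. It assumes the comments arrive under a `discussion` object with a `posts` list whose entries carry `author` and `text` fields. That layout is an assumption and should be confirmed in the Article API documentation. `extract_comments` and the sample data are hypothetical.

```python
from typing import Any, Optional


def extract_comments(response: dict[str, Any]) -> list[tuple[Optional[str], str]]:
    """Return (author, text) pairs for each comment found in the response."""
    # Assumed shape: {"objects": [{"discussion": {"posts": [{...}, ...]}}]}
    objects = response.get("objects") or [{}]
    discussion = objects[0].get("discussion") or {}
    return [
        (post.get("author"), post.get("text", ""))
        for post in discussion.get("posts", [])
    ]


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = {
        "objects": [
            {
                "text": "Main article body...",
                "discussion": {
                    "posts": [
                        {"author": "ann", "text": "Great read"},
                        {"text": "Disagree"},
                    ]
                },
            }
        ]
    }
    for author, text in extract_comments(sample):
        print(author or "(anonymous)", "-", text)
```

Articles without comments simply yield an empty list, so callers need no special casing.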

Crawl 'Em All

Pair the Article API with Crawlbot to automatically identify and extract all articles across an entire site.