Article API
The Article API automatically extracts clean text from news articles and blog posts. It returns normalized HTML and plaintext, author and date information, related images and videos, and more, from any article on any site.
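As a rough illustration of what "send it a page" might look like in practice, the sketch below builds a request URL for a single article. The endpoint path and the `token` and `url` query parameters are assumptions, since this page does not state them; `build_article_request` is a hypothetical helper, not part of any Diffbot client library.

```python
from urllib.parse import urlencode

# Assumed endpoint: this page does not state it, so check the API reference.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Return a GET URL asking the Article API to analyze page_url.

    The parameter names `token` and `url` are assumptions.
    """
    # urlencode percent-encodes the target URL so its own "?" and "="
    # characters are not mistaken for part of the API query string.
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


request_url = build_article_request(
    "YOUR_TOKEN", "https://example.com/news/story?id=7"
)
print(request_url)
```

The resulting URL can be fetched with any HTTP client. Encoding the target address is the main design point here, because an unencoded query string inside the article URL would otherwise be read as extra API parameters.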
Features
Simply the Best
Diffbot's Article API has been the overwhelming winner in quality shootouts since anyone thought to start testing such things (in 2011). Compare text-extraction methods.
Fully Automatic
Like all of Diffbot's Automatic APIs, the Article API needs no rules or training. Send it any text-heavy page and let Diffbot do the rest.
Speaks Any Language
Thanks to its basis in computer vision, the Article API extracts clean text in any language.
Native Text Analysis
Topics/tags are automatically generated for each analyzed article, and built-in sentiment analysis scores the positivity/negativity of the article and of all its comments.
Comments, Too
Diffbot's Discussion API technology is built into the Article API to automatically extract comments alongside main article text.
Crawl 'Em All
Pair the Article API with Crawlbot to automatically identify and extract all articles across an entire site.
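To make the features above more concrete, here is a minimal sketch of how a client might read tags, sentiment, and comments out of a response. The response shown is hand-written for illustration, not real API output, and every field name in it (`objects`, `tags`, `label`, `sentiment`, `discussion`, `posts`) is an assumption that should be checked against the API reference. `summarize_article` is a hypothetical helper.

```python
from typing import Any

# Hypothetical, hand-written response. Field names are assumptions and the
# values are made up; this is not real output from the Article API.
SAMPLE_RESPONSE = {
    "objects": [
        {
            "title": "Example headline",
            "author": "A. Writer",
            "text": "Body text of the article.",
            "sentiment": 0.4,
            "tags": [{"label": "Robotics"}, {"label": "Machine learning"}],
            "discussion": {
                "posts": [
                    {"text": "Great read.", "sentiment": 0.8},
                    {"text": "Not convinced.", "sentiment": -0.3},
                ]
            },
        }
    ]
}


def summarize_article(response: dict[str, Any]) -> dict[str, Any]:
    """Pick out the fields named in the feature list, tolerating gaps."""
    objects = response.get("objects") or []
    if not objects:
        return {}  # nothing was extracted
    article = objects[0]
    # Comments may be absent, so fall back to an empty list.
    posts = (article.get("discussion") or {}).get("posts") or []
    return {
        "title": article.get("title"),
        "author": article.get("author"),
        "sentiment": article.get("sentiment"),
        "tags": [t["label"] for t in article.get("tags", []) if "label" in t],
        "comment_sentiments": [p.get("sentiment") for p in posts],
    }


summary = summarize_article(SAMPLE_RESPONSE)
print(summary["tags"])
print(summary["comment_sentiments"])
```

Using `.get` with fallbacks throughout means a page without comments or tags yields empty lists rather than raising, which is a reasonable default when the same code runs over many different sites.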