Information Wants to Be Structured

Using AI, computer vision, machine learning and natural language processing, Diffbot provides developers with numerous tools to understand and extract data from any web page.

Our core products:

Automatic APIs
Extract content from standard page types: articles/blog posts, front pages, image and product pages
Conventional web pages
{
   "name": "Automatic APIs",
   "type": "computer vision",
   "author": "Diffy",
   "target": "common web pages"
}
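
As a rough illustration of how a developer might call one of these automatic extractors, the sketch below builds a request URL for a given page type. The base URL, the endpoint names, and the `token` and `url` query parameters are assumptions not stated on this page, and `build_extract_url` is a hypothetical helper; check the current Diffbot documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint names (not taken from this page).
API_BASE = "https://api.diffbot.com/v3"
PAGE_TYPES = {"article", "product", "image"}  # illustrative subset of page types


def build_extract_url(page_type: str, token: str, page_url: str) -> str:
    """Build a GET request URL for an (assumed) automatic extraction endpoint."""
    if page_type not in PAGE_TYPES:
        raise ValueError(f"unsupported page type: {page_type!r}")
    # urlencode percent-escapes the target page URL so it survives as one parameter.
    query = urlencode({"token": token, "url": page_url})
    return f"{API_BASE}/{page_type}?{query}"


if __name__ == "__main__":
    # "YOUR_TOKEN" is a placeholder; a real developer token would go here.
    print(build_extract_url("article", "YOUR_TOKEN", "https://example.com/post"))
```

Fetching the resulting URL with any HTTP client would, under these assumptions, return the extracted content as structured JSON rather than raw HTML.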
Custom API Toolkit
Create custom rules to extract data from any web page
Any and all web pages
{
   "name": "Custom API Toolkit",
   "type": "custom extraction",
   "author": "Diffy",
   "target": "any kind of page"
}
Crawlbot
Extract an entire site using Automatic or Custom APIs
Entire domain
{
   "name": "Crawlbot",
   "type": "spidering",
   "author": "Diffy",
   "target": "entire domains"
}