Diffbot API Documentation

Frontpage API

Request

To use the Frontpage API, perform an HTTP GET request on the following endpoint:

http://www.diffbot.com/api/frontpage?token=...&url=...

Provide the following arguments:

Argument  Description
token     Developer token
url       URL to extract article from

Optional arguments:

Argument  Description
format    Format of the response output: xml (default) or json
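
The request URL described above can be assembled as in the following minimal sketch. The helper name build_frontpage_url and the placeholder token are illustrative assumptions, not part of the API; only the endpoint and the token, url, and format arguments come from this page.

```python
from urllib.parse import urlencode

# Endpoint as given in the Request section of this page.
ENDPOINT = "http://www.diffbot.com/api/frontpage"


def build_frontpage_url(token, url, fmt=None):
    """Return the GET URL for a Frontpage API request.

    build_frontpage_url is a hypothetical helper name used for illustration.
    """
    params = {"token": token, "url": url}
    if fmt is not None:
        # The page documents only two output formats.
        if fmt not in ("xml", "json"):
            raise ValueError("format must be 'xml' or 'json'")
        params["format"] = fmt
    # urlencode percent-encodes the target URL so it is safe in a query string.
    return ENDPOINT + "?" + urlencode(params)


# "YOUR_TOKEN" is a placeholder; substitute a real developer token.
print(build_frontpage_url("YOUR_TOKEN", "http://example.com/", fmt="json"))
```

The resulting string can be passed to any HTTP client, for example urllib.request.urlopen, to perform the GET request.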

Alternatively, you can POST the content to analyze directly to the same endpoint. Specify the Content-Type header as either text/plain or text/html.
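
A POST request of this kind might be prepared as sketched below. This page does not say how token and url are passed when posting content, so the sketch assumes they remain in the query string; the helper name build_post_request is likewise hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given in the Request section of this page.
ENDPOINT = "http://www.diffbot.com/api/frontpage"


def build_post_request(token, url, content, content_type="text/html"):
    """Prepare (but do not send) a POST request carrying the page content.

    Assumption: token and url are still sent as query-string arguments.
    """
    if content_type not in ("text/plain", "text/html"):
        raise ValueError("Content-Type must be text/plain or text/html")
    query = urlencode({"token": token, "url": url})
    return Request(
        ENDPOINT + "?" + query,
        data=content.encode("utf-8"),          # request body: the content to analyze
        headers={"Content-Type": content_type},
        method="POST",
    )


req = build_post_request(
    "YOUR_TOKEN",                              # placeholder developer token
    "http://example.com/",
    "<html><body><h1>Hello</h1></body></html>",
)
# To send it: urllib.request.urlopen(req).read()
```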

Response

DML Response

DML (Diffbot Markup Language) is an XML format for encoding the structural information extracted from the page. A DML document consists of a single info section and a list of items.

Fields found in the Info section

Info field  Type    Description
id          long    DMLID of the URL
title       string  Extracted title of the page
sourceURL   url     The URL this was extracted from (the URL given in the Add Request)
icon        url     A link to a small icon/favicon representing the page
numItems    int     The number of items in this DML document
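
Reading the info fields from a DML response could look like the sketch below. The exact XML layout is not shown on this page, so the sample document (a dml root with an info child whose fields are child elements) is a hypothetical shape chosen for illustration; only the field names and types come from the table above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names and nesting are ASSUMED, not taken
# from a real Diffbot response.
SAMPLE = """<dml>
  <info>
    <id>12345</id>
    <title>Example News</title>
    <sourceURL>http://example.com/</sourceURL>
    <icon>http://example.com/favicon.ico</icon>
    <numItems>2</numItems>
  </info>
</dml>"""

# Converters follow the Type column of the info table.
INFO_TYPES = {"id": int, "title": str, "sourceURL": str, "icon": str, "numItems": int}


def parse_info(dml_text):
    """Return the info fields as a dict, converted to Python types."""
    root = ET.fromstring(dml_text)
    info = root.find("info")
    if info is None:
        raise ValueError("no info section found")
    result = {}
    for name, convert in INFO_TYPES.items():
        text = info.findtext(name)
        if text is not None:          # skip fields absent from the response
            result[name] = convert(text.strip())
    return result


info = parse_info(SAMPLE)
print(info["title"], info["numItems"])
```

If a real response uses attributes or different element names, only the lookup inside parse_info would need to change.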

Some of the fields found in Items

Item field   Type                      Description
id           long                      Unique hashcode/id of item
title        string                    Title of item
description  string                    innerHTML content of item
xroot        xpath                     XPath of where item was found on the page
pubDate      timestamp                 Timestamp when item was detected on page
link         URL                       Extracted permalink (if applicable) of item
type         {IMAGE,LINK,STORY,CHUNK}  Extracted type of the item: an image, permalink, story (image + summary), or HTML chunk
img          URL                       Extracted image from item
textSummary  string                    A plain-text summary of the item
sp           double in [0,1]           Spam score: the probability that the change is spam/ad
sr           double in [1,5]           Static rank: the quality score of the item on a 1 to 5 scale
fresh        double in [0,1]           Fresh score: the percentage of the item that has changed compared to the previous crawl
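
One way the sp and sr scores might be used is to drop likely spam and order the rest by quality, as in this sketch. The items are hand-written dictionaries using the field names above, not real API output, and both the helper name rank_items and the 0.5 spam cutoff are arbitrary assumptions.

```python
def rank_items(items, max_spam=0.5):
    """Keep items whose spam score is at most max_spam, best static rank first.

    max_spam=0.5 is an arbitrary illustrative cutoff, not a documented value.
    """
    kept = [it for it in items if it.get("sp", 0.0) <= max_spam]
    # sr runs from 1 (lowest quality) to 5 (highest), so sort descending.
    return sorted(kept, key=lambda it: it.get("sr", 1.0), reverse=True)


# Made-up items for illustration only.
items = [
    {"title": "Side link", "type": "LINK", "sp": 0.20, "sr": 2.5},
    {"title": "Banner", "type": "IMAGE", "sp": 0.90, "sr": 1.0},
    {"title": "Top story", "type": "STORY", "sp": 0.05, "sr": 4.5},
]

for it in rank_items(items):
    print(it["type"], it["title"], it["sr"])
```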
