Image

The Image API identifies the primary image(s) of a submitted web page and returns comprehensive information and metadata for each image.

Test drive Image API without a trial token at diffbot.com/testdrive.

Response

The Image API returns data in JSON format.

Each response includes a request object, which returns request-specific metadata, and an objects array, which contains the extracted information for all objects on the submitted page.

Objects in the Image API's objects array will include the following fields:

Field	Description
`type`	Type of object (always `image`).
`url`	Direct link to image file.
`title`	Title or caption of the image, if available.
`naturalHeight`	Raw image height, in pixels.
`naturalWidth`	Raw image width, in pixels.
`humanLanguage`	Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature.
`anchorUrl`	If the image is hyperlinked, returns the destination URL.
`pageUrl`	URL of submitted page / page from which the image is extracted.
`resolvedPageUrl`	Returned if the `pageUrl` redirects to another URL.
`xpath`	XPath expression identifying the image node.
`diffbotUri`	Unique object ID. The `diffbotUri` is generated from the values of various Image fields and uniquely identifies the object. This can be used for deduplication.
Optional fields, available using `fields=` argument
`displayHeight`	Height of image as presented in the browser (and as sized via browser/CSS, if resized).
`displayWidth`	Width of image as presented in the browser (and as sized via browser/CSS, if resized).
`links`	Returns a top-level object (`links`) containing all hyperlinks found on the page.
`meta`	Comma-separated list of image-embedded metadata (e.g., EXIF, XMP, ICC Profile), if available within the image file.
`querystring`	Returns any key/value pairs present in the URL querystring. Items without a discrete value will be returned as `true`.
`breadcrumb`	Returns a top-level array (`breadcrumb`) of URLs and link text from page breadcrumbs.

The following is an example response from a successful extraction of a product page on diffbot.com.

{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-readslikehumans']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4 align-self-start']",
      "humanLanguage": "en",
      "naturalHeight": 1023,
      "diffbotUri": "image|3|666824882",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
      "naturalWidth": 897,
      "tags": [
        {
          "typeHierarchy": [
            "http://www.w3.org/2002/07/owl#Thing",
            "http://dbpedia.org/ontology/Work",
            "http://dbpedia.org/ontology/Website"
          ],
          "id": 33898,
          "label": "Website",
          "type": "http://dbpedia.org/ontology/Website",
          "uri": "https://www.diffbot.com/entity/Xd90vp_U4MJOoRHIja3quxg"
        },
        {
          "id": 1404579,
          "label": "Ring binder",
          "uri": "http://diffbot.com/entity/XYwPij6UgPdaDEY6fXSrrRA"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-lookslikeahuman']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 270,
      "diffbotUri": "image|3|1865428953",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/any_language.png",
      "naturalWidth": 554,
      "tags": [
        {
          "id": 5462349,
          "label": "Sachet",
          "uri": "http://diffbot.com/entity/X1_7MIKBoPpuRnAPVQXqQxA"
        },
        {
          "id": 479373,
          "label": "Eraser",
          "uri": "http://diffbot.com/entity/XGPnWNBUZPyuNyfK9rJr6vQ"
        },
        {
          "id": 57260,
          "label": "Envelope",
          "uri": "http://diffbot.com/entity/XK5gEV93iP6SqKMDjDr0YBQ"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-one-click-crawling']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  offset-md-1 col-md  d-flex justify-content-center align-items-center  order-md-2']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 372,
      "diffbotUri": "image|3|1699007329",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/analyze_two.png",
      "naturalWidth": 451,
      "tags": [
        {
          "id": 268267,
          "label": "Tray",
          "uri": "http://diffbot.com/entity/XwXgQ3A7sNmypX2VX3oZCgg"
        },
        {
          "id": 467731,
          "label": "Spatula",
          "uri": "http://diffbot.com/entity/XZohZtDoDPvq1wusJQCSZZA"
        },
        {
          "id": 2649730,
          "label": "Measuring cup",
          "uri": "http://diffbot.com/entity/XhwdQ4HnKOsCVrf9YU3hyWw"
        }
      ]
    }
  ]
}
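As a minimal sketch of how a client might consume a response like the one above, the following Python walks the objects array, skips repeats by `diffbotUri`, and collects a few documented fields. The `SAMPLE` string is a trimmed copy of the example response with one object repeated purely for illustration, and the helper name `unique_images` is hypothetical, not part of any Diffbot library.

```python
import json

# Trimmed copy of the example response (xpath and tags omitted).
# The third object is a repeat added here only to illustrate deduplication.
SAMPLE = """
{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {"type": "image", "diffbotUri": "image|3|666824882",
     "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
     "naturalWidth": 897, "naturalHeight": 1023},
    {"type": "image", "diffbotUri": "image|3|1865428953",
     "url": "https://www.diffbot.com/assets/img/products/any_language.png",
     "naturalWidth": 554, "naturalHeight": 270},
    {"type": "image", "diffbotUri": "image|3|666824882",
     "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
     "naturalWidth": 897, "naturalHeight": 1023}
  ]
}
"""


def unique_images(response):
    """Return one summary dict per distinct diffbotUri in the objects array."""
    seen = set()
    images = []
    for obj in response.get("objects", []):
        uri = obj.get("diffbotUri")
        # Only deduplicate objects that actually carry a diffbotUri.
        if uri is not None:
            if uri in seen:
                continue
            seen.add(uri)
        images.append({
            "url": obj.get("url"),
            "width": obj.get("naturalWidth"),
            "height": obj.get("naturalHeight"),
            # title is only present when a title or caption is available.
            "title": obj.get("title"),
        })
    return images


response = json.loads(SAMPLE)
images = unique_images(response)
for image in images:
    print(image["url"], image["width"], "x", image["height"])
```

Using `dict.get` rather than direct indexing keeps the loop tolerant of fields that are returned only when available, such as `title` or `anchorUrl`.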

Optional Fields

Specify each desired field (comma-delimited) in the `fields=` argument. In addition to the fields listed below, more fields are available with all Extract APIs.

Field	Description
`displayHeight`	Height of image as presented in the browser (and as sized via browser/CSS, if resized).
`displayWidth`	Width of image as presented in the browser (and as sized via browser/CSS, if resized).
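The sketch below shows one way to assemble a request URL that asks for these optional fields. The endpoint `https://api.diffbot.com/v3/image` and the `token` and `url` parameter names are assumptions not stated in this section, and `build_image_request_url` is a hypothetical helper; only the comma-delimited `fields` argument comes from the text above.

```python
from urllib.parse import urlencode

# Assumed endpoint; this section does not state it.
API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_image_request_url(token, page_url, fields=()):
    """Build a GET URL, joining any optional fields with commas."""
    # "token" and "url" are assumed parameter names.
    params = {"token": token, "url": page_url}
    if fields:
        params["fields"] = ",".join(fields)
    # urlencode percent-encodes the page URL and the commas in the fields list.
    return API_ENDPOINT + "?" + urlencode(params)


request_url = build_image_request_url(
    "YOUR_TOKEN",  # placeholder, replace with a real token
    "https://www.diffbot.com/products/extract/",
    fields=["displayHeight", "displayWidth"],
)
print(request_url)
```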

Already have the source HTML? POST it to Image API.

Image API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
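For illustration, a POST request carrying your own HTML might be prepared as follows. This is a sketch under several assumptions that this section does not confirm: the endpoint, the `token` and `url` query parameters, and the use of a `text/html` Content-Type header with the markup as the request body. The helper `build_post_request` is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; this section does not state it.
API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_post_request(token, page_url, html):
    """Prepare (but do not send) a POST request with HTML as the body."""
    # Assumption: the page URL is still passed so relative links can be resolved.
    query = urlencode({"token": token, "url": page_url})
    return Request(
        API_ENDPOINT + "?" + query,
        data=html.encode("utf-8"),
        # Assumption: text/html for markup; plain text would use text/plain.
        headers={"Content-Type": "text/html"},
        method="POST",
    )


post_request = build_post_request(
    "YOUR_TOKEN",  # placeholder, replace with a real token
    "https://www.diffbot.com/products/extract/",
    "<html><body><img src='/assets/img/example.png'></body></html>",
)
print(post_request.get_method(), post_request.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it; check the linked POST documentation for the exact parameters before relying on this shape.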