case study

How we helped a hot AI Startup build their own Knowledge Graph

Using data enriched from the Diffbot Knowledge Graph, RelationalAI landed a project with one of the largest online retailers and published a research paper at a top-tier academic conference. Hear from Molham Aref, CEO of RelationalAI, about how Diffbot helped their team augment their own product knowledge graph dataset with data from the entire public web.

There was only so much internal data they had on their products. But by being able to mine the internet using Diffbot for each product, we were able to develop a much nicer, much cleaner, a more complete, and more accurate product knowledge graph to drive search, recommendations, and many other customer-facing functions in their company.

- Molham Aref, CEO, RelationalAI

About RelationalAI

RelationalAI is an AI startup working at the intersection of databases, knowledge graphs, and artificial intelligence. Specifically, they are developing the first Relational Knowledge Graph Management System: a new kind of relational reasoning engine that allows inferential models to run efficiently on rich, interconnected data. Instead of the standard approach of building a model from data extracted from a database, RelationalAI’s models run directly on top of the relational data structure, leading to orders-of-magnitude speedup in inference without the typical ETL (extract-transform-load) loop and data copying bottlenecks.

RelationalAI’s team consists of world-class experts from the foremost universities in machine learning, database systems, and programming language research. They are also just really friendly and nice human beings. So, when RelationalAI approached us about a project they were doing, we were excited to help them stand up a Product Knowledge Graph.

The Problem

RelationalAI was engaged in a project with one of the largest online retailers to improve the retailer’s ability to make product recommendations on their site. Led by VP of Research Nikolaos Vasiloglou, RelationalAI took on a project to test the ability of their system to run relational reasoning over the retailer’s product catalog. However, they faced one big problem: the retailer’s product data was sparse, lacking many of the product attributes needed to make solid predictions.

The original dataset from the retailer contained 206 products (all washers), 266 distinct product attributes, and 34k facts (that is, values for product-attribute pairs).
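To put those numbers in perspective, the short calculation below estimates how much of the product-by-attribute grid was actually filled in. It is only a rough sketch: it assumes each fact fills exactly one product-attribute pair and treats "34k" as 34,000, neither of which is stated in the original dataset description.

```python
# Figures quoted in the case study; "34k" is taken as 34,000 (an approximation).
products = 206
attributes = 266
facts = 34_000

# Every product could, in principle, have a value for every attribute.
possible_pairs = products * attributes

# Share of the grid that is filled, assuming one fact per product-attribute pair.
fill_rate = facts / possible_pairs
missing_pairs = possible_pairs - facts

print(f"possible product-attribute pairs: {possible_pairs}")
print(f"approximate fill rate: {fill_rate:.0%}")
print(f"approximate missing values: {missing_pairs}")
```

Under these assumptions, well over a third of the possible attribute values were missing, which is the gap the web data was meant to close.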

Nikolaos realized that more detailed information about each product existed online, in the form of third-party product listings, product manuals, product reviews, blogs, social media posts, and so on. The only problem was that all of this product information exists in disparate forms across a practically uncountable number of sites. It would take an infeasible amount of work to write manual web scrapers and rules for each of these sites to collect this information.

[Figure: example of product attributes for a washer product]

The Solution

RelationalAI employed Diffbot’s automatic Product extraction technology and Knowledge Graph in order to automatically augment the existing product data provided by the retailer with relevant information extracted from the public web. Given just the sparse information about the initial 206 washer products, Diffbot’s AI was able to automatically read pages on the web about the washers and extract information such as the brand, price, color, capacity and thousands of other product specs.
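The augmentation step described above can be pictured as a merge between a sparse catalog record and attribute records extracted from web pages. The sketch below is illustrative only: the `augment` function, the record shapes, and all sample values are hypothetical, and it is not Diffbot's API or RelationalAI's actual pipeline. It assumes a simple policy in which the retailer's own values are never overwritten and the first web value found fills a gap.

```python
from typing import Dict, Iterable

Record = Dict[str, str]


def augment(catalog: Record, extracted: Iterable[Record]) -> Record:
    """Fill attributes that are missing or empty in the catalog record.

    Existing catalog values are kept; the first non-empty web value wins.
    """
    merged = dict(catalog)
    for page in extracted:
        for attr, value in page.items():
            if value and not merged.get(attr):
                merged[attr] = value
    return merged


# Hypothetical sparse catalog entry and hypothetical web-extracted records.
catalog_record = {"sku": "W-001", "brand": "ExampleBrand", "color": ""}
web_records = [
    {"brand": "Example Brand Inc.", "color": "white", "capacity": "4.5 cu ft"},
    {"capacity": "4.5 cubic feet", "price": "799.00"},
]

result = augment(catalog_record, web_records)
print(result)
```

A real system would also need to match web pages to the right product and reconcile conflicting values, which this sketch deliberately leaves out.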

[Figure: the deployed state-of-the-art system that RelationalAI designed for product inference, including the Diffbot component]

More detail on the RelationalAI project, including their novel human-in-the-loop approach to probabilistic soft logic, can be found in their research paper published at KDD 2020.

Results

Using Diffbot for data acquisition and augmentation allowed RelationalAI to focus their research team’s efforts on the unique modeling and reasoning aspects of the system, ultimately helping them accelerate the completion and delivery of their project.

Using Diffbot gave RelationalAI’s data scientists access to much richer and more complete product data, leading to an accepted research paper at the Knowledge Discovery and Data Mining Conference (KDD) 2020. Read the paper here: https://usc-isi-i2.github.io/KDD2020workshop/papers/KGE1_paper_12.pdf

Using Diffbot enabled RelationalAI to build a product knowledge inference system that produced a cumulative precision and recall of 99.6% and 41.38%, respectively.
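For readers who want a single number that balances the two figures, the sketch below computes the F1 score (the harmonic mean of precision and recall) from the values quoted above. The F1 value is derived here for illustration and is not reported in the case study; it also assumes the "cumulative" precision and recall can be combined in the usual way.

```python
# Precision and recall as quoted in the case study.
precision = 0.996
recall = 0.4138

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.3f}")
```

The gap between the two inputs reflects a system tuned to be right almost every time it asserts a fact, at the cost of leaving many facts unasserted.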