by mike on Jun 29, 2011

On June 25th, over 80 of Silicon Valley's top hackers, designers, and students gathered at the Diffbot offices in Palo Alto in the hopes of building the next great Web 3.0 app. Over the next 13 hours, participants learned about data and analysis APIs, formed teams, and wrote code.

At the end of the night, only one team could win the prize for best app.

Best App Award: HeatSync

HeatSync solves a simple problem: knowing where people are nearby. HeatSync aggregates check-in data from FourSquare, Gowalla, Google, and Twitter into an interactive heatmap centered near you (for the demo, only FourSquare data was used). The team displayed a great combination of backend talent, user experience, and product design.
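As a rough illustration of the aggregation step behind a check-in heatmap, here is a minimal sketch in Python. It is not HeatSync's code: the function name `bin_checkins`, the grid-cell approach, and the sample coordinates are all assumptions made for this example.

```python
import math
from collections import Counter

def bin_checkins(checkins, cell_size_deg=0.01):
    """Count check-ins per grid cell.

    `checkins` is an iterable of (latitude, longitude) pairs in degrees.
    The grid cell size is a hypothetical default, not a HeatSync setting.
    """
    counts = Counter()
    for lat, lon in checkins:
        # Map each coordinate to the integer index of the cell containing it.
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        counts[cell] += 1
    return counts

# Made-up sample points; the first two fall in the same cell.
sample = [(37.4419, -122.1430), (37.4421, -122.1433), (37.4530, -122.1817)]
heat = bin_checkins(sample)
```

Each cell's count can then drive the color intensity of the corresponding map tile. Check-ins from several services could be merged simply by concatenating their coordinate lists before binning.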

People's Choice Award: My Moods

The winner of the popular vote was My Moods, an app that helps you change your emotional state by reading news. My Moods uses the Diffbot API to extract articles from the web and the EffectCheck service to perform sentiment analysis.

See the full list of hacks here: http://webmininghackday.uservoice.com

Thanks to all of our sponsors for making this event possible!

by mike on Jun 13, 2011

In a recent benchmark, Diffbot placed first overall among text extraction APIs on an academic evaluation set and one sampled from Google News.

Tomaz Kovacic, a university student in artificial intelligence, recently conducted a comprehensive benchmark of text extraction methods as part of his thesis. The study covers commercial vendors as well as open-source APIs for text extraction. He did an excellent job designing the study: it measures precision, recall, and F1, and includes a careful analysis of error cases.
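For readers unfamiliar with these metrics, here is a minimal sketch of how precision, recall, and F1 can be computed for a text extractor. It assumes a simple token-overlap comparison against a hand-labeled reference text; the study's exact matching method is not described here, and the function name `extraction_scores` and the sample strings are made up for illustration.

```python
from collections import Counter

def extraction_scores(extracted: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over whitespace tokens, counted as multisets."""
    extracted_tokens = Counter(extracted.split())
    reference_tokens = Counter(reference.split())
    # Multiset intersection: a token counts only as often as it appears in both texts.
    overlap = sum((extracted_tokens & reference_tokens).values())
    extracted_total = sum(extracted_tokens.values())
    reference_total = sum(reference_tokens.values())
    # Precision: share of extracted tokens that belong to the reference text.
    precision = overlap / extracted_total if extracted_total else 0.0
    # Recall: share of reference tokens that the extractor recovered.
    recall = overlap / reference_total if reference_total else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up example: the extractor kept two navigation words and missed three article words.
precision, recall, f1 = extraction_scores(
    "Home News Diffbot wins benchmark",
    "Diffbot wins benchmark of text extraction",
)
```

In this example, 3 of the 5 extracted tokens are correct (precision 0.6) and 3 of the 6 reference tokens were recovered (recall 0.5). An extractor that pulls in navigation or ads loses precision, while one that truncates the article loses recall.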


Image credit: Tomaz Kovacic

The CleanEval dataset, developed at the Association for Computational Linguistics conference, is a widely used evaluation set in academia, and the Google News article dataset was sampled from the 5000+ news sources that Google aggregates.

Diffbot's method relies on training a core set of visual features (such as geometrical, stylistic, and render properties) to recognize different types of documents. In this case, we had trained Diffbot on a set of news article pages to recognize certain parts of news pages. In addition to article text, Diffbot's article API returns the content author, date, location, article images, article videos, favicon, and even topics (support in English and other languages coming soon). Besides article pages, Diffbot's core features have been trained to extract information from other types of pages too (such as front pages).

This result is a promising sign that generalized vision-based machine learning techniques can perform as well as, if not better than, approaches engineered for specific tasks.

Learn more details about the study.

by mike on Apr 27, 2011

A little over a month ago, we released free access to our technology via a set of open APIs, which was featured on the front page of Hacker News. Our goal is to keep access to our technology free and provide the tools that allow developers to build awesome apps. So far, 134 developers have received developer tokens, submitting a huge variety of ideas.

Over the past month, some of these apps have gotten quite close to launch. We've created a showcase section of the site to highlight some of the amazing apps that are currently being developed using Diffbot.

Here's our rough attempt to categorize the types of applications people are building with the Diffbot APIs:

Content readers: 40%
Syndication and alerting: 20%
Vertical aggregators and search: 10%
Building corpuses (e.g. NLP/IR research): 5%
OTHER: 25%

Notice that there is a huge fat tail (25%) that is "OTHER". Many of these applications are truly unique and defy categorization! We'll keep you updated as these apps make progress.

As always, give us feedback on how we can improve the APIs to make your apps better. We've already added substantial functionality to the article APIs in response to early feedback.

To get started, apply for a Diffbot developer token.
