Big Data Solutions Through The Combination Of Tools

Posted: Feb 12 2014, 2:57pm CST | Updated: Feb 12 2014, 3:00pm CST, in News | Technology News

By Ben Lorica

As a user who tends to mix and match many different tools, I find that not having to configure and assemble a suite of tools is a big win. So I'm really liking the recent trend toward more integrated and packaged solutions. A recent example is the relaunch of Cloudera's Enterprise Data Hub, which now includes Spark(1) and Spark Streaming. Users benefit by gaining automatic access to the analytic engines that come with Spark(2). Besides simplifying things for data scientists and data engineers, easy access to analytic engines is critical for streamlining the creation of big data applications.

Another recent example is Dendrite(3), an interesting new graph analysis solution from Lab41. It combines Titan (a distributed graph database), GraphLab (for graph analytics), and a front end built with AngularJS into a graph exploration and analysis tool for business analysts.

Spark users are drawn to Spark Streaming because code written for batch computations (Spark) can, with minor modifications, be reused for real-time computations (Spark Streaming). Along these lines, Summingbird, an open source library from Twitter, offers something similar for Hadoop MapReduce and Storm. With Summingbird, programs that look like Scala collection transformations can be executed in batch (Scalding) or in real time (Storm).
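The "write once, run in batch or real time" idea can be illustrated with a minimal plain-Python sketch. It deliberately does not use the Spark, Spark Streaming, or Summingbird APIs; the function names `word_counts`, `run_batch`, and `run_streaming` are hypothetical, and the streaming side assumes a micro-batch model in which the stream arrives as a sequence of small lists of records.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def word_counts(lines: Iterable[str]) -> Counter:
    """Shared transformation (hypothetical): tokenize lines and count words.

    It is written against a generic iterable, so it does not care whether
    the lines come from a stored dataset or from one slice of a stream.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(dataset: List[str]) -> Counter:
    # Batch mode: apply the shared transformation to the whole dataset at once.
    return word_counts(dataset)


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    # Streaming mode (assumed micro-batch model): apply the same
    # transformation to each incoming slice and merge it into a running total.
    running = Counter()
    for batch in micro_batches:
        running += word_counts(batch)
        # Yield a copy so callers can keep earlier snapshots unchanged.
        yield Counter(running)
```

After the last micro-batch, the running streaming total equals the batch result for the same records: only the driver around `word_counts` changes, not the transformation itself, which is the kind of reuse the paragraph above describes.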

In some instances, the underlying techniques from one set of tools make their way into others. The DeepDive team at Stanford recently revamped its information extraction and natural language understanding system, but techniques used in DeepDive have already found their way into many other systems, including MADlib, Cloudera Impala, "a product from Oracle," and Google Brain.

(1) Full disclosure: I am an advisor to Databricks – a startup commercializing Spark.
(2) Some potential applications of Spark and Spark Streaming include stream processing and mining, interactive and iterative computing, machine learning, and graph analytics.
(3) Hat tip to Danny Bickson.

This post originally appeared on O’Reilly Data (“Big Data solutions through the combination of tools”). It’s republished with permission.

Source: Forbes