Feb 12 2014, 2:57pm CST | by Forbes
By Ben Lorica
As a user who tends to mix and match many different tools, not having to deal with configuring and assembling a suite of tools is a big win. So I’m really liking the recent trend towards more integrated and packaged solutions. A recent example is the relaunch of Cloudera’s Enterprise Data Hub to include Spark (1) and Spark Streaming. Users benefit by gaining automatic access to analytic engines that come with Spark (2). Besides simplifying things for data scientists and data engineers, easy access to analytic engines is critical for streamlining the creation of big data applications.
Another recent example is Dendrite (3), an interesting new graph analysis solution from Lab41. It combines Titan (a distributed graph database), GraphLab (for graph analytics), and a front end that leverages AngularJS into a graph exploration and analysis tool for business analysts.
Users of Spark explore Spark Streaming because similar code for batch (Spark) can, with minor modification, be used for realtime (Spark Streaming) computations. Along these lines, Summingbird – an open source library from Twitter – offers something similar for Hadoop MapReduce and Storm. With Summingbird, programs that look like Scala collection transformations can be executed in batch (Scalding) or realtime (Storm).
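To make the "write once, run in batch or realtime" idea concrete, here is a minimal sketch in plain Python. It does not use the Spark, Spark Streaming, Summingbird, Scalding, or Storm APIs; the function names (count_words, run_batch, run_streaming) are hypothetical and exist only for illustration. The point it tries to show is that when the core logic is a transformation whose results can be merged, the same function can be applied to a whole dataset at once or to small batches as they arrive.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Shared logic (hypothetical): tokenize lines and count words."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(lines: List[str]) -> Counter:
    """Batch mode: process the whole dataset in one pass."""
    return count_words(lines)


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Streaming mode: apply the same logic to each small batch and
    yield a snapshot of the running totals after each one."""
    totals = Counter()
    for batch in micro_batches:
        # Partial counts can be merged in any grouping, which is what
        # lets batch and streaming runs agree on the final result.
        totals += count_words(batch)
        yield Counter(totals)


if __name__ == "__main__":
    data = ["spark streaming", "spark batch", "storm streaming"]
    print(run_batch(data))
    for snapshot in run_streaming([data[:2], data[2:]]):
        print(snapshot)
```

In this sketch the last streaming snapshot matches the batch result because word counts combine associatively. Real systems add much more (fault tolerance, windowing, distributed state), so treat this as an illustration of the programming model only.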
In some instances the underlying techniques from one set of tools make their way into others. The DeepDive team at Stanford recently revamped their information extraction and natural language understanding system. Techniques used in DeepDive have already found their way into many other systems, including MADlib, Cloudera Impala, “a product from Oracle,” and Google Brain.
(1) Full disclosure: I am an advisor to Databricks, a startup commercializing Spark.
(2) Some potential applications of Spark and Spark Streaming include stream processing and mining, interactive and iterative computing, machine learning, and graph analytics.
(3) Hat tip to Danny Bickson.
This post originally appeared on O’Reilly Data (“Big Data solutions through the combination of tools”). It’s republished with permission.