The Data Lake Dream


How will enterprise data architecture evolve over the next five years?

In 2013, I spent a lot of time talking about Hadoop’s development towards being a central destination for data. Hadoop may enter an organization for a specific use case, but data attracts data. Once in the door, Hadoop tends to become a center of gravity. This effect is amplified by the appeal of big data being not just about the data size, but the agility it brings to an organization.

To play this role, however, Hadoop needs more than just a data crunching engine and a small army of willing Java programmers. It must become an enterprise platform that supports application development. By the end of 2013, the major Hadoop vendors had all formulated a platform strategy, be it the Cloudera Enterprise Data Hub or the Hortonworks Data Platform.

One phrase in particular has become popular for describing the massing of data into Hadoop: the “Data Lake”. Indeed, Pivotal has adopted the term for its enterprise big data strategy.

But what do the big data vendors mean by this?

The data lake dream is of a place with data-centered architecture, where silos are minimized, and processing happens with little friction in a scalable, distributed environment. Applications are no longer islands; they exist within the data cloud, taking advantage of high-bandwidth access to data and scalable computing resources. Data itself is no longer constrained by initial schema decisions, and can be exploited more freely by the enterprise.
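One way to picture data that is not bound by initial schema decisions is "schema on read": raw records are stored as they arrive, and each consumer applies its own schema when it reads them. The following is a minimal sketch of that idea in plain Python, not a description of how any Hadoop product works. The sample events, the `read_with_schema` helper, and both schemas are invented for illustration.

```python
import json

# Raw events are stored exactly as they arrived; no schema is fixed at write time.
# (Sample data is made up for this illustration.)
RAW_EVENTS = [
    '{"user": "alice", "amount": "19.99", "country": "DE"}',
    '{"user": "bob", "amount": "5.00"}',
    '{"user": "carol", "amount": "12.50", "country": "US", "coupon": "SPRING"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps a field name to a (converter, default) pair. Fields missing
    from a record get the default; fields not in the schema are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two hypothetical consumers read the same raw data through different schemas.
BILLING_SCHEMA = {"user": (str, None), "amount": (float, 0.0)}
MARKETING_SCHEMA = {
    "user": (str, None),
    "country": (str, "unknown"),
    "coupon": (str, None),
}

billing_rows = list(read_with_schema(RAW_EVENTS, BILLING_SCHEMA))
marketing_rows = list(read_with_schema(RAW_EVENTS, MARKETING_SCHEMA))
total_billed = round(sum(row["amount"] for row in billing_rows), 2)
```

Neither consumer had to agree on a table layout before the data was written, and a third schema could be added later without touching the stored events.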

I call it a dream, because we’ve a way to go to make the vision come true. It is, however, an accessible dream.

I’ve set out to describe the four levels of Hadoop maturity that lead us to the dream of the data lake. From these levels we can see where today’s Hadoop vendors are, and understand where our own organizations sit.

Four Levels Of Data Lake Maturity

(1) Life Before Hadoop

  • Applications stand alone with their databases
  • Some applications contribute data to a data warehouse
  • Analysts run reporting and analytics in the data warehouse

(2) Hadoop Is Introduced

  • Applications contribute data to Hadoop
  • Hadoop runs batch MapReduce jobs
  • Hadoop used for ETL into warehouse or analytic databases
  • Hadoop data reintroduced into applications

(3) Growing The Data Lake

  • Newly built systems center around Hadoop by default
  • Applications use each other’s data via Hadoop
  • Interactive use of Hadoop as in-Hadoop databases are deployed (e.g. Impala, Greenplum, Spark)
  • Hadoop becomes a default data destination; governance and metadata become important
  • Data warehouse use becomes the exception, where legacy or special requirements dictate
  • External data sources integrated via Hadoop

(4) Data Lake And Application Cloud

  • New applications are built on a Hadoop application platform around the data lake
  • Hadoop matures as an elastic distributed data computing platform, for both operational and analytical functions
  • Data lake adds security and governance layers
  • Data availability increases, application deployment time decreases
  • Some apps still have special or legacy needs and execute independently
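For readers who want to place their own organization on this scale, the four levels can be sketched as a simple self-assessment. This is only an illustrative reading of the lists above: the `OrgProfile` fields are hypothetical yes/no signals, each loosely paraphrasing one bullet per level, and a real assessment would weigh many more factors.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    """The four levels described above, numbered as in the text."""
    BEFORE_HADOOP = 1
    HADOOP_INTRODUCED = 2
    GROWING_DATA_LAKE = 3
    DATA_LAKE_AND_APP_CLOUD = 4


@dataclass
class OrgProfile:
    # Hypothetical signals, one per level above level (1).
    apps_send_data_to_hadoop: bool = False        # level (2)
    new_systems_default_to_hadoop: bool = False   # level (3)
    apps_built_on_hadoop_platform: bool = False   # level (4)


def assess(profile: OrgProfile) -> Maturity:
    """Return the highest level reached without skipping a lower one."""
    level = Maturity.BEFORE_HADOOP
    if profile.apps_send_data_to_hadoop:
        level = Maturity.HADOOP_INTRODUCED
        if profile.new_systems_default_to_hadoop:
            level = Maturity.GROWING_DATA_LAKE
            if profile.apps_built_on_hadoop_platform:
                level = Maturity.DATA_LAKE_AND_APP_CLOUD
    return level
```

The nesting is a deliberate assumption: it treats the levels as cumulative, so an organization that claims a higher-level signal without the lower ones is still placed at the lower level.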


A vision is necessarily forward-looking. In reality, many organizations are only just starting to kick the tires of Hadoop. Of those enterprises that are using Hadoop, most are in the early stages of level (2), with a few front-runners living at level (3). Those front-runners are big enough to face, and invest in solutions to, challenges that the vendors haven’t yet stepped up to, such as managing provenance, data discovery and fine-grained security.

Does anybody live the dream fully yet? Arguably, yes: the internal infrastructures developed at Google and Facebook provide their developers with the advantages and agility of the data lake dream.

Big data software vendors themselves are ushering in the early stages of level (3), with the focus for 2014 being on application development. We see new companies such as Continuuity and Pivotal addressing the developer experience for big data.

Regardless of where you are now, take some time to look to the future. We’re on a journey towards connecting enterprise data together. As business becomes increasingly digital, access to data will become a critical priority, as will speed of development and deployment. The data lake is a dream that can match those demands.

Source: Forbes
