Big Data Variety Means That Metadata Matters

Dec 31 2013, 2:37pm CST


Of the famous big data Vs, it’s the variety in data that holds the most potential for exploitation. While not everybody has the huge problems of volume and velocity that a Facebook or a high-frequency trader has, even the smallest business has multiple data sources that it can benefit from combining. Straightforward access to a broad variety of data is a key part of a platform for driving innovation and efficiency.

One common response from businesspeople to the term “big data” is to think that they simply don’t have that problem—but this is to ignore the variety of data. The notion of variety in data encompasses the idea of using multiple sources of data to help understand a problem. It’s forgivable to overlook this potential: as an abstract concept, it’s harder to grasp than “bigger” or “faster”.

Notwithstanding the difficulties inherent in grasping the concept, the ability of an additional data set to shed light on observed phenomena is profound. Consider, for instance, the addition of weather, geographical and social media data to the daily sales figures for a retail chain. It is easy to conceive that correlations with peaks and troughs in sales could be elicited: perhaps with good weather, word-of-mouth trends or road accessibility. With sufficient data, some of these events might even be found to be predictive of sales.
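As a minimal sketch of the idea above, the following joins two sources on date and measures how strongly they move together. Everything in it is an assumption made for illustration: the figures are invented, the names `daily_sales`, `daily_high_temp_c`, `join_on_date` and `pearson` are hypothetical, and a real analysis would need far more data before any correlation could be trusted, let alone treated as predictive.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def join_on_date(left, right):
    """Keep only the dates present in both sources, in sorted order."""
    shared = sorted(left.keys() & right.keys())
    return shared, [left[d] for d in shared], [right[d] for d in shared]


# Invented figures for illustration only: the two sources cover
# slightly different date ranges, as real sources often do.
daily_sales = {
    "2013-12-01": 1180.0,
    "2013-12-02": 1240.0,
    "2013-12-03": 990.0,
    "2013-12-04": 1410.0,
    "2013-12-05": 1320.0,
}
daily_high_temp_c = {
    "2013-12-02": 11.0,
    "2013-12-03": 4.0,
    "2013-12-04": 15.0,
    "2013-12-05": 13.0,
    "2013-12-06": 9.0,
}

dates, sales, temps = join_on_date(daily_sales, daily_high_temp_c)
r = pearson(sales, temps)
print(f"{len(dates)} shared days, correlation = {r:.2f}")
```

The join step is the part that reflects variety: the correlation itself is simple arithmetic, but it is only possible once the two sources share a key, here the date.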

While such trends may seem well-worn examples in today’s marketing-driven environment, the reality of taking advantage of such variety is less straightforward. In general, data systems are geared up to expect clean, tabular data of the sort that flows into relational database systems and data warehouses.

Handling diverse and messy data requires a lot of cleanup and preparation. Four years into the era of data scientists, most practitioners report that their primary occupation is still obtaining and cleaning data sets. This is often said to form 80% of the work required before the much-publicized investigative skill of the data scientist can be put to use.

Understanding and identifying

Even to focus on the problems of cleaning data is to ignore the primary problem, however. A chief obstacle for many business and research endeavors is simply locating, identifying and understanding data sources in the first place, whether internal or external to an organization.

This is complicated not only in a technical sense, but often in a political, legal or logistical way.

While the data scientist may not be able to directly bring about organizational or political change, they are able to materially affect the accessibility and comprehensibility of data. It’s not glamorous, but it’s powerful: they must document and describe their data. The documentation and description of datasets with metadata—data about data—enhances the discoverability and usability of data both for current and future applications, as well as forming a platform for the vital function of tracking data provenance.
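To make this concrete, here is a small sketch of what documenting datasets with metadata might look like. It is not a standard or any particular tool's format: the `DatasetMetadata` fields, the dataset names and the helper functions `find_datasets` and `lineage` are all hypothetical, chosen only to show description, discovery by keyword, and provenance tracking side by side.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """A hypothetical, minimal description of one dataset."""
    name: str
    description: str
    source: str                 # where the data came from
    columns: dict               # column name -> meaning and units
    keywords: list = field(default_factory=list)
    derived_from: list = field(default_factory=list)  # parent dataset names


def find_datasets(catalog, keyword):
    """Discovery: names of datasets tagged with the given keyword."""
    wanted = keyword.lower()
    return [m.name for m in catalog if wanted in [k.lower() for k in m.keywords]]


def lineage(catalog, name):
    """Provenance: all upstream dataset names, nearest parents first."""
    by_name = {m.name: m for m in catalog}
    seen = []
    queue = list(by_name[name].derived_from)
    while queue:
        parent = queue.pop(0)
        if parent in seen:
            continue
        seen.append(parent)
        if parent in by_name:
            queue.extend(by_name[parent].derived_from)
    return seen


# Illustrative catalog; every entry is invented.
catalog = [
    DatasetMetadata(
        name="store_sales",
        description="Daily sales totals per store",
        source="point-of-sale export",
        columns={"date": "ISO 8601 day", "total": "sales in USD"},
        keywords=["sales", "retail"],
    ),
    DatasetMetadata(
        name="weather_daily",
        description="Daily high temperature per city",
        source="external weather provider",
        columns={"date": "ISO 8601 day", "high_c": "degrees Celsius"},
        keywords=["weather"],
    ),
    DatasetMetadata(
        name="sales_with_weather",
        description="Sales joined to weather on date",
        source="internal analysis",
        columns={"date": "ISO 8601 day", "total": "sales in USD",
                 "high_c": "degrees Celsius"},
        keywords=["sales", "weather"],
        derived_from=["store_sales", "weather_daily"],
    ),
]

print(find_datasets(catalog, "weather"))
print(lineage(catalog, "sales_with_weather"))
print(json.dumps(asdict(catalog[2]), indent=2))
```

Keeping the record as plain, serializable data means it can travel with the dataset itself, so that whoever picks up the result later can see what each column means and which sources it was built from.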

Though “metadata” has long and somewhat unfairly been held as a slightly dull topic, of interest primarily to librarians, the news coverage of national security agencies tracking metadata has served to educate the public about metadata’s importance in understanding and exploiting information and behavior. Metadata within data infrastructures enables us to locate and combine data, and to analyze its lifecycle and history.

In the last two decades, the problem of data description and discovery has been tackled to an extent within the data warehousing world, but that approach is expensive to deploy even within a single organization, let alone to scale across multiple data platforms or the web.

In the same way we looked to the web for big data technologies, there are seeds of possible routes forward for data description and discovery out on the web. While we hear much trumpeting of open data for government, today’s enterprises are much further behind. It’s hard for employees to share data with each other, never mind third parties. Perhaps open data technology can be brought inside organizations. The work of the Open Knowledge Foundation in promoting public open data has led to technical developments in data sharing, many of which have potential within organizations as well as for their intended public purpose. Notable among these is CKAN, an open source data portal platform.


The output of one step of data processing necessarily becomes the input of the next. To process data and exploit only the result of the calculation is short-sighted. The practices and tools of big data and data science do not stand alone in the data ecosystem. They rely on the usability of data, and we will all gain from ensuring that our results are able to form a platform for future discovery and innovation. As big data tools grow in maturity and adoption over 2014, we will see the rising importance of the need to support this kind of exchange and collaboration around enterprise data.

Source: Forbes


