
Big Data Variety Means That Metadata Matters

Dec 31 2013, 2:37pm CST


Of the famous big data Vs, it’s the variety in data that holds the most potential for exploitation. While not everybody has the huge problems of volume and velocity that a Facebook or a high-frequency trader has, even the smallest business has multiple data sources that it can benefit from combining. Straightforward access to a broad variety of data is a key part of a platform for driving innovation and efficiency.

One common response from businesspeople to the term “big data” is to think that they simply don’t have that problem—but this is to ignore the variety of data. The notion of variety in data encompasses the idea of using multiple sources of data to help understand a problem. It’s forgivable to overlook this potential: as an abstract concept, it’s harder to grasp than “bigger” or “faster”.

Notwithstanding the difficulties inherent in grasping the concept, the ability of an additional data set to shed light on observed phenomena is profound. Consider, for instance, the addition of weather, geographical and social media data to the daily sales figures for a retail chain. It is easy to conceive that correlations with peaks and troughs in sales could be elicited: perhaps with good weather, word-of-mouth trends or road accessibility. With sufficient data, some of these events might even be found to be predictive of sales.
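To make the retail example concrete, here is a minimal sketch in Python of combining two sources and testing for a relationship. The figures, the date keys and the helper names join_on_date and pearson are all invented for illustration; nothing here comes from a real retailer or a particular library. It joins daily sales to daily temperature on their shared dates, then computes a Pearson correlation coefficient between the two series.

```python
from math import sqrt

# Hypothetical daily figures, invented purely for illustration.
sales = {
    "2013-12-01": 1200.0,
    "2013-12-02": 950.0,
    "2013-12-03": 1430.0,
    "2013-12-04": 800.0,   # no weather reading for this day
}
temperature_c = {
    "2013-12-01": 12.0,
    "2013-12-02": 8.0,
    "2013-12-03": 15.0,
    "2013-12-05": 9.0,     # no sales figure for this day
}


def join_on_date(left, right):
    """Keep only the dates present in both sources, in date order."""
    shared = sorted(left.keys() & right.keys())
    return [(day, left[day], right[day]) for day in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


rows = join_on_date(sales, temperature_c)
r = pearson([s for _, s, _ in rows], [t for _, _, t in rows])
print(f"{len(rows)} matching days, correlation {r:.2f}")
```

Note that the join silently drops the days that appear in only one source. With real data, mismatches of this kind are where much of the preparation effort goes, and a handful of points like this says nothing about whether a correlation is predictive.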

While such trends may seem like well-worn examples in today’s marketing-driven environment, the reality of taking advantage of such variety is less straightforward. In general, data systems are geared to expect clean, tabular data of the sort that flows into relational database systems and data warehouses.

Handling diverse and messy data requires a lot of cleanup and preparation. Four years into the era of data scientists, most practitioners report that their primary occupation is still obtaining and cleaning data sets. This forms 80% of the work required before the much-publicized investigational skill of the data scientist can be put to use.

Understanding and identifying

Even to focus on the problems of cleaning data is to ignore the primary problem, however. A chief obstacle for many business and research endeavors is simply locating, identifying and understanding data sources in the first place, whether internal or external to an organization.

This is complicated not only in a technical sense, but often in a political, legal or logistical way.

While the data scientist may not be able to directly bring about organizational or political change, they are able to materially affect the accessibility and comprehensibility of data. It’s not glamorous, but it’s powerful: they must document and describe their data. The documentation and description of datasets with metadata—data about data—enhances the discoverability and usability of data both for current and future applications, as well as forming a platform for the vital function of tracking data provenance.
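As a rough illustration of what documenting and describing data can look like in practice, the Python sketch below keeps a small catalog of metadata records. The DatasetRecord class, its fields, and the find and lineage helpers are hypothetical names chosen for this example, not a standard schema or any particular tool's format. Each record says what a dataset contains, who owns it, where it came from and which datasets it was derived from, which is enough to support simple discovery and provenance tracking.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """A hypothetical metadata record: data about one dataset."""
    name: str
    description: str
    owner: str
    source: str
    columns: Dict[str, str]  # column name -> what it means
    derived_from: List[str] = field(default_factory=list)


def find(catalog, keyword):
    """Discovery: names of datasets whose metadata mentions the keyword."""
    needle = keyword.lower()
    hits = []
    for record in catalog:
        text = " ".join(
            [record.name, record.description, *record.columns, *record.columns.values()]
        )
        if needle in text.lower():
            hits.append(record.name)
    return hits


def lineage(catalog, name):
    """Provenance: every upstream dataset, nearest ancestors first."""
    by_name = {record.name: record for record in catalog}
    seen = []
    queue = list(by_name[name].derived_from)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in by_name:
            queue.extend(by_name[current].derived_from)
    return seen


# Illustrative catalog entries; all names and owners are made up.
catalog = [
    DatasetRecord(
        name="daily_sales",
        description="Daily sales totals per store",
        owner="finance team",
        source="point-of-sale export",
        columns={"date": "trading day", "store_id": "store identifier",
                 "revenue": "takings in local currency"},
    ),
    DatasetRecord(
        name="weather_obs",
        description="Daily weather observations near each store",
        owner="analytics team",
        source="external weather provider",
        columns={"date": "observation day", "store_id": "nearest store",
                 "temperature_c": "mean air temperature in Celsius"},
    ),
    DatasetRecord(
        name="sales_with_weather",
        description="Sales joined to weather by date and store",
        owner="analytics team",
        source="internal join job",
        columns={"date": "trading day", "store_id": "store identifier",
                 "revenue": "takings in local currency",
                 "temperature_c": "mean air temperature in Celsius"},
        derived_from=["daily_sales", "weather_obs"],
    ),
]

print(find(catalog, "temperature"))
print(lineage(catalog, "sales_with_weather"))
```

The design choice worth noting is that the derived dataset carries its own record and names its inputs, so the output of one processing step is already described when it becomes the input of the next.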

Though “metadata” has long and somewhat unfairly been held as a slightly dull topic, of interest primarily to librarians, the news coverage of national security agencies tracking metadata has served to educate the public about metadata’s importance in understanding and exploiting information and behavior. Metadata within data infrastructures enables us to locate and combine data, and to analyze its lifecycle and history.

Over the last two decades, the problem of data description and discovery has been tackled to an extent within the data warehousing world, but that approach is expensive to deploy even within a single organization, let alone to scale across multiple data platforms or the web.

In the same way we looked to the web for big data technologies, there are seeds of possible routes forward for data description and discovery out on the web. While we hear much trumpeting of open data for government, today’s enterprises are much further behind. It’s hard for employees to share data with each other, never mind third parties. Perhaps open data technology can be brought inside organizations. The work of the Open Knowledge Foundation in promoting public open data has led to technical developments in data sharing, many of which have potential inside organizations as well as for their intended public purpose. Notable among these is CKAN, an open source data portal platform.

Conclusion

The output of one step of data processing necessarily becomes the input of the next. To process data and exploit only the result of the calculation is short-sighted. The practices and tools of big data and data science do not stand alone in the data ecosystem. They rely on the usability of data, and we will all gain from ensuring that our results are able to form a platform for future discovery and innovation. As big data tools grow in maturity and adoption over 2014, we will see the rising importance of the need to support this kind of exchange and collaboration around enterprise data.

Source: Forbes

 
