



Big Data Systems Are Making A Difference In The Fight Against Cancer

Jan 17 2014, 5:21pm CST



By Ben Lorica

As open source, big data tools enter the early stages of maturation, data engineers and data scientists will have many opportunities to use them to “work on stuff that matters.” Along those lines, computational biology and medicine are areas where skilled data professionals are already beginning to make an impact. I recently came across a compelling open source project from UC Berkeley’s AMPLab: ADAM is a processing engine and set of formats for genomics data.

Second-generation sequencing machines produce more detailed and thus much larger files for analysis (a 250+ GB file for each person). Existing data formats and tools are optimized for single-server processing and do not easily scale out. ADAM uses distributed computing tools and techniques to speed up key stages of the variant processing pipeline (including sorting and deduping).
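To make the scale-out idea concrete, here is a minimal single-machine sketch of partitioned sorting and deduping. It is not ADAM's code: the `Read` fields, the rule of keeping the highest-quality read per position, and the function names are all assumptions made for illustration, and real duplicate marking in genomics pipelines is considerably more involved.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical, heavily simplified aligned read (not ADAM's record type)."""
    name: str
    contig: str    # reference sequence, e.g. a chromosome
    position: int  # alignment start position
    strand: str    # "+" or "-"
    quality: int   # higher is better


def sort_key(read: Read):
    return (read.contig, read.position, read.strand)


def partition_reads(reads: List[Read], num_partitions: int) -> List[List[Read]]:
    """Route reads to partitions by contig so potential duplicates land together."""
    partitions: List[List[Read]] = [[] for _ in range(num_partitions)]
    for read in reads:
        # Stable hash: Python's built-in str hash is randomized per process.
        index = sum(read.contig.encode()) % num_partitions
        partitions[index].append(read)
    return partitions


def dedupe_and_sort(partition: List[Read]) -> List[Read]:
    """Keep the best read per (contig, position, strand), then sort the survivors."""
    best = {}
    for read in partition:
        key = sort_key(read)
        if key not in best or read.quality > best[key].quality:
            best[key] = read
    return sorted(best.values(), key=sort_key)


def process(reads: List[Read], num_partitions: int = 4) -> List[Read]:
    """Each partition is independent, so on a cluster each could run on its own node."""
    partitions = partition_reads(reads, num_partitions)
    results = [dedupe_and_sort(p) for p in partitions]
    merged = [read for part in results for read in part]
    return sorted(merged, key=sort_key)


if __name__ == "__main__":
    sample = [
        Read("a", "chr1", 100, "+", 30),
        Read("b", "chr1", 100, "+", 40),  # duplicate of "a" with higher quality
        Read("c", "chr2", 5, "-", 20),
        Read("d", "chr1", 50, "+", 10),
    ]
    for read in process(sample):
        print(read)
```

Because duplicates always share a contig, no partition needs to see another partition's reads, which is the property that lets this kind of work spread across many machines instead of one server.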

Very early on, the designers of ADAM realized that a well-designed data schema (that specifies the representation of data when it is accessed) was key to having a system that could leverage existing big data tools. The ADAM format uses the Apache Avro data serialization system and comes with a human-readable schema that can be accessed using many programming languages (including C/C++/C#, Java/Scala, PHP, Python, Ruby). ADAM also includes a data format/access API implemented on top of Apache Avro and Parquet, and a data transformation API implemented on top of Apache Spark. Because it’s built with widely adopted tools, ADAM users can leverage components of the Hadoop (Impala, Hive, MapReduce) and BDAS (Shark, Spark, GraphX, MLbase) stacks for interactive and advanced analytics.
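As a rough illustration of what a human-readable schema buys you, the sketch below declares a record in Avro's JSON schema style and checks records against it using only Python's standard library. The record name and fields are invented for this example and are not ADAM's actual schema; the `conforms` helper is likewise a hypothetical stand-in for what an Avro library would do, and it handles only a few primitive types.

```python
import json

# Hypothetical schema written in Avro's JSON style. NOT ADAM's real schema.
READ_SCHEMA_JSON = """
{
  "type": "record",
  "name": "ExampleRead",
  "fields": [
    {"name": "contig", "type": "string"},
    {"name": "position", "type": "long"},
    {"name": "sequence", "type": "string"},
    {"name": "mapping_quality", "type": ["null", "int"], "default": null}
  ]
}
"""

SCHEMA = json.loads(READ_SCHEMA_JSON)

# Simplified mapping from a few Avro primitive type names to Python types.
PRIMITIVES = {
    "null": type(None),
    "string": str,
    "int": int,
    "long": int,
    "double": float,
    "boolean": bool,
}


def field_names(schema):
    """Any language that can parse JSON can discover the record layout."""
    return [field["name"] for field in schema["fields"]]


def conforms(record, schema):
    """Return True if every schema field is present (or defaulted) with an allowed type."""
    for field in schema["fields"]:
        if field["name"] in record:
            value = record[field["name"]]
        elif "default" in field:
            value = field["default"]
        else:
            return False
        declared = field["type"]
        allowed = declared if isinstance(declared, list) else [declared]
        if not any(isinstance(value, PRIMITIVES[t]) for t in allowed):
            return False
    return True


if __name__ == "__main__":
    print(field_names(SCHEMA))
    print(conforms({"contig": "chr1", "position": 100, "sequence": "ACGT"}, SCHEMA))
    print(conforms({"contig": "chr1", "position": "100", "sequence": "ACGT"}, SCHEMA))
```

The point of the sketch is that the schema is plain data: tools written in different languages can read the same definition and agree on what a record looks like without sharing any code.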

Although active development only started in September 2013, early results indicate that distributed computing tools and techniques lead to substantial speedups. Two recent tests ran on different computing clusters: Amazon EC2 and a cluster at the Icahn School of Medicine at Mt. Sinai(1). The combination of sorting and deduping took 38 hours using existing tools, but runs in less than an hour on a 32-node ADAM cluster.

Computational results like the ones above are drawing the attention of the science community: the AMPLab recently joined an Oregon Health & Science University (OHSU) research initiative to BeatAML (acute myeloid leukemia).

How to help

ADAM is a new project with a small codebase (11,000 lines of code). If you’re a big data hacker looking for a high-impact project to work on, consider contributing to the development of ADAM. Components are developed under an Apache License, so your contributions benefit the open source community. For details on how to contribute, contact Matt Massie, lead developer of ADAM.

(0) This post is based on an extended conversation with Matt Massie. For more on ADAM, see this recent technical report.
(1) The cancer research program at the Icahn School of Medicine at Mt. Sinai was the subject of a moving feature in Esquire magazine.

This post originally appeared on O’Reilly Strata (“Big Data systems are making a difference in the fight against cancer”). It’s republished with permission.

Source: Forbes



