Trending

Filed under: News

 

Big Data Systems Are Making A Difference In The Fight Against Cancer

Jan 17 2014, 5:21pm CST | by

Big Data Systems Are Making A Difference In The Fight Against Cancer
Photo Credit: Forbes
 
 
 

By Ben Lorica

As open source, big data tools enter the early stages of maturation, data engineers and data scientists will have many opportunities to use them to “work on stuff that matters.” Along those lines, computational biology and medicine are areas where skilled data professionals are already beginning to make an impact. I recently came across a compelling open source project from UC Berkeley’s AMPLab: ADAM is a processing engine and set of formats for genomics data.

Second-generation sequencing machines produce more detailed and thus much larger files for analysis (250+ GB file for each person). Existing data formats and tools are optimized for single-server processing and do not easily scale out. ADAM uses distributed computing tools and techniques to speedup key stages of the variant processing pipeline (including sorting and deduping):

Very early on the designers of ADAM realized that a well-designed data schema (that specifies the representation of data when it is accessed) was key to having a system that could leverage existing big data tools. The ADAM format uses the Apache Avro data serialization system and comes with a human-readable schema that can be accessed using many programming languages (including C/C++/C#, Java/Scala, php, Python, Ruby). ADAM also includes a data format/access API implemented on top of Apache Avro and Parquet, and a data transformation API implemented on top of Apache Spark. Because it’s built with widely adopted tools, ADAM users can leverage components of the Hadoop (Impala, Hive, MapReduce) and BDAS (Shark, Spark, GraphXMLbase) stacks for interactive and advanced analytics.

Although active development only started in Sept/2013, early results indicate that distributed computing tools and techniques lead to substantial speedups. Below are two recent tests from different computing clusters: Amazon EC2 and a cluster at the Icahn School of Medicine at Mt. Sinai(1). The combination of sorting and deduping took 38 hours using existing tools, but runs on less than an hour on a 32-node ADAM cluster.

Computational results like the ones above are drawing the attention of the science community: the AMPLab recently joined an Oregon Health & Science University (OHSU) research initiative to BeatAML (acute myeloid leukemia).

How to help

ADAM is a new project with a small codebase (11,000 lines of code). If you’re a big data hacker looking for a high-impact project to work on, consider contributing to the development of ADAM. Components are developed under an Apache License, so your contributions benefit the open source community. For details on how to contribute contact Matt Massie, lead developer of ADAM.


(0) This post is based on an extended conversation with Matt Massie. For more on ADAM, see this recent technical report.
(1) The cancer research program at the Icahn School of Medicine at Mt. Sinai, was the subject of a moving feature on Esquire magazine.

This post originally appeared on O’Reilly Strata (“Big Data systems are making a difference in the fight against cancer“). It’s republished with permission.

Source: Forbes

You Might Also Like

Updates

Shopping Deals

 
 
 

<a href="/latest_stories/all/all/31" rel="author">Forbes</a>
Forbes is among the most trusted resources for the world's business and investment leaders, providing them the uncompromising commentary, concise analysis, relevant tools and real-time reporting they need to succeed at work, profit from investing and have fun with the rewards of winning.

 

 

Comments

blog comments powered by Disqus

Latest stories

Nike is Not Shutting Down FuelBand, But Facing Downslide in Wearable Devices
Nike is Not Shutting Down FuelBand, But Facing Downslide in Wearable Devices
Nike has not stopped its FuelBand production. However, it seems to be facing a downslide in the department of wearable devices.
 
 
Peaches Geldof to be Laid to Rest on Easter Monday
Peaches Geldof to be Laid to Rest on Easter Monday
Peaches Geldof is to be laid to rest soon after her sudden death due to unknown causes. Some say that after being incinerated, her remains will be spread in the soil surrounding her family residence. On the other hand Peaches Geldof may be buried.
 
 
Padma Lakshmi Shows Bad Taste in Fashion
Padma Lakshmi Shows Bad Taste in Fashion
The otherwise slim and smart Padma Lakshmi showed bad taste in fashion with her latest ensemble. The midriff baring look simply did not make her appear as presentable as she ought to have been.
 
 
Idris Elba Welcomes Son Winston Elba
Idris Elba Becomes Dad Again
British actor Idris Elba has become a dad again. Elba’s girlfriend Naiyana Garth gave birth to a baby boy, named Winston Elba, on Thursday.
 
 
 

The Hottest Photos of Victoria's Secret Fashion Show 2013

 

Viral Stories the Web