Publication Date

Fall 2015

Document Type

Technical Report

Abstract

Proteogenomics is an emerging field of systems biology research at the intersection of proteomics and genomics. Two high-throughput technologies, Mass Spectrometry (MS) for proteomics and Next Generation Sequencing (NGS) machines for genomics, are required to conduct proteogenomics studies. Independently, both MS and NGS technologies are afflicted by a data deluge that creates problems of storage, transfer, analysis, and visualization. Integrating these big data sets (NGS+MS) for proteogenomics studies compounds all of the associated computational problems. Existing sequential algorithms for analyzing these proteogenomics data sets are inadequate for big data, and high performance computing (HPC) solutions are almost non-existent. The purpose of this paper is to introduce the big data problem of proteogenomics and the associated challenges in analyzing, storing, and transferring these data sets. Further, opportunities for the high performance computing research community are identified and possible future directions are discussed.

Published Citation

Proceedings of Signal and Information Processing for Software-Defined Ecosystems, and Green Computing, IEEE GlobalSIP 2015

Included in

Bioinformatics Commons, Computational Engineering Commons, Data Storage Systems Commons, Digital Communications and Networking Commons

Parallel Computing and Data Science Lab Technical Reports

Big Data Proteogenomics and High Performance Computing: Challenges and Opportunities
