What is proteogenomics?

Proteogenomics workflow

Proteogenomics integrates genomic information with mass spectrometry (MS)-based proteomics data.  The most common workflow starts from transcriptome sequence information obtained with next-generation sequencing methods such as RNA-seq.  The assembled transcriptome is translated in silico to generate a database of proteins that may be expressed in the sample.  This database includes proteins with potentially novel sequences, derived from DNA or RNA sequence variants.  Matching proteomics data (in the form of tandem mass spectra, also known as MS/MS spectra) to sequences in the database confirms that those novel protein sequences are expressed in the sample.  The proteogenomics approach provides new insights into novel protein sequences that may carry new functions and be drivers of disease.  It also provides a powerful means to annotate genomes.
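
The in silico translation step described above can be illustrated with a short Python sketch.  It is only a toy under stated assumptions, not the method used by any particular Galaxy-P tool: it uses the standard genetic code, translates all six reading frames, and splits each frame at stop codons.  The function names and the min_length cutoff are hypothetical choices made for this illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate one reading frame; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate a transcript in three forward and three reverse frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def candidate_proteins(seq, min_length=7):
    """Split every frame at stop codons ('*') and keep the longer stretches.

    min_length is an illustrative cutoff (an assumption), not a recommended value.
    """
    candidates = set()
    for protein in six_frame_translations(seq).values():
        for segment in protein.split("*"):
            if len(segment) >= min_length:
                candidates.add(segment)
    return sorted(candidates)
```

Calling candidate_proteins on each assembled transcript and writing the results out as FASTA entries would give a simple search database.  For strand-specific RNA-seq data, translating only the three forward frames may be enough.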

Galaxy-P provides an ideal platform for proteogenomics, which requires integrating software for the analysis of genomic or transcriptomic data (e.g. RNA-seq data) with software for MS-based proteomics data.  Galaxy-P has created an educational instance with training materials for proteogenomics research.  Visit z.umn.edu/proteogenomicsgateway for tools and workflows related to proteogenomics research.

The Galaxy-P team has published several seminal papers on the use of Galaxy for proteogenomics.