A Scalable, Web-Based Platform for Proteomics Data Processing, Result Storage and Analysis

Markus Schneider; Daniel P Zolg; Patroklos Samaras; Samia Ben Fredj; Dulguun Bold; Agnes Guevende; Alexander Hogrebe; Michelle T Berger; Michael Graber; Vishal Sukumar; Lizi Mamisashvili; Igor Bronsthein; Layla Eljagh; Siegfried Gessulat; Florian Seefried; Tobias Schmidt; Martin Frejno

doi:10.1021/acs.jproteome.4c00871

J Proteome Res. 2025 Mar 7;24(3):1241-1249. doi: 10.1021/acs.jproteome.4c00871. Epub 2025 Feb 21.

Authors

Markus Schneider¹, Daniel P Zolg¹, Patroklos Samaras¹, Samia Ben Fredj¹, Dulguun Bold¹, Agnes Guevende¹, Alexander Hogrebe², Michelle T Berger¹, Michael Graber¹, Vishal Sukumar¹, Lizi Mamisashvili¹, Igor Bronsthein², Layla Eljagh¹, Siegfried Gessulat², Florian Seefried¹, Tobias Schmidt¹, Martin Frejno¹

Affiliations

¹ MSAID GmbH, Garching b. München 85748, Germany.
² MSAID GmbH, Berlin 13347, Germany.

Abstract

The exponential increase in proteomics data presents critical challenges for conventional processing workflows. These pipelines often consist of fragmented software packages, glued together using complex in-house scripts or error-prone manual workflows running on local hardware, which are costly to maintain and scale. The MSAID Platform offers a fully automated, managed proteomics data pipeline, consolidating formerly disjointed functions into unified, API-driven services that cover the entire process from raw data to biological insights. Backed by the cloud-native search algorithm CHIMERYS, as well as scalable cloud compute instances and data lakes, the platform facilitates efficient processing of large data sets, automation of processing via the command line, systematic result storage, analysis, and visualization. The data lake supports elastically growing storage and unified query capabilities, facilitating large-scale analyses and efficient reuse of previously processed data, such as aggregating longitudinally acquired studies. Users can interact with the platform via a web interface, a CLI client, or an API, which allows flexible, automated access. Readily available tools for accessing result data include browser-based interrogation and one-click visualizations for statistical analysis. The platform streamlines research processes, making advanced and automated proteomic workflows accessible to a broader range of scientists. The MSAID Platform is globally available via https://platform.msaid.io.

Keywords: AWS; CHIMERYS; SaaS; cloud; compute infrastructure; data processing; pipeline; platform; proteomics; scalable.

MeSH terms

Algorithms
Cloud Computing
Databases, Protein
Internet*
Proteomics* / methods
Software*
Workflow