A Scalable, Web-Based Platform for Proteomics Data Processing, Result Storage and Analysis

J Proteome Res. 2025 Mar 7;24(3):1241-1249. doi: 10.1021/acs.jproteome.4c00871. Epub 2025 Feb 21.

Abstract

The exponential increase in proteomics data presents critical challenges for conventional processing workflows. These pipelines often consist of fragmented software packages glued together with complex in-house scripts or error-prone manual steps and run on local hardware, which makes them costly to maintain and scale. The MSAID Platform offers a fully automated, managed proteomics data pipeline that consolidates formerly disjointed functions into unified, API-driven services covering the entire process from raw data to biological insights. Backed by the cloud-native search algorithm CHIMERYS, scalable cloud compute instances, and data lakes, the platform supports efficient processing of large data sets, automation of processing via the command line, and systematic storage, analysis, and visualization of results. The data lake provides elastically growing storage and unified query capabilities, enabling large-scale analyses and efficient reuse of previously processed data, for example by aggregating longitudinally acquired studies. Users interact with the platform via a web interface, a CLI client, or an API, which together provide flexible, automated access. Readily available tools for accessing result data include browser-based interrogation and one-click visualizations for statistical analysis. The platform streamlines research processes, making advanced and automated proteomic workflows accessible to a broader range of scientists. The MSAID Platform is globally available via https://platform.msaid.io.
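
The idea of a unified query over previously processed results, such as aggregating longitudinally acquired studies, can be illustrated with a minimal sketch. It does not use the MSAID Platform API, CLI, or data lake schema, none of which are described in this abstract. An in-memory SQLite table stands in for a results store, and the table name, column names, study labels, and intensity values are all hypothetical.

```python
import sqlite3

# Hypothetical result rows: (study, acquisition date, protein accession, intensity).
# These values are illustrative only and are not taken from the paper.
ROWS = [
    ("study_2023", "2023-05-01", "P04637", 10.0),
    ("study_2023", "2023-05-01", "P38398", 4.0),
    ("study_2024", "2024-06-15", "P04637", 14.0),
    ("study_2024", "2024-06-15", "P38398", 6.0),
    ("study_2025", "2025-01-10", "P04637", 12.0),
]


def aggregate_by_protein(rows):
    """Load rows into one table and summarize each protein across all studies."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results "
        "(study TEXT, acquired TEXT, protein TEXT, intensity REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
    # A single query spans every study because all results share one schema.
    query = """
        SELECT protein,
               COUNT(DISTINCT study) AS n_studies,
               AVG(intensity)        AS mean_intensity
        FROM results
        GROUP BY protein
        ORDER BY protein
    """
    summary = conn.execute(query).fetchall()
    conn.close()
    return summary


summary = aggregate_by_protein(ROWS)
for protein, n_studies, mean_intensity in summary:
    print(protein, n_studies, mean_intensity)
```

Because every study is stored under the same schema, adding a later acquisition only appends rows, and the same query covers the enlarged data set without reprocessing earlier studies.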

Keywords: AWS; CHIMERYS; SaaS; cloud; compute infrastructure; data processing; pipeline; platform; proteomics; scalable.

MeSH terms

  • Algorithms
  • Cloud Computing
  • Databases, Protein
  • Internet*
  • Proteomics* / methods
  • Software*
  • Workflow