Protein structure characterization is critical for therapeutic protein drug development and production. Drop-coating deposition Raman (DCDR) spectroscopy offers rapid and cost-effective acquisition of vibrational spectra characteristic of protein secondary structure. The amide I (1600-1700 cm⁻¹) and amide II (1500-1600 cm⁻¹) regions of DCDR spectra measured for model proteins of varying molecular size and structural distribution were first analyzed by peak fitting to determine the proportions of six secondary structure motifs: α-helices, 3₁₀-helices, β-sheets, turns (β-turns and γ-turns), bends, and random coil. The high spectral resolution and superior signal-to-noise ratio of the DCDR spectra made it possible to estimate all six structural motifs with an accuracy comparable to that of X-ray crystallography. The ease of DCDR measurement was further exploited by introducing a machine learning algorithm into the spectroscopic data analysis. Partial least squares (PLS) regression was used as the machine learning tool to predict protein secondary structure composition from the amide I band of the model proteins. Once developed on a training sample set, the PLS model was tested on a sample set that had not been used for model development. Low prediction errors of 1.36 %, 0.78 %, 0.42 %, 0.41 %, 0.81 %, and 0.52 % were achieved for the six structural components α-helix, β-sheet, 3₁₀-helix, random coil, turns, and bends, respectively. The PLS model was further tested on an independent sample set containing three IgG proteins. The proportions of α-helix, β-sheet, and 3₁₀-helix were estimated with errors of 3.1 %, 2.3 %, and 2.8 %, respectively.
Keywords: Drop-coating deposition; Machine learning; Partial least squares; Peak fitting; Protein secondary structure; Raman spectroscopy.
Copyright © 2025 The Authors. Published by Elsevier B.V. All rights reserved.
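The train-then-test PLS workflow summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not state which PLS algorithm or software was used, so a plain single-response NIPALS variant (PLS1) is swapped in; the spectra are synthetic two-band mixtures with hypothetical band positions (1655 and 1672 cm⁻¹) and widths; only one response (α-helix percentage) is modeled instead of six; and the error metric is taken to be the root-mean-square error of prediction (RMSEP), which the abstract does not specify.

```python
import math
import random

def gaussian(x, center, width):
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def make_spectrum(helix_fraction, axis, rng, noise=0.01):
    # Synthetic amide I band: hypothetical helix band at 1655 cm^-1 and
    # hypothetical sheet band at 1672 cm^-1 (illustrative values only).
    return [helix_fraction * gaussian(x, 1655.0, 8.0)
            + (1.0 - helix_fraction) * gaussian(x, 1672.0, 8.0)
            + rng.gauss(0.0, noise)
            for x in axis]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def pls1_fit(X, y, n_components):
    """Fit single-response PLS (NIPALS PLS1) on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]          # scores
        tt = dot(t, t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(yc, t) / tt                      # y loading
        # Deflate X and y before extracting the next component.
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, spectrum):
    """Predict by repeating the same score/deflation steps on one spectrum."""
    x_mean, y_mean, components = model
    x = [v - mu for v, mu in zip(spectrum, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(x, w)
        y_hat += q * t
        x = [xv - t * pv for xv, pv in zip(x, p)]
    return y_hat

def rmsep(model, X, y):
    errors = [pls1_predict(model, s) - v for s, v in zip(X, y)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

rng = random.Random(0)                       # fixed seed for repeatability
axis = [1600.0 + k for k in range(101)]      # 1600-1700 cm^-1, 1 cm^-1 steps
train_frac = [k / 19 for k in range(20)]     # training set
test_frac = [0.13, 0.37, 0.58, 0.81]         # held out, never used for fitting
X_train = [make_spectrum(f, axis, rng) for f in train_frac]
X_test = [make_spectrum(f, axis, rng) for f in test_frac]
y_train = [100.0 * f for f in train_frac]    # helix content in percent
y_test = [100.0 * f for f in test_frac]

model = pls1_fit(X_train, y_train, n_components=2)
print(f"RMSEP on held-out synthetic spectra: {rmsep(model, X_test, y_test):.2f} %")
```

The held-out spectra are generated separately and are never passed to `pls1_fit`, mirroring the abstract's distinction between the training set and the test set. In practice a library implementation supporting several responses at once would likely be preferable to this hand-written loop.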