Benchmarking of multivariate similarity measures for high-content screening fingerprints in phenotypic drug discovery

J Biomol Screen. 2013 Dec;18(10):1284-97. doi: 10.1177/1087057113501390. Epub 2013 Sep 17.

Abstract

High-content screening (HCS) is a powerful tool for drug discovery because it can measure cellular responses to chemical disturbance in a high-throughput manner. HCS provides a highly multiplexed, quantitative, image-based readout of cellular phenotypes, including features such as shape, intensity, or texture. The corresponding feature vectors can be used to characterize phenotypes and are thus defined as HCS fingerprints. Systematic analyses of HCS fingerprints allow for objective computational comparisons of cellular responses. Such comparisons facilitate the detection of compounds with distinct phenotypic outcomes in high-throughput HCS campaigns. Feature selection methods and similarity measures, as a basis for phenotype identification and clustering, are critical for the quality of such computational analyses. We systematically evaluated 16 different similarity measures in combination with linear and nonlinear feature selection methods for their potential to capture biologically relevant image features. Nonlinear correlation-based similarity measures such as Kendall's τ and Spearman's ρ perform well in most evaluation scenarios, outperforming other frequently used metrics (such as the Euclidean distance). We also present four novel modifications of the connectivity map similarity that surpass the original version in our experiments. This study provides a basis for generic phenotypic analysis in future HCS campaigns.
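To make the contrast between rank-based and distance-based measures concrete, the following is a minimal sketch in plain Python of Spearman's ρ, Kendall's τ, and the Euclidean distance applied to feature vectors. It is not the implementation used in the study: the function names, the choice of the τ-a variant of Kendall's coefficient, and the three toy fingerprints are assumptions made purely for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical toy fingerprints (made-up values, not data from the study).
fingerprint_a = [0.1, 0.4, 0.9, 1.6, 2.5]
fingerprint_b = [1.0, 4.0, 9.0, 16.0, 25.0]  # same feature ordering, larger scale
fingerprint_c = [2.4, 1.7, 0.8, 0.5, 0.2]    # similar scale, reversed ordering

for name, other in (("b", fingerprint_b), ("c", fingerprint_c)):
    print(
        name,
        spearman_rho(fingerprint_a, other),
        kendall_tau(fingerprint_a, other),
        euclidean_distance(fingerprint_a, other),
    )
```

In this toy case the two rank-based measures treat `fingerprint_b` as the closest match to `fingerprint_a` because the feature ordering is identical, whereas the Euclidean distance favors `fingerprint_c` because its magnitudes are closer. The example only shows that the measures can disagree; it says nothing about which one is preferable on real screening data.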

Keywords: HCS fingerprints; data analysis; high-content screening; phenotypic screening; similarity metrics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Cell Line, Tumor
  • Computer Simulation
  • Data Interpretation, Statistical
  • Drug Evaluation, Preclinical / standards*
  • High-Throughput Screening Assays / standards*
  • Humans
  • Multivariate Analysis
  • Phenotype
  • Principal Component Analysis
  • ROC Curve
  • Reference Standards