Tutorial on Firth's Logistic Regression Models for Biomarkers in Preclinical Space

Pharm Stat. 2025 Jan-Feb;24(1):e2422. doi: 10.1002/pst.2422. Epub 2024 Aug 6.

Abstract

Preclinical studies are broad and can encompass cellular research, animal trials, and small human trials. They tend to be exploratory and to have small datasets that often consist of biomarker data. Logistic regression is typically the model of choice for a binary outcome with explanatory variables such as genetic, imaging, and clinical data. Small preclinical studies can produce challenging data, including complete or quasi-complete separation, which causes standard logistic regression to yield inflated coefficient estimates and standard errors. Penalized regression approaches such as Firth's logistic regression offer a way to reduce the bias in these estimates. This tutorial presents several examples with complete or quasi-complete separation and compares the results from standard logistic regression and Firth's logistic regression, demonstrating the inflated estimates from the standard model and the bias reduction achieved by the penalized Firth approach. R code and datasets are provided in the supplement.

Keywords: Firth's logistic regression; binary outcome; biomarkers; complete separation; logistic regression; quasi-complete separation.
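
As a rough illustration of the separation problem described in the abstract, the sketch below fits a one-biomarker logistic model to a tiny, completely separated dataset, once by ordinary maximum likelihood and once with Firth's correction. Firth's method maximizes the penalized log-likelihood l*(beta) = l(beta) + 0.5 log det I(beta), which is equivalent to replacing each score residual y_i - p_i with y_i - p_i + h_i (0.5 - p_i), where h_i is the i-th diagonal of the hat matrix. This is not the tutorial's supplementary R code: it is written in Python, the helper names sigmoid and fit_logistic are hypothetical, the data are invented toy values, and the solver is a bare Fisher-scoring loop without the step-halving or profile-likelihood intervals that a production implementation would typically include.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, firth=False, max_iter=100, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x by Fisher scoring (hypothetical helper).

    With firth=True the score is modified by Firth's bias-reducing term.
    Returns (b0, b1, converged).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = X'WX = [[a, b], [b, c]] for columns (1, x).
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        if det < 1e-12:
            # Information matrix is numerically singular: the estimates
            # are drifting toward infinity, as happens under separation.
            break
        u0 = u1 = 0.0
        for xi, yi, pi, wi in zip(x, y, p, w):
            r = yi - pi
            if firth:
                # Hat-matrix diagonal h_i = w_i * (1, x_i) I^{-1} (1, x_i)'.
                h = wi * (c - 2.0 * b * xi + a * xi * xi) / det
                r += h * (0.5 - pi)
            u0 += r
            u1 += r * xi
        # Scoring step: delta = I^{-1} U, using the closed-form 2x2 inverse.
        d0 = (c * u0 - b * u1) / det
        d1 = (a * u1 - b * u0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, True
    return b0, b1, False


# Invented toy data (not from the tutorial): every negative biomarker value
# has outcome 0 and every positive value has outcome 1 (complete separation).
x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
y = [0, 0, 0, 1, 1, 1]

ml_fit = fit_logistic(x, y, firth=False)
firth_fit = fit_logistic(x, y, firth=True)
print("standard ML (b0, b1, converged):", ml_fit)
print("Firth       (b0, b1, converged):", firth_fit)
```

On data like these, the unpenalized fit never settles: the slope keeps growing until the information matrix becomes numerically singular, whereas the Firth fit stops at a finite slope. For real analyses an established implementation is the safer choice, for example the logistf package in R; whether the tutorial's supplement uses that package is not stated in the abstract.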

MeSH terms

  • Animals
  • Biomarkers* / analysis
  • Drug Evaluation, Preclinical / methods
  • Drug Evaluation, Preclinical / statistics & numerical data
  • Humans
  • Logistic Models

Substances

  • Biomarkers