Data-driven materials design often faces challenges from small and imbalanced datasets. For hybrid halide perovskites, these limitations, coupled with complex structural and physicochemical properties, hinder feature engineering and the extraction of key fingerprints. Herein, we employed a physics-informed, data-driven modeling approach to identify lattice geometric fingerprints, such as the distortion index (DI) and the effective coordination number (ECoN), and to establish a robust structure-property relationship for the electronic bandgap, which improved model performance. Lattice compression simulations across multiple phases of MAPbI3 further confirmed strong correlations of DI and ECoN with the electronic bandgap, validating the robustness of the selected octahedral geometric fingerprints. The pressure-driven reduction in local octahedral distortion, induced by anisotropic hydrogen bonding between the inorganic framework and the organic cation, modulates the s-p antibonding coupling. This narrows the electronic bandgap and facilitates p-p transitions, thereby enhancing the transition dipole moment and band-edge absorption. By combining data mining with physical analysis, we clarified the significant impact of lattice geometry on the electronic properties, identified key octahedral geometric fingerprints that effectively describe the electronic bandgap, and revealed the microphysical mechanisms by which local octahedral distortion affects the optoelectronic properties of hybrid halide perovskites.
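DI and ECoN are named above but not defined. The sketch below computes both from the six Pb-I bond lengths of one octahedron. It assumes Baur's distortion index and Hoppe's effective coordination number in the weighted-average form used by structure tools such as VESTA; the paper's exact definitions may differ. The function names and bond lengths are hypothetical and chosen only for illustration. They are not data from this work.

```python
import math
from typing import Sequence


def distortion_index(bond_lengths: Sequence[float]) -> float:
    """Baur-style distortion index (assumed definition):
    DI = (1/n) * sum(|l_i - l_av| / l_av), where l_av is the mean bond length."""
    n = len(bond_lengths)
    l_av = sum(bond_lengths) / n
    return sum(abs(l - l_av) for l in bond_lengths) / (n * l_av)


def effective_coordination_number(bond_lengths: Sequence[float]) -> float:
    """Hoppe-style ECoN (assumed, VESTA-like form):
    ECoN = sum(exp(1 - (l_i / l_av)^6)), where l_av is a weighted mean
    that favors short bonds, with weights exp(1 - (l_i / l_min)^6)."""
    l_min = min(bond_lengths)
    weights = [math.exp(1.0 - (l / l_min) ** 6) for l in bond_lengths]
    l_av = sum(l * w for l, w in zip(bond_lengths, weights)) / sum(weights)
    return sum(math.exp(1.0 - (l / l_av) ** 6) for l in bond_lengths)


# Hypothetical Pb-I bond lengths in angstroms (illustrative only).
regular = [3.17] * 6
distorted = [3.10, 3.10, 3.17, 3.17, 3.25, 3.25]

for label, bonds in (("regular", regular), ("distorted", distorted)):
    print(label, distortion_index(bonds), effective_coordination_number(bonds))
```

A regular octahedron gives DI = 0 and ECoN = 6. In the distorted example, the spread in bond lengths raises DI above zero and lowers ECoN below 6. This is the sense in which the two fingerprints quantify local octahedral distortion.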
© 2025 Author(s). Published under an exclusive license by AIP Publishing.