Molecular-Property Prediction with Sparsity

Published in ChemRxiv, 2022

Recommended citation: Adilov, Sanjar (2022): Molecular-Property Prediction with Sparsity. ChemRxiv. Preprint. doi:10.26434/chemrxiv-2022-g7mfn http://doi.org/10.26434/chemrxiv-2022-g7mfn

Penalized linear models enforcing sparsity on grouped molecular representations. [ChemRxiv] [GitHub]

Abstract

Machine learning models for molecular-property prediction typically work with molecular representations in the form of fingerprints, descriptors, or graphs. In the case of fingerprints and descriptors, molecular representations usually comprise thousands of features, which exposes many tabular models to the curse of dimensionality. In this work, we introduce penalized linear models that enforce sparsity on grouped molecular representations. Loosely speaking, sparsity penalties aim to select a relatively small number of features, improving the interpretability and computational convenience of machine learning models.
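To make the idea of a sparsity penalty on grouped features concrete, here is a minimal sketch using group lasso, one common penalty of this kind. The abstract does not name the specific penalties used in the paper, so group lasso is an assumption made for illustration, as are the function names `group_soft_threshold` and `fit_group_lasso`, the synthetic data, and the three equal-sized feature groups (standing in for, say, blocks of fingerprint or descriptor features).

```python
import numpy as np


def group_soft_threshold(w, threshold):
    """Shrink a whole group of coefficients; zero it out if its norm is small."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w


def fit_group_lasso(X, y, groups, lam=0.1, n_iter=500):
    """Proximal gradient descent for the (assumed) group-lasso objective

        (1 / 2n) * ||y - X w||^2 + lam * sum_g sqrt(|g|) * ||w_g||_2

    `groups` is a list of index arrays, one per feature group.
    """
    n, p = X.shape
    w = np.zeros(p)
    # Step size = 1 / Lipschitz constant of the least-squares gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        w_new = np.empty_like(w)
        for idx in groups:
            # The penalty acts on each group as a unit, so a group is
            # either kept (and shrunk) or removed entirely.
            w_new[idx] = group_soft_threshold(
                z[idx], step * lam * np.sqrt(len(idx))
            )
        w = w_new
    return w


# Synthetic illustration (hypothetical data): 3 groups of 4 features,
# only the first group actually influences the target.
rng = np.random.default_rng(0)
n_samples, group_size, n_groups = 200, 4, 3
X = rng.standard_normal((n_samples, group_size * n_groups))
true_w = np.zeros(group_size * n_groups)
true_w[:group_size] = [1.5, -2.0, 1.0, 0.5]
y = X @ true_w + 0.1 * rng.standard_normal(n_samples)

groups = [
    np.arange(g * group_size, (g + 1) * group_size) for g in range(n_groups)
]
w = fit_group_lasso(X, y, groups, lam=0.2)

# Groups whose coefficients were not driven exactly to zero.
active = [g for g, idx in enumerate(groups) if np.linalg.norm(w[idx]) > 0]
print("active groups:", active)
```

In this toy setting the penalty should keep the informative group and set the other two groups exactly to zero, which is the sense in which such models select a small number of features. Real molecular representations would need their own grouping and a tuned value of `lam`.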