Neural Language Modeling for Molecule Generation
Published in ChemRxiv, 2021
Recommended citation: Adilov, Sanjar (2021): Neural Language Modeling for Molecule Generation. ChemRxiv. Preprint. doi:10.26434/chemrxiv.14700831.v1 https://doi.org/10.26434/chemrxiv.14700831.v1
A comprehensive outline of SMILES-based autoregressive language models. [ChemRxiv] [GitHub]
Abstract
Generative neural networks have shown promising results in de novo drug design. Recent studies suggest that an efficient way to produce novel molecules matching target properties is to model SMILES sequences with deep learning, much as text is modeled in natural language processing. In this paper, we survey machine learning methods for SMILES-based language modeling and report benchmarking results on a standardized subset of the ChEMBL database.
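To make the analogy with language modeling concrete, the following is a minimal sketch of how a SMILES string could be split into tokens and turned into next-token prediction pairs, which is the training signal an autoregressive model uses. The regular expression, the `<bos>`/`<eos>` markers, and the function names are illustrative assumptions, not the tokenization scheme used in the paper.

```python
import re

# Assumed, simplified token pattern: bracket atoms, two-letter halogens,
# two-digit ring closures (%NN), then any remaining single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

# Hypothetical sequence boundary markers.
BOS, EOS = "<bos>", "<eos>"


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def next_token_pairs(smiles):
    """Return (context, target) pairs for autoregressive training.

    Each target token is predicted from all tokens that precede it.
    """
    tokens = [BOS] + tokenize(smiles) + [EOS]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    # Acetyl chloride: "Cl" is kept as one token rather than "C" + "l".
    for context, target in next_token_pairs("CC(=O)Cl"):
        print(" ".join(context), "->", target)
```

A model trained on such pairs generates a molecule by starting from the begin marker and sampling one token at a time until it emits the end marker.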