https://doi.org/10.15255/KUI.2017.052
Published: Kem. Ind. 67 (9-10) (2018) 409–419
Paper reference number: KUI-52/2017
Paper type: Professional paper
Molecular Modelling of the Quantitative Structure Activity Relationship in Python (Part I)
M. Lovrić
The amount of available data is growing considerably, as are its value and the knowledge of how to manipulate it and extract useful information from it. A well-known example of exploiting such information is the search for known chemical compounds, and the design of new ones, based on modelling, with the aim of finding potential new drugs. A chemistry student must therefore be well prepared for the current digital era, in which laboratory skill alone is no longer enough and proficiency in modelling and data analysis is indispensable. This handbook covers the basics of molecular modelling and QSAR, and the basics of data handling in Python, a free programming language, together with its molecular modelling library RDKit. Other Python libraries used throughout the manual are: Pandas, for handling and processing all kinds of data; statsmodels, NumPy, SciPy, and scikit-learn (SKLearn), for mathematical and statistical operations and linear algebra; and Matplotlib and Seaborn, for visualisation. Python and the libraries mentioned are available through the Anaconda distribution. Anaconda lets the user install and manage libraries easily, and provides the Jupyter Notebook interface for programming, plotting, and data analysis. In this first part of the manual series, the prediction of water solubility for a set of organic compounds is analysed using univariate linear regression. The aim of the series is to familiarise chemists with the Python programming language, its libraries, and practical approaches to solving molecular modelling problems.
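As a rough illustration of the univariate linear regression mentioned above, the following minimal sketch fits a straight line y = a·x + b by ordinary least squares using only the Python standard library. It is not the code from the manual: the function name `fit_univariate` is hypothetical, and `x` is assumed to hold one molecular descriptor value per compound (for example, a value computed with RDKit) while `y` holds the corresponding measured solubility.

```python
from typing import Sequence, Tuple


def fit_univariate(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration only. Assumes x is a single
    descriptor per compound and y is the measured property.
    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squared deviations of x and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return slope, intercept, r_squared
```

In practice, the libraries named in the abstract (statsmodels or scikit-learn) would likely be used for the fit instead; the closed-form version is shown only to make the underlying calculation explicit.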
This work is licensed under a Creative Commons Attribution 4.0 International License
QSAR, Python, Jupyter Notebook, molecular modeling, RDKit