Commit 1d29bfbc authored by Khoa A Nguyen

Update README.md

parent f6cba001
# DS senior capstone - Khoa Nguyen
Predicting the “solubility - logP” of drug compounds from the ZINC database using Graph Neural Networks
Deep Learning in Drug Development
# Abstract
Drug discovery and development is a costly and time-consuming process, costing up to billions of dollars and taking 12-15 years from basic research to FDA approval. Early-stage discovery involves an intensive search through an enormous database of molecules and analysis of their quantitative structure-activity relationships to determine their physicochemical properties. Important properties such as absorption, distribution, metabolism, and excretion (ADME) are extracted to measure how these compounds interact with the human body. At its root, this is an optimization problem in which researchers try to identify the “best” compounds, those with the desired properties, to qualify for clinical development and produce a safe and cost-effective drug. Nowadays, with greater computing power, the process can be sped up significantly with artificial intelligence. Many deep learning models have demonstrated highly accurate predictions of the ADME properties of drug-like small molecules. In particular, graph neural networks (GNNs) have been shown to learn graph-based molecular representations effectively. This paper examines the feasibility of several state-of-the-art graph neural networks for predicting the solubility of commercially available compounds in the ZINC database. The experiment indicated that each model's performance improved significantly through training. The results suggest promising applications of deep learning in reducing the time and cost of the drug development process in the foreseeable future.
# Reproducing the project
Please open the `Reproducible_GoogleCollab_notebook.ipynb` file and follow the instructions in the notebook.
Drug discovery and development is a costly and time-consuming process, costing up to billions of dollars and taking 12-15 years from basic research to FDA approval. Early-stage discovery involves an intensive search through an enormous database of molecules and analysis of their quantitative structure-activity relationships (QSAR). Important properties such as absorption, distribution, metabolism, and excretion (ADME) are extracted to measure how these compounds interact with the human body. At its root, this is an optimization problem in which researchers try to identify the “best” compounds, those with the desired properties, to qualify for clinical development and produce a safe and cost-effective drug. Nowadays, with greater computing power, the process can be sped up significantly with artificial intelligence. Many deep learning models have demonstrated highly accurate predictions of the ADME properties of drug-like small molecules. In particular, graph neural networks (GNNs) have been shown to learn graph-based representations of molecules effectively. My paper discusses the application of deep learning in chemoinformatics and examines the feasibility of E(n) Equivariant Graph Neural Networks (EGNN), one of the state-of-the-art graph neural networks, for predicting the solubility of commercially available compounds in the ZINC database for early-stage virtual screening in drug development.
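As a rough illustration of what a graph-based representation of a molecule means, the sketch below encodes a toy molecule as atoms (nodes) and bonds (edges) and runs one unweighted message-passing step followed by a sum readout. It is not the code from the notebook: the molecule, the element vocabulary, and every function name are hypothetical, and a real model such as EGNN would use learned weights and atom coordinates.

```python
from collections import defaultdict

# Hypothetical toy molecule: heavy atoms of ethanol (SMILES "CCO"), hydrogens omitted.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]  # undirected bonds as pairs of atom indices

ELEMENTS = ["C", "N", "O"]  # illustrative element vocabulary (an assumption)


def one_hot(symbol):
    """Encode an element symbol as a one-hot node feature vector."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def build_adjacency(num_atoms, bonds):
    """Map each atom index to the list of atoms it is bonded to."""
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return {i: neighbors[i] for i in range(num_atoms)}


def message_passing_step(features, adjacency):
    """One unweighted step: each atom adds its neighbors' features to its own."""
    updated = []
    for i, h_i in enumerate(features):
        agg = list(h_i)
        for j in adjacency[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum node features into a single graph-level vector."""
    return [sum(column) for column in zip(*features)]


features = [one_hot(a) for a in ATOMS]
adjacency = build_adjacency(len(ATOMS), BONDS)
hidden = message_passing_step(features, adjacency)
graph_vector = readout(hidden)
print(graph_vector)
```

In a trained model, a graph-level vector like `graph_vector` would be passed to a regression head to predict a scalar property such as logP; here it only shows how bond structure flows into the representation.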
# Paper
Link: (https://drive.google.com/file/d/1JtlH_As6Do4D5LM-J29pgpAKtFlyscxf/view?usp=sharing)
# Data Diagram
Link: (https://drive.google.com/file/d/1o0bo8oheHuwLXJCyPfIq7qBGgNy86xOF/view?usp=sharing)
# Data Architecture
[Data_Diagram](/uploads/8de4117eb415c14d82225e12b0c18da8/Data_Diagram.png)
# Poster
Link: (https://drive.google.com/file/d/1slb6TlEt1tOTRZSmotdMWuPOxtHvUxTh/view?usp=sharing)
# Software Demonstration Video
Link: (https://www.youtube.com/watch?v=Y9XXpPhnDnI)
Link: (https://www.youtube.com/watch?v=tG2B7mEo-zA)