Commit fd71a928 authored by Khoa A Nguyen

Update README.md

parent daad87c8
# DS senior capstone - Khoa Nguyen
Predicting the “penalized water-octanol partition coefficient - logP” of molecules from the ZINC database using Equivariant GNNs
Predicting the “solubility - logP” of drug compounds from the ZINC database using Graph Neural Networks
# Abstract
Drug discovery is a time-consuming process that can take decades, from finding a suitable group of molecules through testing and approval.
Drug discovery and development is a costly and time-consuming process, costing up to billions of dollars and taking 12-15 years from basic research to FDA approval. Early-stage discovery involves an intensive search through an enormous database of molecules and analysis of their quantitative structure-activity relationships (QSAR). Important features such as absorption, distribution, metabolism, and excretion (ADME) are extracted to measure how these compounds interact with the human body. At its root, this is an optimization problem in which researchers try to identify the “best” compounds with the desired properties to qualify for clinical development and produce a safe and cost-effective drug. Nowadays, with greater computational power, the process can be sped up significantly with artificial intelligence. Many deep learning models have demonstrated highly accurate predictions of the ADME properties of drug-like small molecules. In particular, graph neural networks (GNNs) have been shown to effectively learn graph-based representations of molecules. My paper discusses the application of deep learning in chemoinformatics and examines the feasibility of E(n)-Equivariant Graph Neural Networks (EGNN), one of the state-of-the-art graph neural networks, for predicting the solubility of commercially available compounds in the ZINC database for early-stage virtual screening in drug development.
Important features examined in the discovery stage are the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. This is an optimization problem in which we try to correctly predict the properties of molecules and select those that can produce a safe and cost-effective drug. Nowadays, with stronger computational power, the process can be sped up with machine learning.
In this capstone, I predict a molecular property called the partition coefficient (constrained solubility) using state-of-the-art graph neural networks.
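
To illustrate what "E(n)-equivariant" means for a molecular graph, here is a minimal sketch of a single simplified coordinate update in the style of an EGNN layer. It is not the capstone's model: the `weight` parameter and the `weight / (1 + dist2)` scaling are hypothetical stand-ins for the learned edge network, the helper names are made up for this example, and the toy coordinates are not taken from ZINC. The check at the end shows that rotating and translating the molecule before the update gives the same result as applying the update first.

```python
import math

def egnn_coordinate_update(coords, edges, weight=0.1):
    """One simplified E(n)-equivariant coordinate update.

    coords: list of (x, y, z) tuples, one per atom.
    edges:  list of directed (i, j) index pairs.
    weight: hypothetical scalar standing in for a learned edge network.
    """
    new_coords = [list(c) for c in coords]
    for i, j in edges:
        diff = [a - b for a, b in zip(coords[i], coords[j])]
        dist2 = sum(d * d for d in diff)   # unchanged by rotation/translation
        scale = weight / (1.0 + dist2)     # stand-in for the learned function
        for k in range(3):
            new_coords[i][k] += scale * diff[k]
    return [tuple(c) for c in new_coords]

def rotate_z(coords, angle):
    """Rotate all points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def translate(coords, shift):
    """Shift all points by the vector `shift`."""
    return [tuple(a + b for a, b in zip(p, shift)) for p in coords]

def transform(points):
    """An arbitrary rigid motion: rotation followed by translation."""
    return translate(rotate_z(points, 0.7), (2.0, -1.0, 0.5))

# Toy 3-atom chain with bonds listed in both directions.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 1.0, 0.0)]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]

update_then_transform = transform(egnn_coordinate_update(coords, edges))
transform_then_update = egnn_coordinate_update(transform(coords), edges)

# Largest coordinate-wise difference between the two orders of operations.
max_gap = max(
    abs(a - b)
    for p, q in zip(update_then_transform, transform_then_update)
    for a, b in zip(p, q)
)
print("equivariant:", max_gap < 1e-9)
```

Because the update depends only on relative positions and squared distances, the two orders of operations agree up to floating-point error.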
# Data Architecture
[Data_Architecture.pdf](/uploads/2ab2bc166c7c4817ce7a17c994f39d59/Data_Architecture.pdf)
[Data_Diagram](/uploads/8de4117eb415c14d82225e12b0c18da8/Data_Diagram.png)
# Software Demonstration Video
Link: (https://www.youtube.com/watch?v=yyqG0dUZ3Po)
Link: (https://www.youtube.com/watch?v=Y9XXpPhnDnI)