Data Preprocessing to Avoid False Negatives in GNN Predicted PPI Results
DOI: https://doi.org/10.62381/ACS.DIMI2025.03
Author(s)
Yinuo Feng
Affiliation(s)
Department of Science, Xi’an Jiaotong-Liverpool University, Suzhou, China
Abstract
Since protein-protein interaction (PPI) research began in the 1980s, its findings have been widely applied in clinical fields such as cancer treatment and nanobody research. Across many PPI research scenarios, scientists use similar statistical methods and frameworks to predict protein-protein interactions. Early PPI studies relied mostly on time-consuming and laborious wet-lab methods, such as yeast two-hybrid screening and quantitative proteomics. Later, neural networks based on graph neural networks (GNNs) and convolutional neural networks (CNNs) became the main PPI prediction methods. In recent years, ProBert and ProteinLM, which build on large language models (LLMs), have also been widely applied. Used as a link predictor, an LLM can better capture the relationships between nodes in knowledge graph tasks, and combining a GNN with an LLM can extend biomedical knowledge graphs and interpret the predicted edges. In many previous PPI predictions, the limitations of GNNs, namely their need for high-quality graph-structured data and their limited ability to model node features (such as protein sequences), often led to false negatives in the prediction results. This study subdivides the process of predicting PPIs with a GNN into stages and applies multiple data preprocessing steps to avoid false negatives.
Keywords
Protein-Protein Interaction; Graph Neural Network; Graph Convolutional Network; False Negative; Data Preprocessing