Guide
Intro
At present, it mainly realizes link prediction based on protein data under [data/yeast/yeast.edgelist].
Model
Model
The model mainly uses graph neural network, such as gae, vgae, etc
-
1.GCNModelVAE(src/vgae): volume product self coding and variational volume product self coding (self coding or variational self coding can be configured in config). gae/vgae is used as encoder and InnerProductDecoder is used as decoder. Variational Graph Auto-Encoders .
-
2. GCN model Arga (SRC / Arga): anti regularization graph self coding, using gae/vgae as generator; A three-layer feedforward network is used as a discriminator. Adversarially Regularized Graph Autoencoder for Graph Embedding .
-
3.GATModelVAE(src/graph_att_gae): graph product self coding and variational graph product self coding based on graph attention (self coding or variational self coding can be configured in config), using gae/vgae as encoder and InnerProductDecoder as decoder. I added a graph attention layer based on the above [1] method. For graph attention, see [GRAPH ATTENTION NETWORKS] in [Reference].
-
4.GATModelGAN(src/graph_att_gan): anti regularization graph self coding based on graph attention, using gae/vgae as generator; A three-layer feedforward network is used as the discriminator. I added a graph attention layer based on the above [2] method. For graph attention, see [GRAPH ATTENTION NETWORKS] in [Reference].
-
5.NHGATModelVAE(src/graph_nheads_att_gae): graph volume product self coding and variational graph volume product self coding based on graph multi head attention (self coding or variational self coding can be configured in config), gae/vgae is used as encoder and InnerProductDecoder is used as decoder. This method changes the graph attention layer to multi head attention layer on the basis of [3].
-
6.NHGATModelGAN(src/graph_nheads_att_gan): anti regularization graph self coding based on graph multi head attention, using gae/vgae as generator; A three-layer feedforward network is used as the discriminator. Based on the method [4], this method changes the graph attention layer into multi head attention layer.
Usage
-
For the configuration of relevant parameters, see config. In each model folder Cfg file, which will be loaded during training and prediction.
-
Training and prediction
1.GCNModelVAE(src/vgae)
(1). train
from src.vgae.train import Train train = Train() train.train_model('config.cfg')
Epoch: 0001 train_loss = 1.84734 val_roc_score = 0.76573 average_precision_score = 0.68083 time= 0.80005 Epoch: 0002 train_loss = 1.83824 val_roc_score = 0.87289 average_precision_score = 0.86317 time= 0.80361 Epoch: 0003 train_loss = 1.80761 val_roc_score = 0.87641 average_precision_score = 0.86590 time= 0.80121 Epoch: 0004 train_loss = 1.77976 val_roc_score = 0.87737 average_precision_score = 0.86656 time= 0.79843 Epoch: 0005 train_loss = 1.76685 val_roc_score = 0.87759 average_precision_score = 0.86664 time= 0.79843 Epoch: 0006 train_loss = 1.71661 val_roc_score = 0.87767 average_precision_score = 0.86667 time= 0.80479 Epoch: 0007 train_loss = 1.67656 val_roc_score = 0.87775 average_precision_score = 0.86670 time= 0.80509 Epoch: 0008 train_loss = 1.62324 val_roc_score = 0.87785 average_precision_score = 0.86679 time= 0.80446 Epoch: 0009 train_loss = 1.57730 val_roc_score = 0.87781 average_precision_score = 0.86680 time= 0.80424 Epoch: 0010 train_loss = 1.51882 val_roc_score = 0.87789 average_precision_score = 0.86675 time= 0.80852 Epoch: 0011 train_loss = 1.46346 val_roc_score = 0.87792 average_precision_score = 0.86678 time= 0.80625 Epoch: 0012 train_loss = 1.37688 val_roc_score = 0.87795 average_precision_score = 0.86684 time= 0.80474 Epoch: 0013 train_loss = 1.31243 val_roc_score = 0.87795 average_precision_score = 0.86685 time= 0.80574 Epoch: 0014 train_loss = 1.25133 val_roc_score = 0.87791 average_precision_score = 0.86677 time= 0.80267 Epoch: 0015 train_loss = 1.19762 val_roc_score = 0.87802 average_precision_score = 0.86693 time= 0.80540 Epoch: 0016 train_loss = 1.15079 val_roc_score = 0.87812 average_precision_score = 0.86698 time= 0.80784 Epoch: 0017 train_loss = 1.09600 val_roc_score = 0.87802 average_precision_score = 0.86688 time= 0.79920 Epoch: 0018 train_loss = 1.05011 val_roc_score = 0.87820 average_precision_score = 0.86711 time= 0.80777 Epoch: 0019 train_loss = 1.00610 val_roc_score = 0.87840 average_precision_score = 0.86714 time= 0.80412 Epoch: 0020 train_loss = 0.95014 val_roc_score = 0.87838 average_precision_score = 0.86713 time= 0.80210 test roc score: 0.8814614254330005 test ap score: 0.8708329314774368
(2). forecast
from src.vgae.predict import Predict predict = Predict() predict.load_model_adj('config_cfg') # The original graph adjacency matrix and the hidden embedding adjacency matrix after model coding and inner product decoding will be returned. The two matrices can be compared to obtain link prediction adj_orig, adj_rec = predict.predict()
2.GCNModelARGA(src/arga)
(1). train
from src.arga.train import Train train = Train() train.train_model('config.cfg')
Epoch: 0001 train_loss = 2.08252 val_roc_score = 0.75422 average_precision_score = 0.66179 time= 0.80230 Epoch: 0002 train_loss = 2.03940 val_roc_score = 0.86953 average_precision_score = 0.85636 time= 0.79571 Epoch: 0003 train_loss = 2.00348 val_roc_score = 0.87872 average_precision_score = 0.86847 time= 0.79245 Epoch: 0004 train_loss = 1.97120 val_roc_score = 0.87997 average_precision_score = 0.86995 time= 0.79640 Epoch: 0005 train_loss = 1.93477 val_roc_score = 0.88017 average_precision_score = 0.87027 time= 0.79548 Epoch: 0006 train_loss = 1.89215 val_roc_score = 0.88046 average_precision_score = 0.87038 time= 0.79972 Epoch: 0007 train_loss = 1.84537 val_roc_score = 0.88072 average_precision_score = 0.87058 time= 0.79561 Epoch: 0008 train_loss = 1.78754 val_roc_score = 0.88063 average_precision_score = 0.87049 time= 0.79802 Epoch: 0009 train_loss = 1.72469 val_roc_score = 0.88053 average_precision_score = 0.87043 time= 0.79486 Epoch: 0010 train_loss = 1.65402 val_roc_score = 0.88063 average_precision_score = 0.87049 time= 0.79423 Epoch: 0011 train_loss = 1.57884 val_roc_score = 0.88052 average_precision_score = 0.87045 time= 0.79348 Epoch: 0012 train_loss = 1.49870 val_roc_score = 0.88049 average_precision_score = 0.87046 time= 0.79649 Epoch: 0013 train_loss = 1.42083 val_roc_score = 0.88056 average_precision_score = 0.87046 time= 0.79063 Epoch: 0014 train_loss = 1.34764 val_roc_score = 0.88060 average_precision_score = 0.87056 time= 0.79889 Epoch: 0015 train_loss = 1.27635 val_roc_score = 0.88038 average_precision_score = 0.87043 time= 0.79485 Epoch: 0016 train_loss = 1.20521 val_roc_score = 0.88050 average_precision_score = 0.87058 time= 0.79927 Epoch: 0017 train_loss = 1.13763 val_roc_score = 0.88035 average_precision_score = 0.87045 time= 0.79072 Epoch: 0018 train_loss = 1.07326 val_roc_score = 0.88035 average_precision_score = 0.87049 time= 0.79284 Epoch: 0019 train_loss = 1.01548 val_roc_score = 0.88023 average_precision_score = 0.87044 time= 0.78869 Epoch: 0020 train_loss = 0.96069 val_roc_score = 0.88014 average_precision_score = 0.87037 time= 0.79441 test roc score: 0.8798092171308727 test ap score: 0.8700487009596252
(2). forecast
from src.arga.predict import Predict predict = Predict() predict.load_model_adj('config_cfg') # The original graph adjacency matrix and the hidden embedding adjacency matrix after model coding and inner product decoding will be returned. The two matrices can be compared to obtain link prediction adj_orig, adj_rec = predict.predict()
3.GATModelVAE(src/graph_att_gae)
(1). train
from src.graph_att_gae.train import Train train = Train() train.train_model('config.cfg')
Epoch: 0001 train_loss = 1.83611 val_roc_score = 0.73571 average_precision_score = 0.62940 time= 0.81406 Epoch: 0002 train_loss = 1.83237 val_roc_score = 0.87094 average_precision_score = 0.85831 time= 0.81499 Epoch: 0003 train_loss = 1.82761 val_roc_score = 0.87429 average_precision_score = 0.86431 time= 0.81297 Epoch: 0004 train_loss = 1.78672 val_roc_score = 0.87509 average_precision_score = 0.86525 time= 0.80870 Epoch: 0005 train_loss = 1.76815 val_roc_score = 0.87523 average_precision_score = 0.86550 time= 0.81497 Epoch: 0006 train_loss = 1.72495 val_roc_score = 0.87523 average_precision_score = 0.86551 time= 0.81070 Epoch: 0007 train_loss = 1.69047 val_roc_score = 0.87593 average_precision_score = 0.86601 time= 0.80948 Epoch: 0008 train_loss = 1.63153 val_roc_score = 0.87573 average_precision_score = 0.86593 time= 0.80709 Epoch: 0009 train_loss = 1.57143 val_roc_score = 0.87551 average_precision_score = 0.86580 time= 0.80653 Epoch: 0010 train_loss = 1.50240 val_roc_score = 0.87587 average_precision_score = 0.86594 time= 0.81233 Epoch: 0011 train_loss = 1.44139 val_roc_score = 0.87567 average_precision_score = 0.86589 time= 0.80861 Epoch: 0012 train_loss = 1.37266 val_roc_score = 0.87557 average_precision_score = 0.86571 time= 0.80932 Epoch: 0013 train_loss = 1.32811 val_roc_score = 0.87578 average_precision_score = 0.86597 time= 0.80686 Epoch: 0014 train_loss = 1.30064 val_roc_score = 0.87607 average_precision_score = 0.86603 time= 0.80962 Epoch: 0015 train_loss = 1.25788 val_roc_score = 0.87592 average_precision_score = 0.86611 time= 0.80796 Epoch: 0016 train_loss = 1.23810 val_roc_score = 0.87607 average_precision_score = 0.86617 time= 0.80750 Epoch: 0017 train_loss = 1.18570 val_roc_score = 0.87594 average_precision_score = 0.86613 time= 0.80911 Epoch: 0018 train_loss = 1.14961 val_roc_score = 0.87607 average_precision_score = 0.86626 time= 0.81035 Epoch: 0019 train_loss = 1.10372 val_roc_score = 0.87593 average_precision_score = 0.86598 time= 0.81094 Epoch: 0020 train_loss = 1.05262 val_roc_score = 0.87605 average_precision_score = 0.86613 time= 0.81442 test roc score: 0.8758194438300309 test ap score: 0.8629482273490456
(2). forecast
from src.graph_att_gae.predict import Predict predict = Predict() predict.load_model_adj('config_cfg') # The original graph adjacency matrix and the hidden embedding adjacency matrix after model coding and inner product decoding will be returned. The two matrices can be compared to obtain link prediction adj_orig, adj_rec = predict.predict()
4.GATModelGAN(src/graph_att_gan)
(1). train
from src.graph_att_gan.train import Train train = Train() train.train_model('config.cfg')
Epoch: 0001 train_loss = 3.24637 val_roc_score = 0.77403 average_precision_score = 0.68203 time= 0.81267 Epoch: 0002 train_loss = 3.21157 val_roc_score = 0.87269 average_precision_score = 0.86088 time= 0.81181 Epoch: 0003 train_loss = 3.15047 val_roc_score = 0.87391 average_precision_score = 0.86203 time= 0.81182 Epoch: 0004 train_loss = 3.08302 val_roc_score = 0.87457 average_precision_score = 0.86271 time= 0.81055 Epoch: 0005 train_loss = 3.03024 val_roc_score = 0.87410 average_precision_score = 0.86226 time= 0.81125 Epoch: 0006 train_loss = 2.95011 val_roc_score = 0.87450 average_precision_score = 0.86264 time= 0.81162 Epoch: 0007 train_loss = 2.82191 val_roc_score = 0.87460 average_precision_score = 0.86275 time= 0.81088 Epoch: 0008 train_loss = 2.73079 val_roc_score = 0.87442 average_precision_score = 0.86256 time= 0.80648 Epoch: 0009 train_loss = 2.61711 val_roc_score = 0.87454 average_precision_score = 0.86268 time= 0.81021 Epoch: 0010 train_loss = 2.50720 val_roc_score = 0.87480 average_precision_score = 0.86288 time= 0.80921 Epoch: 0011 train_loss = 2.42761 val_roc_score = 0.87506 average_precision_score = 0.86298 time= 0.81137 Epoch: 0012 train_loss = 2.36874 val_roc_score = 0.87497 average_precision_score = 0.86282 time= 0.81466 Epoch: 0013 train_loss = 2.29911 val_roc_score = 0.87504 average_precision_score = 0.86291 time= 0.81193 Epoch: 0014 train_loss = 2.21190 val_roc_score = 0.87526 average_precision_score = 0.86297 time= 0.80965 Epoch: 0015 train_loss = 2.12611 val_roc_score = 0.87511 average_precision_score = 0.86290 time= 0.81013 Epoch: 0016 train_loss = 2.03527 val_roc_score = 0.87528 average_precision_score = 0.86314 time= 0.81365 Epoch: 0017 train_loss = 1.96965 val_roc_score = 0.87524 average_precision_score = 0.86309 time= 0.81125 Epoch: 0018 train_loss = 1.90381 val_roc_score = 0.87515 average_precision_score = 0.86312 time= 0.80971 Epoch: 0019 train_loss = 1.85955 val_roc_score = 0.87487 average_precision_score = 0.86288 time= 0.80996 Epoch: 0020 train_loss = 1.81664 val_roc_score = 0.87483 average_precision_score = 0.86293 time= 0.81270 test roc score: 0.8826745834179653 test ap score: 0.8715261230395998
(2). forecast
from src.graph_att_gan.predict import Predict predict = Predict() predict.load_model_adj('config_cfg') # The original graph adjacency matrix and the hidden embedding adjacency matrix after model coding and inner product decoding will be returned. The two matrices can be compared to obtain link prediction adj_orig, adj_rec = predict.predict()
5.NHGATModelVAE(src/graph_nheads_att_gae)
(1). train
from src.graph_nheads_att_gae.train import Train train = Train() train.train_model('config.cfg')
Epoch: 0001 train_loss = 1.85570 val_roc_score = 0.80750 average_precision_score = 0.72917 time= 0.84645 Epoch: 0002 train_loss = 1.78607 val_roc_score = 0.88103 average_precision_score = 0.87114 time= 0.84186 Epoch: 0003 train_loss = 1.68021 val_roc_score = 0.88117 average_precision_score = 0.87144 time= 0.84135 Epoch: 0004 train_loss = 1.52555 val_roc_score = 0.88115 average_precision_score = 0.87141 time= 0.84212 Epoch: 0005 train_loss = 1.38254 val_roc_score = 0.88070 average_precision_score = 0.87098 time= 0.83917 Epoch: 0006 train_loss = 1.40003 val_roc_score = 0.88106 average_precision_score = 0.87134 time= 0.84185 Epoch: 0007 train_loss = 1.31239 val_roc_score = 0.88081 average_precision_score = 0.87110 time= 0.83766 Epoch: 0008 train_loss = 1.17827 val_roc_score = 0.88102 average_precision_score = 0.87134 time= 0.84063 Epoch: 0009 train_loss = 1.08710 val_roc_score = 0.88086 average_precision_score = 0.87126 time= 0.84173 Epoch: 0010 train_loss = 1.01816 val_roc_score = 0.88136 average_precision_score = 0.87162 time= 0.84121 Epoch: 0011 train_loss = 0.95128 val_roc_score = 0.88128 average_precision_score = 0.87133 time= 0.84128 Epoch: 0012 train_loss = 0.87212 val_roc_score = 0.88127 average_precision_score = 0.87142 time= 0.84218 Epoch: 0013 train_loss = 0.80497 val_roc_score = 0.88134 average_precision_score = 0.87154 time= 0.84077 Epoch: 0014 train_loss = 0.75538 val_roc_score = 0.88088 average_precision_score = 0.87120 time= 0.83701 Epoch: 0015 train_loss = 0.70903 val_roc_score = 0.88063 average_precision_score = 0.87073 time= 0.83698 Epoch: 0016 train_loss = 0.68525 val_roc_score = 0.88035 average_precision_score = 0.87055 time= 0.83837 Epoch: 0017 train_loss = 0.66079 val_roc_score = 0.87995 average_precision_score = 0.87053 time= 0.83806 Epoch: 0018 train_loss = 0.65187 val_roc_score = 0.87924 average_precision_score = 0.86958 time= 0.84210 Epoch: 0019 train_loss = 0.64572 val_roc_score = 0.87929 average_precision_score = 0.86995 time= 0.84069 Epoch: 0020 train_loss = 0.64103 val_roc_score = 0.87951 average_precision_score = 0.87026 time= 0.83967 test roc score: 0.877033361471422 test ap score: 0.867286248500891
(2). forecast
from src.graph_nheads_att_gae.predict import Predict predict = Predict() predict.load_model_adj('config_cfg') # The original graph adjacency matrix and the hidden embedding adjacency matrix after model coding and inner product decoding will be returned. The two matrices can be compared to obtain link prediction adj_orig, adj_rec = predict.predict()
6.NHGATModelGAN(src/graph_nheads_att_gan)
(1). train
from src.graph_nheads_att_gan.train import Train train = Train() train.train_model('config.cfg')
Epoch: 0001 train_loss = 3.24091 val_roc_score = 0.77050 average_precision_score = 0.66992 time= 0.85475 Epoch: 0002 train_loss = 3.18022 val_roc_score = 0.87671 average_precision_score = 0.86657 time= 0.84643 Epoch: 0003 train_loss = 3.09047 val_roc_score = 0.87715 average_precision_score = 0.86704 time= 0.84354 Epoch: 0004 train_loss = 2.95696 val_roc_score = 0.87695 average_precision_score = 0.86698 time= 0.84279 Epoch: 0005 train_loss = 2.87052 val_roc_score = 0.87747 average_precision_score = 0.86741 time= 0.84714 Epoch: 0006 train_loss = 2.88739 val_roc_score = 0.87742 average_precision_score = 0.86727 time= 0.84777 Epoch: 0007 train_loss = 2.78251 val_roc_score = 0.87757 average_precision_score = 0.86748 time= 0.84134 Epoch: 0008 train_loss = 2.65458 val_roc_score = 0.87766 average_precision_score = 0.86745 time= 0.84429 Epoch: 0009 train_loss = 2.60484 val_roc_score = 0.87798 average_precision_score = 0.86780 time= 0.84680 Epoch: 0010 train_loss = 2.56642 val_roc_score = 0.87806 average_precision_score = 0.86766 time= 0.84952 Epoch: 0011 train_loss = 2.49832 val_roc_score = 0.87826 average_precision_score = 0.86771 time= 0.84535 Epoch: 0012 train_loss = 2.38511 val_roc_score = 0.87799 average_precision_score = 0.86763 time= 0.84903 Epoch: 0013 train_loss = 2.28920 val_roc_score = 0.87781 average_precision_score = 0.86762 time= 0.84161 Epoch: 0014 train_loss = 2.23039 val_roc_score = 0.87791 average_precision_score = 0.86761 time= 0.84422 Epoch: 0015 train_loss = 2.14044 val_roc_score = 0.87782 average_precision_score = 0.86750 time= 0.84063 Epoch: 0016 train_loss = 2.05134 val_roc_score = 0.87774 average_precision_score = 0.86754 time= 0.84043 Epoch: 0017 train_loss = 1.95402 val_roc_score = 0.87745 average_precision_score = 0.86740 time= 0.84461 Epoch: 0018 train_loss = 1.89405 val_roc_score = 0.87714 average_precision_score = 0.86720 time= 0.84435 Epoch: 0019 train_loss = 1.83182 val_roc_score = 0.87690 average_precision_score = 0.86693 time= 0.84567 Epoch: 0020 train_loss = 1.74144 val_roc_score = 0.87683 average_precision_score = 0.86717 time= 0.84130 test roc score: 0.8767371798715641 test ap score: 0.8680650766563964
(2). forecast
from src.graph_nheads_att_gan.predict import Predict predict = Predict() predict.load_model_adj('config_cfg') # The original graph adjacency matrix and the hidden embedding adjacency matrix after model coding and inner product decoding will be returned. The two matrices can be compared to obtain link prediction adj_orig, adj_rec = predict.predict()
Dataset
Data from yeast protein interactions yeast .
The format of the data set is as follows. See data.
YLR418C YOL145C YOL145C YLR418C YLR418C YOR123C YOR123C YLR418C ...... ......
Install
- Installation: pip install GCN4LP
- Download source code:
git clone https://github.com/jiangnanboy/gcn_for_prediction_of_protein_interactions.git cd gcn_for_prediction_of_protein_interactions python setup.py install
You can complete the installation by either of the above two methods. If you don't want to install it, you can download it github source package
Cite
If you used GCN4LP in the study, please quote in the following format:
@software{GCN4LP, author = {Shi Yan}, title = {GCN4LP: gcn for prediction of protein interactions}, year = {2021}, url = {https://github.com/jiangnanboy/gcn_for_prediction_of_protein_interactions}, }
Reference
- Variational Graph Auto-Encoders
- https://github.com/zfjsail/gae-pytorch/blob/master/gae/utils.py
- https://github.com/tkipf/gae/tree/master/gae
- http://snap.stanford.edu/deepnetbio-ismb/ipynb/Graph+Convolutional+Prediction+of+Protein+Interactions+in+Yeast.html
- Adversarially Regularized Graph Autoencoder for Graph Embedding
- https://www.biyezuopin.vip
- GRAPH ATTENTION NETWORKS