In this section, we will learn how to implement a multilayer graph neural network and the general workflow of training it on a graph to generate high-quality node representations and classify nodes with high accuracy.
Our task is to predict the labels of unlabeled nodes based on node attributes (which can be categorical or numerical), edge information, edge attributes (if any), and the known labels of other nodes.
Introduction to the Cora Dataset
- The Cora dataset consists of machine learning papers grouped into seven categories: Case_Based, Genetic_Algorithms, Neural_Networks, Probabilistic_Methods, Reinforcement_Learning, Rule_Learning, Theory.
- Each paper in the dataset cites or is cited by at least one other paper, for a total of 2708 papers. After stemming, removing stop words, and removing words with a document frequency below 10, the final vocabulary contains 1433 unique words.
The dataset contains two files:
- The `.content` file describes the papers, one per line, in the format `<paper_id> <word_attributes>+ <class_label>`:
- `<paper_id>`: the unique identifier of the paper.
- `<word_attributes>`: binary word features (0 or 1) indicating the absence or presence of the corresponding vocabulary word.
- `<class_label>`: the category of the paper.
- The `.cites` file contains the citation graph of the dataset, with each line formatted as `<ID of cited paper> <ID of citing paper>`:
- `<ID of cited paper>`: the identifier of the paper being cited.
- `<ID of citing paper>`: the identifier of the paper that makes the citation.
- Citations point from right to left: a line `paper1 paper2` means paper2 cites paper1, so the corresponding edge is paper2 -> paper1.
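For intuition about the raw format, here is a minimal sketch of parsing the two files by hand. The paths are hypothetical and the fields are assumed to be tab-separated as in the standard Cora distribution; the `Planetoid` loader used below does all of this automatically.

```python
content_path = 'cora/cora.content'  # hypothetical path
cites_path = 'cora/cora.cites'      # hypothetical path

features, labels = {}, {}
with open(content_path) as f:
    for line in f:
        fields = line.strip().split('\t')
        paper_id, word_attrs, class_label = fields[0], fields[1:-1], fields[-1]
        features[paper_id] = [int(v) for v in word_attrs]  # 1433 binary word features
        labels[paper_id] = class_label                     # one of the seven categories

edges = []
with open(cites_path) as f:
    for line in f:
        cited, citing = line.strip().split('\t')
        edges.append((citing, cited))  # edge direction: citing -> cited
```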
Comparing the Node Representation Learning Ability of MLP, GCN, and GAT
Preparation
```python
## Load the dataset
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

dataset = Planetoid(root='/home/**/python_file/gnn/dataset', name='Cora',
                    transform=NormalizeFeatures())
data = dataset[0]
print(data)

## Load a node-representation visualizer (t-SNE projection to 2D)
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(h, color):
    z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())
    plt.figure(figsize=(10, 10))
    plt.xticks([])
    plt.yticks([])
    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()
```
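As a sanity check, the loaded `Data` object can be inspected through standard PyG attributes; the printed numbers should match the statistics quoted above (2708 papers, 1433 word features, 7 classes):

```python
# Inspect the loaded Data object (standard PyG attributes)
print(f'Number of nodes: {data.num_nodes}')           # 2708
print(f'Number of edges: {data.num_edges}')
print(f'Number of features: {dataset.num_features}')  # 1433
print(f'Number of classes: {dataset.num_classes}')    # 7
print(f'Training nodes: {int(data.train_mask.sum())}')
print(f'Test nodes: {int(data.test_mask.sum())}')
```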
Application of MLP in Graph Node Classification Task
```python
## MLP graph-node classifier
import torch
from torch.nn import Linear
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(MLP, self).__init__()
        torch.manual_seed(12345)
        self.lin1 = Linear(dataset.num_features, hidden_channels)
        self.lin2 = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x):
        x = self.lin1(x)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        return x

model = MLP(hidden_channels=16)
print(model)

## Simple training
model = MLP(hidden_channels=16)
criterion = torch.nn.CrossEntropyLoss()  # Define loss criterion.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)  # Define optimizer.

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    # print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

## Test results
def test():
    model.eval()
    out = model(data.x)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')
```
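For a like-for-like comparison with the GCN and GAT plots below, the trained MLP's outputs can be projected with the same `visualize` helper (a small addition, not in the original listing):

```python
# Visualize the trained MLP's output representations with the t-SNE helper
model.eval()
out = model(data.x)
visualize(out, color=data.y)
```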
Application of GCN in Graph Node Classification Task
```python
## Introduce GCN
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

model = GCN(hidden_channels=16)
print(model)

## Visualize the untrained model's representations
model = GCN(hidden_channels=16)
model.eval()
out = model(data.x, data.edge_index)
visualize(out, color=data.y)

## Train the GCN classifier
model = GCN(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    # print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

## Test results
def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

## Visualize the representations after training
model.eval()
out = model(data.x, data.edge_index)
visualize(out, color=data.y)
```
Application of GAT in Graph Node Classification Task
```python
## Introduce GAT
import torch
from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GAT, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GATConv(dataset.num_features, hidden_channels)
        self.conv2 = GATConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

model = GAT(hidden_channels=16)
print(model)

## Visualize the untrained model's representations
model = GAT(hidden_channels=16)
model.eval()
out = model(data.x, data.edge_index)
visualize(out, color=data.y)

## Train the GAT classifier
model = GAT(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

for epoch in range(1, 201):
    loss = train()
    # print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

## Test results
def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

## Visualize the representations after training
model.eval()
out = model(data.x, data.edge_index)
visualize(out, color=data.y)
```
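As a variation (not in the original listing), `GATConv` also supports multi-head attention via its `heads` argument; with the default `concat=True`, the first layer's output width becomes `hidden_channels * heads`, which the second layer must accept. A minimal sketch:

```python
# Multi-head GAT variant: conv1 concatenates the outputs of `heads`
# attention heads, so conv2's input size is hidden_channels * heads.
class MultiHeadGAT(torch.nn.Module):
    def __init__(self, hidden_channels, heads=8):
        super(MultiHeadGAT, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GATConv(dataset.num_features, hidden_channels, heads=heads)
        self.conv2 = GATConv(hidden_channels * heads, dataset.num_classes, heads=1)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.elu(x)
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x
```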
Comparative analysis of results
- Test accuracy (without hyperparameter tuning): ACC(GCN) > ACC(GAT) > ACC(MLP)
- Reason: when learning node representations, the MLP classifier considers only each node's own attributes and ignores the connections between nodes, so it performs worst. The GCN and GAT classifiers use both a node's own attributes and those of its neighbors, so they outperform the MLP. This shows that neighbor information is important for the node classification task.
- GCN and GAT differ in how they normalize neighbor information during aggregation (see the formulas after this list):
- The former computes a normalization factor from the degrees of the center node and its neighbor, while the latter computes an attention weight from the learned similarity between the center node and its neighbor.
- The former's normalization depends on the topology of the graph: different nodes have different degrees, which may hurt generalization in some applications.
- The latter depends on the similarity between the center node and its neighbors; because this similarity is learned, it is less tied to the graph's topology and tends to generalize better across tasks.
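For reference, the two aggregation rules in their standard forms from the GCN and GAT papers, where $\hat d_i$ denotes the degree of node $i$ after adding self-loops:

$$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)\cup\{i\}} \frac{1}{\sqrt{\hat d_i \hat d_j}}\, W h_j\Big) \qquad \text{(GCN)}$$

$$\alpha_{ij} = \operatorname{softmax}_j\Big(\operatorname{LeakyReLU}\big(\mathbf{a}^\top[\,W h_i \,\Vert\, W h_j\,]\big)\Big),\qquad h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)\cup\{i\}} \alpha_{ij}\, W h_j\Big) \qquad \text{(GAT)}$$

GCN's coefficient $1/\sqrt{\hat d_i \hat d_j}$ is fixed by the graph structure, while GAT's $\alpha_{ij}$ is produced by a trained attention vector $\mathbf{a}$, which is exactly the distinction drawn in the list above.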