In the previous article, Extracting text relations and creating knowledge map with ltp (based on neo4j) (I), a single sentence was analyzed with LTP, its semantic dependencies were extracted, and the graph was created in neo4j with Python. This article extends that one, and the overall code is similar. It can build a knowledge graph from multiple sentences, and this time it builds syntactic relations (subject-verb, verb-object, and so on) rather than semantic dependencies.
You can refer to the ltp official documentation and the ltp appendix while reading.
Extract text relationships using ltp:
This article is just a simple demonstration. The sentences analyzed are:
He asked Tom to get his coat. Tom is ill. He went to the hospital.
You can replace them with any sentences you like.
from ltp import LTP


def ltp_data():
    """Run the LTP pipeline and return the dependency parse, POS tags and segmentation."""
    ltp = LTP()
    # Sentence splitting
    sents = ltp.sent_split(["He asked Tom to get his coat. Tom is ill. He went to the hospital."])
    # Word segmentation
    seg, hidden = ltp.seg(sents)
    # Part-of-speech tagging
    pos = ltp.pos(hidden)
    # Named entity recognition
    ner = ltp.ner(hidden)
    # Semantic role labeling
    srl = ltp.srl(hidden)
    # Dependency parsing
    dep = ltp.dep(hidden)
    # Semantic dependency parsing (graph)
    sdp = ltp.sdp(hidden, mode='graph')
    return dep, pos, seg
Here, let's take a look at the returned results:
if __name__ == '__main__':
    # ds holds the dependency parse (dep) returned by ltp_data()
    ds, pos, seg = ltp_data()
    print("Semantic dependencies:{k}".format(k = ds))
    print("label:{k}".format(k = pos))
    print("Clause:{k}".format(k = seg))
out:
Semantic dependencies: [[(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'DBL'), (4, 5, 'ADV'), (5, 2, 'VOB'), (6, 5, 'VOB'), (7, 2, 'WP')], [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'RAD'), (4, 2, 'WP')], [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'RAD'), (4, 2, 'VOB'), (5, 2, 'WP')]]
label: [['r', 'v', 'nh', 'v', 'v', 'n', 'wp'], ['nh', 'v', 'v', 'u', 'wp'], ['r', 'v', 'u', 'n', 'wp']]
Clause: [['he', 'call', 'Tom', 'go', 'take', 'coat', '.'], ['Tom', 'ill', 'gone', 'hospital']]
For the specific meaning of the tags and relations, refer to the ltp appendix.
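To make the triples easier to read, here is a minimal sketch that pairs each dependency triple with its words. It assumes each triple is (word index, head index, relation) with 1-based indices and that head index 0 stands for the root, as in the output above. The helper name describe_dependencies is made up for this illustration, and the sample values are copied from the first sentence of the output.

```python
# Sample values copied from the output above (first sentence only).
words = ['he', 'call', 'Tom', 'go', 'take', 'coat', '.']
dep = [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'DBL'), (4, 5, 'ADV'),
       (5, 2, 'VOB'), (6, 5, 'VOB'), (7, 2, 'WP')]


def describe_dependencies(words, dep):
    """Turn 1-based (word, head, relation) triples into readable strings.

    Hypothetical helper for illustration; it is not part of ltp.
    """
    lines = []
    for child, head, relation in dep:
        child_word = words[child - 1]
        # Head index 0 is the virtual root, not a real word.
        head_word = 'ROOT' if head == 0 else words[head - 1]
        lines.append('{} --{}--> {}'.format(child_word, relation, head_word))
    return lines


readable = describe_dependencies(words, dep)
for line in readable:
    print(line)
```

The first line printed is "he --SBV--> call", that is, "he" is the subject of "call".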
Extract nodes and relationships:
Sort out the results returned in the previous step and extract nodes and relationships from them.
Extract node:
def node_extraction(seg, pos):
    """Extract the node names (words) and node types (POS tags) from the LTP results"""
    for i in range(len(seg)):
        # Convert every word and tag to a plain string
        seg[i] = [str(word) for word in seg[i]]
        pos[i] = [str(tag) for tag in pos[i]]
    return seg, pos
Extracting the relations requires the nodes that have already been created, so the function takes a nodes parameter, which is returned by the create_node method defined later.
Extract relations:
def relation_extraction(ds, nodes):
    """
    Extract the relationships between nodes, combine the nodes and the relation
    into triples and store them in a list.
    (node1, node2, relation)
    """
    rel = []
    for ds_sentence, nodes_sentence in zip(ds, nodes):
        rel_sentence = []
        for ds_word in ds_sentence:
            # Look up the nodes by index (LTP indices start at 1)
            index1 = int(ds_word[0]) - 1
            # Note: the root (HED) has head index 0, so index2 is -1 here
            # and node2 resolves to the last node of the sentence
            index2 = int(ds_word[1]) - 1
            node1 = nodes_sentence[index1]
            node2 = nodes_sentence[index2]
            relation = ds_word[2]
            # Combine the nodes and the relation into a triple
            rel_word = [node1, node2, relation]
            # Collect the triples of one sentence
            rel_sentence.append(rel_word)
        # Collect the sentences into the result list
        rel.append(rel_sentence)
    return rel
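One detail of the index lookup is worth a closer look: the HED triple has head index 0, so subtracting 1 gives -1, which in Python selects the last item of the sentence. The sketch below is one possible variant that skips the root instead. It is not the code used in this article; the names relation_triples and skip_root are made up for illustration, and plain strings stand in for the py2neo nodes so that it runs without a database.

```python
def relation_triples(ds, nodes, skip_root=True):
    """Build [node1, node2, relation] lists from 1-based dependency triples.

    Hypothetical variant for illustration; `nodes` may hold any objects.
    """
    rel = []
    for ds_sentence, nodes_sentence in zip(ds, nodes):
        rel_sentence = []
        for child, head, relation in ds_sentence:
            if head == 0 and skip_root:
                # The root has no real head word. Without this guard,
                # head - 1 == -1 would select the last token.
                continue
            rel_sentence.append([nodes_sentence[child - 1],
                                 nodes_sentence[head - 1],
                                 relation])
        rel.append(rel_sentence)
    return rel


# Shortened sample: the first three triples of the first sentence.
ds = [[(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'DBL')]]
nodes = [['he', 'call', 'Tom']]
print(relation_triples(ds, nodes))
print(relation_triples(ds, nodes, skip_root=False))
```

With skip_root=False the function keeps the HED triple and pairs it with the last token of the sentence, which is what plain index arithmetic does.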
Create nodes and relationships:
This step creates the knowledge graph. You need to connect to neo4j first. When establishing the connection, the first argument is the address shown when neo4j is started from cmd ( http://localhost:7474 ), the second is the user name and the third is the password.
from py2neo import Node, Graph, Relationship
from ltp_data import ltp_data

# You can read the following documents first: https://py2neo.org/v4/index.htm


class DataToNeo4j(object):
    """Store the extracted data in neo4j"""

    def __init__(self):
        """Establish connection"""
        link = Graph("your localhost", username="your username", password="your password")
        self.graph = link
        # self.graph = NodeMatcher(link)
        # Delete everything already stored in the database
        self.graph.delete_all()

        """
        node3 = Node('animal' , name = 'cat')
        node4 = Node('animal' , name = 'dog')
        node2 = Node('Person' , name = 'Alice')
        node1 = Node('Person' , name = 'Bob')
        r1 = Relationship(node2 , 'know' , node1)
        r2 = Relationship(node1 , 'know' , node3)
        r3 = Relationship(node2 , 'has' , node3)
        r4 = Relationship(node4 , 'has' , node2)
        self.graph.create(node1)
        self.graph.create(node2)
        self.graph.create(node3)
        self.graph.create(node4)
        self.graph.create(r1)
        self.graph.create(r2)
        self.graph.create(r3)
        self.graph.create(r4)
        """

    def create_node(self, name_node, type_node):
        """Create the nodes"""
        nodes = []
        for name_sentence, type_sentence in zip(name_node, type_node):
            nodes_sentence = []
            for name_word, type_word in zip(name_sentence, type_sentence):
                # Create the node: the POS tag is the label, the word is the name
                node = Node(type_word, name = name_word)
                self.graph.create(node)
                # Keep the node so relationships can refer to it later
                nodes_sentence.append(node)
            nodes.append(nodes_sentence)
        print('Node established successfully')
        return nodes

    def create_relation(self, rel):
        """Create the relationships"""
        for sentence in rel:
            for word in sentence:
                try:
                    # The relationship type must be converted to a string
                    r = Relationship(word[0], str(word[2]), word[1])
                    self.graph.create(r)
                except AttributeError as e:
                    print(e)
        print('Successful relationship establishment')
Test run:
if __name__ == '__main__':
    ds, pos, seg = ltp_data()
    create_data = DataToNeo4j()

    # Create the nodes
    node_name, node_type = node_extraction(seg, pos)
    nodes = create_data.create_node(node_name, node_type)
    print("Node of the first sentence:\n{k}".format(k = nodes[0]))

    # Create the relationships
    rel = relation_extraction(ds, nodes)
    create_data.create_relation(rel)
    print("Relationship of the first sentence:\n{k}".format(k = rel[0]))
out:
Node established successfully
Node of the first sentence:
[Node('r', name='he'), Node('v', name='call'), Node('nh', name='Tom'), Node('v', name='go'), Node('v', name='take'), Node('n', name='coat'), Node('wp', name='.')]
Successful relationship establishment
Relationship of the first sentence:
[[Node('r', name='he'), Node('v', name='call'), 'SBV'], [Node('v', name='call'), Node('wp', name='.'), 'HED'], [Node('nh', name='Tom'), Node('v', name='call'), 'DBL'], [Node('v', name='go'), Node('v', name='take'), 'ADV'], [Node('v', name='take'), Node('v', name='call'), 'VOB'], [Node('n', name='coat'), Node('v', name='take'), 'VOB'], [Node('wp', name='.'), Node('v', name='call'), 'WP']]
Result:
With a different passage:
A group of animals on the farm successfully carried out a revolution, drove their human owners out of the farm and established an equal animal society. However, animal leaders, those clever pigs, finally usurped the fruits of the revolution and became more authoritarian and totalitarian rulers than human owners.
Note the red arrow here. By default, neo4j displays a relatively small number of nodes. When many nodes have been created, some of them may not be displayed, which can make you think they were not created. Just increase the limit a little.
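If you prefer to check from a query rather than from the display settings, the sketch below builds two Cypher strings: one that returns nodes with an explicit LIMIT, and one that counts all nodes so you can confirm they were created. It assumes the limit in question can be overridden with a LIMIT clause in the query you run. The names display_query and COUNT_QUERY are made up for this illustration, and the strings can be pasted into the neo4j browser or passed to py2neo's Graph.run.

```python
# Counts every node in the database, regardless of what is displayed.
COUNT_QUERY = 'MATCH (n) RETURN count(n) AS total'


def display_query(limit=300):
    """Return a Cypher query that fetches at most `limit` nodes.

    Hypothetical helper for illustration; 300 is an arbitrary default.
    """
    limit = int(limit)
    if limit <= 0:
        raise ValueError('limit must be a positive integer')
    return 'MATCH (n) RETURN n LIMIT {}'.format(limit)


print(display_query(500))
print(COUNT_QUERY)
```

Comparing the count with the number of words in the text is a quick way to tell missing nodes from nodes that are simply not displayed.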
Full code:
ltp_data.py
from ltp import LTP


def ltp_data():
    """Run the LTP pipeline and return the dependency parse, POS tags and segmentation."""
    ltp = LTP()
    # Sentence splitting
    sents = ltp.sent_split(["He asked Tom to get his coat. Tom is ill. He went to the hospital."])
    # Word segmentation
    seg, hidden = ltp.seg(sents)
    # Part-of-speech tagging
    pos = ltp.pos(hidden)
    # Named entity recognition
    ner = ltp.ner(hidden)
    # Semantic role labeling
    srl = ltp.srl(hidden)
    # Dependency parsing
    dep = ltp.dep(hidden)
    # Semantic dependency parsing (graph)
    sdp = ltp.sdp(hidden, mode='graph')
    return dep, pos, seg


if __name__ == '__main__':
    # ds holds the dependency parse (dep) returned by ltp_data()
    ds, pos, seg = ltp_data()
    print("Semantic dependencies:{k}".format(k = ds))
    print("label:{k}".format(k = pos))
    print("Clause:{k}".format(k = seg))
neo4j.py
# -*- coding: utf-8 -*-
from py2neo import Node, Graph, Relationship
from ltp_data import ltp_data

# You can read the following documents first: https://py2neo.org/v4/index.htm


class DataToNeo4j(object):
    """Store the extracted data in neo4j"""

    def __init__(self):
        """Establish connection"""
        link = Graph("your localhost", username="your username", password="your password")
        self.graph = link
        # self.graph = NodeMatcher(link)
        # Delete everything already stored in the database
        self.graph.delete_all()

        """
        node3 = Node('animal' , name = 'cat')
        node4 = Node('animal' , name = 'dog')
        node2 = Node('Person' , name = 'Alice')
        node1 = Node('Person' , name = 'Bob')
        r1 = Relationship(node2 , 'know' , node1)
        r2 = Relationship(node1 , 'know' , node3)
        r3 = Relationship(node2 , 'has' , node3)
        r4 = Relationship(node4 , 'has' , node2)
        self.graph.create(node1)
        self.graph.create(node2)
        self.graph.create(node3)
        self.graph.create(node4)
        self.graph.create(r1)
        self.graph.create(r2)
        self.graph.create(r3)
        self.graph.create(r4)
        """

    def create_node(self, name_node, type_node):
        """Create the nodes"""
        nodes = []
        for name_sentence, type_sentence in zip(name_node, type_node):
            nodes_sentence = []
            for name_word, type_word in zip(name_sentence, type_sentence):
                # Create the node: the POS tag is the label, the word is the name
                node = Node(type_word, name = name_word)
                self.graph.create(node)
                # Keep the node so relationships can refer to it later
                nodes_sentence.append(node)
            nodes.append(nodes_sentence)
        print('Node established successfully')
        return nodes

    def create_relation(self, rel):
        """Create the relationships"""
        for sentence in rel:
            for word in sentence:
                try:
                    # The relationship type must be converted to a string
                    r = Relationship(word[0], str(word[2]), word[1])
                    self.graph.create(r)
                except AttributeError as e:
                    print(e)
        print('Successful relationship establishment')


def node_extraction(seg, pos):
    """Extract the node names (words) and node types (POS tags) from the LTP results"""
    for i in range(len(seg)):
        # Convert every word and tag to a plain string
        seg[i] = [str(word) for word in seg[i]]
        pos[i] = [str(tag) for tag in pos[i]]
    return seg, pos


def relation_extraction(ds, nodes):
    """
    Extract the relationships between nodes, combine the nodes and the relation
    into triples and store them in a list.
    (node1, node2, relation)
    """
    rel = []
    for ds_sentence, nodes_sentence in zip(ds, nodes):
        rel_sentence = []
        for ds_word in ds_sentence:
            # Look up the nodes by index (LTP indices start at 1)
            index1 = int(ds_word[0]) - 1
            # Note: the root (HED) has head index 0, so index2 is -1 here
            # and node2 resolves to the last node of the sentence
            index2 = int(ds_word[1]) - 1
            node1 = nodes_sentence[index1]
            node2 = nodes_sentence[index2]
            relation = ds_word[2]
            # Combine the nodes and the relation into a triple
            rel_word = [node1, node2, relation]
            # Collect the triples of one sentence
            rel_sentence.append(rel_word)
        # Collect the sentences into the result list
        rel.append(rel_sentence)
    return rel


if __name__ == '__main__':
    ds, pos, seg = ltp_data()
    create_data = DataToNeo4j()

    # Create the nodes
    node_name, node_type = node_extraction(seg, pos)
    nodes = create_data.create_node(node_name, node_type)
    print("Node of the first sentence:\n{k}".format(k = nodes[0]))

    # Create the relationships
    rel = relation_extraction(ds, nodes)
    create_data.create_relation(rel)
    print("Relationship of the first sentence:\n{k}".format(k = rel[0]))