Extracting text relations with LTP and creating a knowledge graph (based on Neo4j)

Posted by naveendk.55 on Wed, 09 Feb 2022 21:09:39 +0100

In my previous article, "Extracting text relations and creating a knowledge graph with LTP (based on Neo4j) (I)", I analyzed a single sentence with LTP, extracted its semantic dependencies, and created the graph in Neo4j with Python. This article extends that one and the overall code is similar, but it builds a knowledge graph from multiple sentences. This time the graph is built not from semantic dependencies but from syntactic relations obtained by dependency parsing (subject-predicate, verb-object, and so on).

You may want to consult the LTP official documentation and the LTP appendix while reading.

Extract text relations with LTP:

This article is just a simple demonstration. The sentences analyzed are:

He asked Tom to get his coat. Tom is ill. He went to the hospital.

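You can replace them with any text you like. Note that LTP is a Chinese NLP toolkit: the analysis was actually run on the Chinese version of these sentences, so the words in the outputs below are English glosses of the Chinese tokens. The code uses the LTP 4.0-era interface (sent_split, seg, pos, dep, ...), which later LTP releases replaced, so install a matching version if you want to run it unchanged.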

from ltp import LTP


def ltp_data():
    """Run LTP on the sample text and return the dependency parse, POS tags and words."""

    ltp = LTP()
    # Sentence splitting
    sents = ltp.sent_split(["He asked Tom to get his coat. Tom is ill. He went to the hospital."])
    # Word segmentation
    seg, hidden = ltp.seg(sents)
    # Part-of-speech tagging
    pos = ltp.pos(hidden)
    # Named entity recognition (not used below)
    ner = ltp.ner(hidden)
    # Semantic role labeling (not used below)
    srl = ltp.srl(hidden)
    # Dependency parsing: one (dependent, head, relation) triple per word, 1-based, head 0 = root
    dep = ltp.dep(hidden)
    # Semantic dependency analysis, graph mode (not used below)
    sdp = ltp.sdp(hidden, mode='graph')

    return dep, pos, seg

Here, let's take a look at the returned results:

if __name__ == '__main__':
    ds, pos, seg = ltp_data()
    print("Dependency relations: {k}".format(k=ds))
    print("POS tags: {k}".format(k=pos))
    print("Words: {k}".format(k=seg))

Output:

Dependency relations: [[(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'DBL'), (4, 5, 'ADV'), (5, 2, 'VOB'), (6, 5, 'VOB'), (7, 2, 'WP')], [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'RAD'), (4, 2, 'WP')], [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'RAD'), (4, 2, 'VOB'), (5, 2, 'WP')]]
POS tags: ['r', 'v', 'nh', 'v', 'v', 'n', 'wp', 'nh', 'v', 'v', 'u', 'wp', 'r', 'v', 'u', 'n', 'wp']]
Words: [['he', 'call', 'Tom', 'go', 'take', 'coat', '.'], ['Tom', 'ill', 'gone', 'hospital']]
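To read the dependency output above: each triple is (dependent position, head position, relation), with 1-based word positions and head 0 marking the root of the sentence. A minimal sketch that turns the first sentence's triples into readable word arcs, using the glossed words and triples shown above as hard-coded sample data (the helper name format_arcs is hypothetical, not part of LTP):

```python
def format_arcs(words, arcs):
    """Render LTP-style (dependent, head, relation) triples as readable strings.

    Word positions are 1-based; a head index of 0 marks the sentence root.
    """
    lines = []
    for dependent, head, relation in arcs:
        head_word = "ROOT" if head == 0 else words[head - 1]
        lines.append(f"{words[dependent - 1]} --{relation}--> {head_word}")
    return lines


# Sample data: the first sentence from the output above (English glosses)
words = ['he', 'call', 'Tom', 'go', 'take', 'coat', '.']
arcs = [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'DBL'), (4, 5, 'ADV'),
        (5, 2, 'VOB'), (6, 5, 'VOB'), (7, 2, 'WP')]

for line in format_arcs(words, arcs):
    print(line)
```

Read this way, it is easy to see that "call" is the root of the first sentence and that "coat" is the object of "take".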

For the meaning of each POS tag and dependency relation, see the LTP appendix.
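For quick reference, the labels can be kept in a small lookup table. The descriptions below are paraphrased from the LTP appendix and cover the labels in the example output plus a few other common dependency relations; the dictionaries and the describe helper are illustrative only, not part of the LTP API.

```python
# Dependency relations (paraphrased from the LTP appendix)
DEP_LABELS = {
    'SBV': 'subject-verb',
    'VOB': 'verb-object',
    'DBL': 'double (pivotal construction)',
    'ADV': 'adverbial',
    'RAD': 'right adjunct',
    'WP': 'punctuation',
    'HED': 'head (sentence root)',
    'ATT': 'attribute',
    'COO': 'coordinate',
    'POB': 'preposition-object',
    'CMP': 'complement',
}

# Part-of-speech tags that appear in the example output
POS_LABELS = {
    'r': 'pronoun',
    'v': 'verb',
    'nh': 'person name',
    'n': 'general noun',
    'u': 'auxiliary',
    'wp': 'punctuation',
}


def describe(triple):
    """Explain one (dependent, head, relation) triple using the table above."""
    dependent, head, relation = triple
    meaning = DEP_LABELS.get(relation, 'unknown relation')
    return f"word {dependent} -> word {head}: {relation} ({meaning})"


print(describe((1, 2, 'SBV')))  # word 1 -> word 2: SBV (subject-verb)
```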

Extract nodes and relationships:

Organize the results returned in the previous step and extract nodes and relationships from them.

Extract nodes:

def node_extraction(seg, pos):
    """Return node names (words) and node types (POS tags) as strings, sentence by sentence."""
    names = [[str(word) for word in sentence] for sentence in seg]
    types = [[str(tag) for tag in sentence] for sentence in pos]

    return names, types
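The create_node method later pairs names and types with zip, which silently drops the extra items if a sentence has a different number of words and tags. A minimal sketch of a hypothetical helper (node_pairs, not from the original code) that pairs them explicitly and fails loudly on a mismatch:

```python
def node_pairs(seg, pos):
    """Pair each word with its POS tag, sentence by sentence.

    Raises ValueError on a length mismatch instead of letting zip()
    silently drop the extra words or tags.
    """
    if len(seg) != len(pos):
        raise ValueError(f"{len(seg)} sentences of words but {len(pos)} of tags")
    pairs = []
    for number, (words, tags) in enumerate(zip(seg, pos), start=1):
        if len(words) != len(tags):
            raise ValueError(f"sentence {number}: {len(words)} words but {len(tags)} tags")
        pairs.append([(str(word), str(tag)) for word, tag in zip(words, tags)])
    return pairs


# Sample data: the first sentence from the output above (English glosses)
seg = [['he', 'call', 'Tom', 'go', 'take', 'coat', '.']]
pos = [['r', 'v', 'nh', 'v', 'v', 'n', 'wp']]
print(node_pairs(seg, pos)[0][:3])  # [('he', 'r'), ('call', 'v'), ('Tom', 'nh')]
```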

Relation extraction needs the nodes that have already been created, so it takes a nodes parameter: the list of nodes returned by the create_node method defined later.

Extract relations:

def relation_extraction(ds, nodes):
    """
    Extract the relationships between nodes, combine them into triples
    (node1, node2, relation) and store them in a list, sentence by sentence.
    """
    rel = []
    for ds_sentence, nodes_sentence in zip(ds, nodes):
        rel_sentence = []
        for dependent, head, relation in ds_sentence:
            # LTP positions are 1-based, so subtract 1 to index the node list.
            # Note: the root word has head 0, so index2 becomes -1 and node2
            # is the last token of the sentence (usually the punctuation mark).
            index1 = int(dependent) - 1
            index2 = int(head) - 1
            node1 = nodes_sentence[index1]
            node2 = nodes_sentence[index2]

            # Store the triple (dependent node, head node, relation)
            rel_sentence.append([node1, node2, relation])

        # Add this sentence's triples to the overall list
        rel.append(rel_sentence)

    return rel
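One detail worth knowing: the root word (HED) has head index 0, so index2 becomes -1 and Python's negative indexing links the root to the last token of the sentence, usually the punctuation mark. That is why the test-run output contains a "call" to "." HED relationship. A minimal sketch of a variant that skips root arcs instead; the function name and the choice to drop the HED arc (rather than, say, adding a separate ROOT node) are assumptions, and plain strings stand in for py2neo Node objects so it runs without a database:

```python
def relation_extraction_skip_root(ds, nodes):
    """Build [dependent node, head node, relation] triples, skipping root arcs.

    LTP word positions are 1-based and the root word has head 0; without the
    check below, head - 1 == -1 would silently select the last node.
    """
    rel = []
    for ds_sentence, nodes_sentence in zip(ds, nodes):
        rel_sentence = []
        for dependent, head, relation in ds_sentence:
            dependent, head = int(dependent), int(head)
            if head == 0:  # root (HED) arc: there is no head word to link to
                continue
            rel_sentence.append([nodes_sentence[dependent - 1],
                                 nodes_sentence[head - 1],
                                 relation])
        rel.append(rel_sentence)
    return rel


# Plain strings stand in for py2neo Node objects (sample: first sentence)
nodes = [['he', 'call', 'Tom', 'go', 'take', 'coat', '.']]
ds = [[(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'DBL'), (4, 5, 'ADV'),
       (5, 2, 'VOB'), (6, 5, 'VOB'), (7, 2, 'WP')]]

for triple in relation_extraction_skip_root(ds, nodes)[0]:
    print(triple)
```

The return value has the same shape as that of relation_extraction, so it can be passed to create_relation unchanged. If you would rather keep the root visible in the graph, another option is to create one extra ROOT node per sentence and attach the HED arc to it.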

Create nodes and relationships:

This step builds the knowledge graph. First connect to Neo4j. When creating the connection, the first argument is the address shown when you start Neo4j from the command line (http://localhost:7474), followed by the user name and the password.

from py2neo import Node, Graph, Relationship
from ltp_data import ltp_data
# py2neo v4 documentation: https://py2neo.org/v4/index.htm

class DataToNeo4j(object):
    """Store the LTP analysis results in Neo4j."""

    def __init__(self):
        """Establish the connection."""
        # Address shown when Neo4j starts, then (user name, password)
        link = Graph("http://localhost:7474", auth=("your username", "your password"))
        self.graph = link
        # self.graph = NodeMatcher(link)
        # Warning: delete_all() removes every node and relationship in the database
        self.graph.delete_all()

        """
        node3 = Node('animal' , name = 'cat')
        node4 = Node('animal' , name = 'dog')  
        node2 = Node('Person' , name = 'Alice')
        node1 = Node('Person' , name = 'Bob')  
        r1 = Relationship(node2 , 'know' , node1)    
        r2 = Relationship(node1 , 'know' , node3) 
        r3 = Relationship(node2 , 'has' , node3) 
        r4 = Relationship(node4 , 'has' , node2)    
        self.graph.create(node1)
        self.graph.create(node2)
        self.graph.create(node3)
        self.graph.create(node4)
        self.graph.create(r1)
        self.graph.create(r2)
        self.graph.create(r3)
        self.graph.create(r4)
        """

    def create_node(self, name_node, type_node):
        """Create one node per word, labeled with its POS tag."""
        nodes = []
        for name_sentence, type_sentence in zip(name_node, type_node):
            nodes_sentence = []
            for name_word, type_word in zip(name_sentence, type_sentence):
                # Create the node: label = POS tag, name property = word
                node = Node(type_word, name=name_word)
                self.graph.create(node)
                # Keep the node so relation_extraction can link it later
                nodes_sentence.append(node)
            nodes.append(nodes_sentence)

        print('Node established successfully')
        return nodes


    def create_relation(self, rel):
        """Create the relationships."""
        for sentence in rel:
            for word in sentence:
                try:
                    # The relationship type must be a string
                    r = Relationship(word[0], str(word[2]), word[1])
                    self.graph.create(r)
                except AttributeError as e:
                    print(e)

        print('Successful relationship establishment')

Test run:

if __name__ == '__main__':
    ds, pos, seg = ltp_data()
    create_data = DataToNeo4j()

    # Create nodes
    node_name, node_type = node_extraction(seg, pos)
    nodes = create_data.create_node(node_name, node_type)
    print("Node of the first sentence:\n{k}".format(k=nodes[0]))

    # Create relationships
    rel = relation_extraction(ds, nodes)
    create_data.create_relation(rel)
    print("Relationship of the first sentence:\n{k}".format(k=rel[0]))

Output:

Node established successfully
Node of the first sentence:
[Node('r', name='he'), Node('v', name='call'), Node('nh', name='Tom'), Node('v', name='go'), Node('v', name='take'), Node('n', name='coat'), Node('wp', name='.')]
Successful relationship establishment
Relationship of the first sentence:
[[Node('r', name='he'), Node('v', name='call'), 'SBV'], [Node('v', name='call'), Node('wp', name='.'), 'HED'], [Node('nh', name='Tom'), Node('v', name='call'), 'DBL'], [Node('v', name='go'), Node('v', name='take'), 'ADV'], [Node('v', name='take'), Node('v', name='call'), 'VOB'], [Node('n', name='coat'), Node('v', name='take'), 'VOB'], [Node('wp', name='.'), Node('v', name='call'), 'WP']]

Result:
[Figure: the resulting knowledge graph in the Neo4j browser]


Trying a different passage:

A group of animals on a farm successfully carried out a revolution, drove their human owners off the farm and established an equal animal society. However, the animals' leaders, the clever pigs, eventually usurped the fruits of the revolution and became even more authoritarian and totalitarian rulers than the human owners had been.


Note the setting marked with the red arrow: by default the Neo4j browser displays only a limited number of nodes. When many nodes have been created, some of them may not be shown, which can make you think they were never created. Just raise the display limit a little.
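To check that nodes are merely hidden rather than missing, you can compare the database contents with what the script should have created. A minimal sketch, assuming (as in the dependency output above) that LTP returns exactly one triple per word: the script then creates one node per word and, because relation_extraction keeps every triple including the root HED arc, one relationship per triple. The function name is hypothetical. The actual totals can be read in the Neo4j browser with the Cypher queries MATCH (n) RETURN count(n) and MATCH ()-[r]->() RETURN count(r); since __init__ calls delete_all(), the database should contain only the current run's data.

```python
def expected_counts(ds):
    """Return (nodes, relationships) the script should create from LTP's dep output.

    Every word appears exactly once as a dependent, so distinct dependent
    positions give the word (node) count; every triple becomes one relationship.
    """
    node_count = sum(len({triple[0] for triple in sentence}) for sentence in ds)
    rel_count = sum(len(sentence) for sentence in ds)
    return node_count, rel_count


# Dependency output of the three example sentences shown earlier
ds = [[(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'DBL'), (4, 5, 'ADV'),
       (5, 2, 'VOB'), (6, 5, 'VOB'), (7, 2, 'WP')],
      [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'RAD'), (4, 2, 'WP')],
      [(1, 2, 'SBV'), (2, 0, 'HED'), (3, 2, 'RAD'), (4, 2, 'VOB'), (5, 2, 'WP')]]

print(expected_counts(ds))  # (16, 16)
```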

Full code:

ltp_data.py

from ltp import LTP

def ltp_data():
    """Run LTP on the sample text and return the dependency parse, POS tags and words."""

    ltp = LTP()
    # Sentence splitting
    sents = ltp.sent_split(["He asked Tom to get his coat. Tom is ill. He went to the hospital."])
    # Word segmentation
    seg, hidden = ltp.seg(sents)
    # Part-of-speech tagging
    pos = ltp.pos(hidden)
    # Named entity recognition (not used below)
    ner = ltp.ner(hidden)
    # Semantic role labeling (not used below)
    srl = ltp.srl(hidden)
    # Dependency parsing: one (dependent, head, relation) triple per word, 1-based, head 0 = root
    dep = ltp.dep(hidden)
    # Semantic dependency analysis, graph mode (not used below)
    sdp = ltp.sdp(hidden, mode='graph')

    return dep, pos, seg


if __name__ == '__main__':
    ds, pos, seg = ltp_data()
    print("Dependency relations: {k}".format(k=ds))
    print("POS tags: {k}".format(k=pos))
    print("Words: {k}".format(k=seg))

neo4j.py

# -*- coding: utf-8 -*-
from py2neo import Node, Graph, Relationship
from ltp_data import ltp_data
# py2neo v4 documentation: https://py2neo.org/v4/index.htm

class DataToNeo4j(object):
    """Store the LTP analysis results in Neo4j."""

    def __init__(self):
        """Establish the connection."""
        # Address shown when Neo4j starts, then (user name, password)
        link = Graph("http://localhost:7474", auth=("your username", "your password"))
        self.graph = link
        # self.graph = NodeMatcher(link)
        # Warning: delete_all() removes every node and relationship in the database
        self.graph.delete_all()

        """
        node3 = Node('animal' , name = 'cat')
        node4 = Node('animal' , name = 'dog')  
        node2 = Node('Person' , name = 'Alice')
        node1 = Node('Person' , name = 'Bob')  
        r1 = Relationship(node2 , 'know' , node1)    
        r2 = Relationship(node1 , 'know' , node3) 
        r3 = Relationship(node2 , 'has' , node3) 
        r4 = Relationship(node4 , 'has' , node2)    
        self.graph.create(node1)
        self.graph.create(node2)
        self.graph.create(node3)
        self.graph.create(node4)
        self.graph.create(r1)
        self.graph.create(r2)
        self.graph.create(r3)
        self.graph.create(r4)
        """

    def create_node(self, name_node, type_node):
        """Create one node per word, labeled with its POS tag."""
        nodes = []
        for name_sentence, type_sentence in zip(name_node, type_node):
            nodes_sentence = []
            for name_word, type_word in zip(name_sentence, type_sentence):
                # Create the node: label = POS tag, name property = word
                node = Node(type_word, name=name_word)
                self.graph.create(node)
                # Keep the node so relation_extraction can link it later
                nodes_sentence.append(node)
            nodes.append(nodes_sentence)

        print('Node established successfully')
        return nodes


    def create_relation(self, rel):
        """Create the relationships."""
        for sentence in rel:
            for word in sentence:
                try:
                    # The relationship type must be a string
                    r = Relationship(word[0], str(word[2]), word[1])
                    self.graph.create(r)
                except AttributeError as e:
                    print(e)

        print('Successful relationship establishment')


def node_extraction(seg, pos):
    """Return node names (words) and node types (POS tags) as strings, sentence by sentence."""
    names = [[str(word) for word in sentence] for sentence in seg]
    types = [[str(tag) for tag in sentence] for sentence in pos]

    return names, types


def relation_extraction(ds, nodes):
    """
    Extract the relationships between nodes, combine them into triples
    (node1, node2, relation) and store them in a list, sentence by sentence.
    """
    rel = []
    for ds_sentence, nodes_sentence in zip(ds, nodes):
        rel_sentence = []
        for dependent, head, relation in ds_sentence:
            # LTP positions are 1-based, so subtract 1 to index the node list.
            # Note: the root word has head 0, so index2 becomes -1 and node2
            # is the last token of the sentence (usually the punctuation mark).
            index1 = int(dependent) - 1
            index2 = int(head) - 1
            node1 = nodes_sentence[index1]
            node2 = nodes_sentence[index2]

            # Store the triple (dependent node, head node, relation)
            rel_sentence.append([node1, node2, relation])

        # Add this sentence's triples to the overall list
        rel.append(rel_sentence)

    return rel


if __name__ == '__main__':
    ds, pos, seg = ltp_data()
    create_data = DataToNeo4j()

    # Create nodes
    node_name, node_type = node_extraction(seg, pos)
    nodes = create_data.create_node(node_name, node_type)
    print("Node of the first sentence:\n{k}".format(k=nodes[0]))

    # Create relationships
    rel = relation_extraction(ds, nodes)
    create_data.create_relation(rel)
    print("Relationship of the first sentence:\n{k}".format(k=rel[0]))
