[python + Neo4j] simple use of knowledge map tool

Posted by cyberRobot on Fri, 18 Feb 2022 22:17:06 +0100

[python + Neo4j] simple use of knowledge map tool

Write at the front: because the graduation project needs to use the knowledge map tool to visually display the csv form, after learning Zhihu, blog, etc., we decided to use neo4j for preliminary implementation. This article records the whole process of Xiaobai's entry, for novices' reference!

1, neo4j installation and environment configuration

This part is implemented with reference to the blogs of predecessors. A blog with a very detailed introduction is put below. I followed this one and it is available for personal testing.

Neo4j Article 1: installing neo4j in Windows Environment - happy time blog Park (cnblogs.com)

PS: I didn't operate the third part of network connection configuration, because I didn't have this requirement, and my computer network is very poor, so I can't understand it.

2, python operation neo4j

Here we need to use a library: py2neo, which can be installed directly from pip.

2.1 introduction

For the simplest introduction, I refer to the following blog. Please pay attention to adjusting the user name and password in the code.

(5 messages) python operation neo4j (import CSV operation) of knowledge map_ Chenxi 0502 blog - CSDN blog

However, for students without database foundation, it is necessary to know the difference between Node name and Node label; And what the parameters of Node and Relationship functions mean. Refer to the link below

(2 private letters / 42 messages) what is the relationship between node name and node label in Neo4j- Zhihu (zhihu.com)

2.2 importing CSV into neo4j

The pandas library and py2neo implementation are combined here. In addition, for the vacancy values in the table, py2neo will directly report an error instead of automatically skipping, so we need to add judgment in the code. The details are as follows:

def CSV4neo4j(filepath, graph):
    # Read CSV file
    data = pd.read_csv(filepath, error_bad_lines=False, encoding='GBK')

    for i in range(len(data)):
        """Skip first row header"""
        if str(data.iloc[i, 0]) == 'original text':
            continue
        """Establish nodes and relationships line by line( if (used to judge whether the event attribute value exists)"""
        text = Node('text', name=data.iloc[i, 0])
        graph.create(text)

        """Attribute 1"""
        if str(data.iloc[i, 1]) != 'nan':
            prim = Node('prim', name=data.iloc[i, 1])
            graph.create(prim)
            # Build relationships
            text2prim = Relationship(text, 'Attribute 1', prim)
            graph.create(text2prim)

        """Attribute 2"""
        if str(data.iloc[i, 2]) != 'nan':
            size = Node('size', name=data.iloc[i, 2])
            graph.create(size)
            # Build relationships
            prim2size = Relationship(text, 'Attribute 2', size)
            graph.create(prim2size)

        """Attribute 3"""
        if str(data.iloc[i, 3]) != 'nan':
            trans = Node('trans', name=data.iloc[i, 3])
            graph.create(trans)
            # Build relationships
            text2trans = Relationship(text, 'Attribute 3', trans)
            graph.create(text2trans)

2.3 merging the same nodes (de duplication)

If the same nodes are not merged, the relationship between nodes will not be clear, and the significance of knowledge map will be small, so this part is also a very important step.

In the process of implementation, I found that many other blogs are implemented in Cypher language of Neo4j, but it is too difficult for me, who has a poor database foundation and has not operated the database with mySQL instructions... So I found another way, that is to search whether the same node already exists in the current figure. I mainly refer to the following two blogs. The first introduces the overall idea and the second focuses on the code implementation. I won't introduce it in detail here. Please move to the great God blog hee hee!

(5 messages) neo4j, python. When using csv to create nodes in batches, nodes with the same name will be created repeatedly. Weight removal_ qq_51388397 blog - CSDN blog

(5 messages) python operation Neo4j merges entities with the same name_ When does the west wind stop blog - CSDN blog

Therefore, the above CSV4neo4j function is improved as follows:

def CSV4neo4j(filepath, graph):
    # Import CSV file
    data = pd.read_csv(filepath, error_bad_lines=False, encoding='GBK')

    for i in range(len(data)):
        """Skip first row header"""
        if str(data.iloc[i, 0]) == 'original text':
            continue

        """Establish nodes and relationships line by line( if (used to judge whether the event attribute value exists)"""
        text = Node('text', name=data.iloc[i, 0])
        graph.create(text)

        """Attribute 1"""
        if str(data.iloc[i, 1]) != 'nan':
            prim = Node('prim', name=data.iloc[i, 1])
            matcher = NodeMatcher(graph)
            matchlist = list(matcher.match('prim', name=data.iloc[i, 1]))
            if len(matchlist) > 0: # Node already exists, no need to create
                prim = matchlist[0]
                # Direct relationship building
                text2prim = Relationship(text, 'Attribute 1', prim)
                graph.create(text2prim)
            else: # Node to be created
                graph.create(prim)
                # Build relationships
                text2prim = Relationship(text, 'Attribute 1', prim)
                graph.create(text2prim)

        """Attribute 2"""
        if str(data.iloc[i, 2]) != 'nan':
            size = Node('size', name=data.iloc[i, 2])
            matcher = NodeMatcher(graph)
            matchlist_size = list(matcher.match('size', name=data.iloc[i, 2]))
            if len(matchlist_size) > 0:
                size = matchlist_size[0]
                # Build relationships
                prim2size = Relationship(text, 'Attribute 2', size)
                graph.create(prim2size)
            else:
                graph.create(size)
                # Build relationships
                prim2size = Relationship(text, 'Attribute 2', size)
                graph.create(prim2size)

        """Attribute 3"""
        if str(data.iloc[i, 3]) != 'nan':
            trans = Node('trans', name=data.iloc[i, 3])
            matcher = NodeMatcher(graph)
            matchlist_trans = list(matcher.match('trans', name=data.iloc[i, 3]))
            if len(matchlist_trans) > 0:
                trans = matchlist_trans[0]
                # Build relationships
                text2trans = Relationship(text, 'Attribute 3', trans)
                graph.create(text2trans)
            else:
                graph.create(trans)
                # Build relationships
                text2trans = Relationship(text, 'Attribute 3', trans)
                graph.create(text2trans)

Finally, call as needed

from py2neo import Graph, Node, Relationship, NodeMatcher
import pandas as pd

# Connect neo4j database and enter address, user name and password
graph = Graph('http://localhost:7474', username='neo4j', password='')
graph.delete_all()

myfilepath = 'D:/Bi Sheh/code/data/test.CSV'
CSV4neo4j(myfilepath, graph)

give the result as follows

Welcome to communicate!

Topics: Python neo4j