Distributed Search Engine: Elasticsearch Quick Start

Posted by gorskyLTD on Sun, 21 Jul 2019 16:36:08 +0200

I. Environment Setup

1. Install Elasticsearch

Installing Elasticsearch 7.2.0 on macOS is simple and takes only a few commands. The installation steps are as follows:

# Download the Elasticsearch archive
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.2.0-darwin-x86_64.tar.gz

# Download the SHA-512 checksum file
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.2.0-darwin-x86_64.tar.gz.sha512
# Verify the archive against the checksum
shasum -a 512 -c elasticsearch-7.2.0-darwin-x86_64.tar.gz.sha512
# Extract the archive
tar -xzf elasticsearch-7.2.0-darwin-x86_64.tar.gz
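
If shasum is not available, the same check can be approximated in Python. This is only a minimal sketch: the function names are my own, and it assumes the .sha512 file starts with the hex digest followed by the file name, which is the layout that shasum -c expects.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        # Read fixed-size chunks so a large archive is never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(archive: Path, checksum_file: Path) -> bool:
    """Compare an archive against a '<hex digest>  <file name>' checksum file."""
    # Assumption: the first whitespace-separated token is the expected digest.
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(archive) == expected
```

Calling matches_checksum_file with the paths of the downloaded archive and its .sha512 file should return True for an intact download.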

2. Start the Elasticsearch service

After completing step 1, we can start the ES service by running the following commands in the terminal.

cd elasticsearch-7.2.0/
./bin/elasticsearch

After running the startup command, if output similar to the following is printed on the terminal, the ES service has started successfully.

[2019-07-21T14:29:48,576][INFO ][o.e.p.PluginsService     ] [maxiangchengdeMacBook-Pro.local] loaded module [aggs-matrix-stats]
[2019-07-21T14:29:48,577][INFO ][o.e.p.PluginsService     ] [maxiangchengdeMacBook-Pro.local] loaded module [analysis-common]
[2019-07-21T14:29:48,577][INFO ][o.e.p.PluginsService     ] [maxiangchengdeMacBook-Pro.local] loaded module [ingest-common]
[2019-07-21T14:29:48,577][INFO ][o.e.p.PluginsService     ] [maxiangchengdeMacBook-Pro.local] loaded module [ingest-geoip]
[2019-07-21T14:29:48,577][INFO ][o.e.p.PluginsService     ] [maxiangchengdeMacBook-Pro.local] loaded module [ingest-user-agent]

II. Quick Start

Once the Elasticsearch service is running locally, it listens on port 9200 by default, and all of our queries, modifications and additions go through that port. Clients talk to ES over HTTP, and the interface follows the RESTful style, so it is simple and friendly to use. In other words, we can use a tool such as Postman to interact with ES directly, sending PUT, GET, DELETE and other requests to perform CRUD operations on data. HTTP is also supported by almost every programming language, which greatly lowers the barrier to entry and is one of the reasons ES is so popular.
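
For readers who prefer code to Postman, the same kind of request can be put together with Python's standard library. This is a rough sketch, not an official client: build_request and send are names I made up, and the address assumes the default local setup described above.

```python
import json
import urllib.request
from typing import Optional

# Default local address from the text; adjust it if your service differs.
ES_URL = "http://localhost:9200"


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for the local ES service."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # ES expects a JSON content type whenever a request body is present.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES_URL + path, data=data,
                                  headers=headers, method=method)


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, send(build_request("GET", "/")) should return basic information about the node. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so error bodies have to be read from the exception.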

Several ES concepts need to be understood in advance. ES differs from traditional relational databases in how it organizes stored data, although in some cases only the names differ and the underlying idea is the same. Taking MySQL as an example, data storage is divided into the following levels: database, table, row and column. ES divides its data into units in a similar way, and each ES unit corresponds to a MySQL unit as described below.

In the latest version of ES, only the concepts of index, document and field remain. An index is equivalent to a database in MySQL, a document to a data row, and a field to a column. There is no concept of a table; everything else corresponds directly.
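
To make the correspondence concrete, here is a small sketch that turns one relational row into a document. The column names and values are hypothetical sample data, and row_to_document is a helper invented for this illustration.

```python
import json

# A hypothetical MySQL row from a customer table: column names plus values.
columns = ("id", "username", "age", "phone")
row = (1, "jack", 18, "18880000000")


def row_to_document(columns, row, id_column="id"):
    """Turn one relational row into (document id, document body)."""
    record = dict(zip(columns, row))
    # The primary key plays the role of the document id.
    doc_id = str(record.pop(id_column))
    # The remaining columns become the fields of the document.
    return doc_id, record


doc_id, document = row_to_document(columns, row)
print(doc_id, json.dumps(document))
```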

1. Create an Index

With these concepts in place, let's create a new index. Because ES communicates over HTTP, an index can be created simply by using Postman to send an HTTP request to the local ES service. The request method is PUT, and the request URL contains the name of the index to create. For example, the following request creates a new customer index.

PUT http://localhost:9200/customer

If the following response is received, the index has been created successfully.

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "customer"
}

Index names in ES must be unique. Creating an index that already exists returns the following response, indicating that the index already exists.

{
    "error": {
        "root_cause": [
            {
                "type": "resource_already_exists_exception",
                "reason": "index [customer/lpNY_ivHTBedu8Nj14jdBQ] already exists",
                "index_uuid": "lpNY_ivHTBedu8Nj14jdBQ",
                "index": "customer"
            }
        ],
        "type": "resource_already_exists_exception",
        "reason": "index [customer/lpNY_ivHTBedu8Nj14jdBQ] already exists",
        "index_uuid": "lpNY_ivHTBedu8Nj14jdBQ",
        "index": "customer"
    },
    "status": 400
}
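
A setup script that may run more than once usually wants to treat this particular error as harmless. One possible way to recognize it from the response body is sketched below; index_already_exists is a hypothetical helper name, and it relies only on the error.type field shown in the response above.

```python
import json


def index_already_exists(response_text: str) -> bool:
    """Return True if an ES error response reports that the index already exists."""
    payload = json.loads(response_text)
    error = payload.get("error")
    # A successful response has no "error" object at all.
    if not isinstance(error, dict):
        return False
    return error.get("type") == "resource_already_exists_exception"
```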

2. Add Document

The next step is to add data to the index; each piece of data is called a document. Requests and responses are exchanged as JSON over HTTP. For example, the following request creates a document with id 1 in the customer index, containing the fields username, age and phone. The document id is specified after _doc and is unique within each index, similar to a primary key in a database.

PUT /customer/_doc/1
{
    "username" : "jack",
    "age" : 18,
    "phone" : "18880000000"
}

After the document is added successfully, you get the following response:

{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

There are many fields in the response. For now we only need to look at two of them: the result field is created, and the successful field under _shards is 1, which means the write succeeded.
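
In code, that check might look like the sketch below. The function name and the set of accepted result values are my own choices, based only on the responses shown in this article.

```python
def write_succeeded(response: dict,
                    expected=("created", "updated", "deleted")) -> bool:
    """Check a document write response using the result and _shards fields."""
    # The result field names what happened to the document.
    result_ok = response.get("result") in expected
    shards = response.get("_shards", {})
    # At least one shard copy must have applied the write, and none may have failed.
    shards_ok = shards.get("successful", 0) >= 1 and shards.get("failed", 0) == 0
    return result_ok and shards_ok
```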

3. Modify Document

We can also modify a document: specify the document id and submit the new JSON data, and the document with that id is modified. Note that a PUT to an existing id replaces the entire document with the submitted body. For example, the following request modifies the document whose id is 1.

PUT /customer/_doc/1
{
    "username" : "meetmax",
    "age" : 15,
    "phone" : "18880000000"
}

When the modification succeeds, the following response is returned:

{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 3,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 2,
    "_primary_term": 1
}
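
One detail worth keeping in mind is that a PUT to an existing id stores the submitted body as the whole document, so fields left out of the request are gone afterwards. The toy class below is not Elasticsearch code; it is only an in-memory model I wrote to illustrate that behavior and the version counter.

```python
class ToyIndex:
    """In-memory stand-in for an index; a model of PUT semantics, not real ES."""

    def __init__(self):
        self._docs = {}  # document id -> (version, body)

    def put(self, doc_id: str, body: dict) -> dict:
        """Store body as the complete document and bump its version."""
        version = self._docs.get(doc_id, (0, None))[0] + 1
        # The old body is discarded entirely, never merged with the new one.
        self._docs[doc_id] = (version, dict(body))
        result = "created" if version == 1 else "updated"
        return {"_id": doc_id, "_version": version, "result": result}

    def get(self, doc_id: str) -> dict:
        """Return the current body of a document."""
        return self._docs[doc_id][1]
```

Putting {"username": "meetmax"} over a document that also had age and phone leaves only username in this model, which is why the request above resends all three fields.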

4. Delete Document

Documents that are no longer needed can be deleted: specify the index name and document id, and send the request with the DELETE method.

DELETE /customer/_doc/1

After the document is deleted successfully, the following response is returned:

{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 4,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 3,
    "_primary_term": 1
}

5. Match query

Now that we have an index and document data, we need a way to query it. ES has its own query language, called Query DSL. Simply put, it uses JSON to express query conditions; the form differs from SQL, but the purpose is the same, and the same kinds of queries can be expressed. ES even supports SQL syntax through an extension. For example, the following request searches the customer index for documents whose username field matches jack.

GET /customer/_search
{
    "query" : {
        "match" : { "username" : "jack" }
    }
}

When the query succeeds, you get the following response, in which the hits field contains the matched documents:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.9808292,
        "hits": [
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.9808292,
                "_source": {
                    "username": "jack",
                    "age": 18,
                    "phone": "18880000000"
                }
            }
        ]
    }
}
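
When working with this response in a program, we usually care only about the matched documents. The sketch below builds the match query body and pulls the _source objects out of a response; all three function names are hypothetical, and the response layout is assumed to be the one shown above.

```python
def match_query(field: str, text: str) -> dict:
    """Build the Query DSL body for a simple match query."""
    return {"query": {"match": {field: text}}}


def total_hits(search_response: dict) -> int:
    """Return the hit count reported under hits.total.value."""
    return search_response["hits"]["total"]["value"]


def extract_sources(search_response: dict) -> list:
    """Return the original documents (_source) of all hits, in ranked order."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits]
```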

Summary

Generally speaking, ES is very easy to get started with, although that only holds for the basics. What this article covers is just the tip of the iceberg of what ES can do. Real-world usage is more complex and far more powerful: for example, aggregation queries, pagination, multi-condition filtering, and even geographic location queries.
In addition, the core concepts of ES and its underlying implementation may still seem unclear, but it doesn't matter if we don't understand them yet. With practical experience, they will become clear quickly and we will understand them more deeply over time. Later, I will also summarize and analyze the core concepts and implementation principles of ES. I hope this is useful to you!

Topics: Database ElasticSearch MySQL JSON