Elasticsearch search engine storage (basic use)

Posted by keefy on Wed, 02 Feb 2022 09:16:32 +0100

1. Elasticsearch related concepts

Elasticsearch has several basic concepts, such as node, index, and document, which are described below. Understanding these concepts makes it easier to get familiar with Elasticsearch.

  • Nodes and clusters
    Elasticsearch is essentially a distributed database that allows multiple servers to work together, and each server can run multiple Elasticsearch instances.
    A single Elasticsearch instance is called a node, and a group of nodes forms a cluster.

  • Index
    Elasticsearch indexes every field and, after processing, writes the result to an inverted index. Searches are then run directly against that index. For this reason, the top-level unit of data management in Elasticsearch is called an index, which is roughly equivalent to a database in MySQL or MongoDB. Note that the name of each index (that is, each database) must be lowercase.

  • Document
    A single record in an index is called a document, and many documents form an index.
    Documents in the same index are not required to share the same schema, but keeping their structure consistent improves efficiency.

  • Type
    Documents can be grouped. For example, weather documents can be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy days). This grouping is called a type. It is a virtual logical grouping used to filter documents, similar to tables in MySQL and collections in MongoDB.
    Different types should have similar structures. For example, the id field cannot be a string in one group and a number in another. This is one difference from tables in a relational database. Data of completely different natures should be stored in two indexes rather than as two types in one index. Note that mapping types are deprecated in Elasticsearch 7.x, which is why every document in the examples below has the single type _doc.

  • Field
    Each document is similar to a JSON structure and contains multiple fields, each with its corresponding value. Multiple fields form a document, so fields can be compared to the columns of a MySQL table.
    In Elasticsearch, documents belong to a type, and types exist inside an index. The relationship to a traditional database can be summarized with a simple comparison:
    Relational DB --> Databases --> Tables --> Rows --> Columns
    Elasticsearch --> Indices --> Types --> Documents --> Fields
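
To make the idea of an inverted index more concrete, here is a minimal pure-Python sketch. It only illustrates the concept and is not how Elasticsearch (or Lucene underneath it) actually implements it; the function name build_inverted_index, the whitespace tokenizer, and the sample documents are assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer: lowercase the text and split on whitespace
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, keyed by document id
docs = {
    1: "sunny day in Beijing",
    2: "rainy day in Shanghai",
}
inverted = build_inverted_index(docs)
print(inverted["day"])      # ids of the documents that contain "day"
print(inverted["beijing"])  # ids of the documents that contain "beijing"
```

A search for a word then becomes a dictionary lookup instead of a scan over every document, which is the reason searches are run against the index.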

2. Preparation

First, make sure that Elasticsearch is installed. For the installation method, please refer to:

https://cuiqingcai.com/31085.html
From Cui Dashen's blog

After installation, confirm that Elasticsearch is running normally on port 9200.
A request to port 9200 on the local machine returns:

{
  "name" : "DESKTOP-3BGU5UH",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "rh-hJvleSkysw4eED99kUw",
  "version" : {
    "number" : "7.17.0",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "bee86328705acaa9a6daede7140defd4d9ec56bd",
    "build_date" : "2022-01-28T08:36:04.875279988Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

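The examples below were run against Elasticsearch 7.17.0 with the matching generation of the Python client; newer major versions of the client may behave differently. The Python library for working with Elasticsearch is installed with: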

pip3 install elasticsearch

3. Create index

from elasticsearch import Elasticsearch

es = Elasticsearch()
result = es.indices.create(index="news", ignore=400)
print(result)

Here, we first create an Elasticsearch object without any parameters. By default, it connects to the Elasticsearch service on local port 9200. We can also pass specific connection information, such as:

es = Elasticsearch(["https://[username:password@]hostname:port"], verify_certs=True)  # verify SSL certificate
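
If the username or password contains characters such as @ or :, they generally need to be percent-encoded before being embedded in a URL. The following is a small sketch using only the standard library; the helper name build_es_url and the sample credentials are hypothetical and not part of the elasticsearch package.

```python
from urllib.parse import quote


def build_es_url(username, password, host, port=9200, scheme="https"):
    """Build a connection URL, percent-encoding the credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Hypothetical credentials, for illustration only
url = build_es_url("elastic", "p@ss:word", "localhost")
print(url)
# The URL could then be passed to the client, e.g. Elasticsearch([url], verify_certs=True)
```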

The result of creating the index is as follows:

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}

You can see that the returned result is a dictionary corresponding to the JSON response, and the acknowledged field indicates that the creation succeeded.
If we run the code again, the result is:

{'error': {'root_cause': [{'type': 'resource_already_exists_exception', 'reason': 'index [news/TNEjlIt8SmyuI0veOvQCXg] already exists', 'index_uuid': 'TNEjlIt8SmyuI0veOvQCXg', 'index': 'news'}], 'type': 'resource_already_exists_exception', 'reason': 'index [news/TNEjlIt8SmyuI0veOvQCXg] already exists', 'index_uuid': 'TNEjlIt8SmyuI0veOvQCXg', 'index': 'news'}, 'status': 400}

Creation failed: the status code is 400, and the error reason is that the index already exists.
Note that the code passes ignore=400, which means that if the response status is 400, the error is ignored and no exception is raised.
Without it, the call raises an exception:

  File "D:\python3 Internet worm\venv\lib\site-packages\elasticsearch\transport.py", line 466, in perform_request
    raise e
  File "D:\python3 Internet worm\venv\lib\site-packages\elasticsearch\transport.py", line 434, in perform_request
    timeout=timeout,
  File "D:\python3 Internet worm\venv\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 291, in perform_request
    self._raise_error(response.status, raw_data)
  File "D:\python3 Internet worm\venv\lib\site-packages\elasticsearch\connection\base.py", line 329, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [news/TNEjlIt8SmyuI0veOvQCXg] already exists')

Therefore, making good use of the ignore parameter lets us handle such expected errors without interrupting the program.
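
Because an ignored error is returned as a dictionary rather than raised, the calling code may still want to tell success from failure. The sketch below inspects a response dictionary of the shape shown above; the function name describe_result is hypothetical, and the failed response is abbreviated.

```python
def describe_result(result):
    """Summarize a response dict returned by a call that used ignore=..."""
    if "error" in result:
        return f"failed with status {result['status']}: {result['error']['type']}"
    if result.get("acknowledged"):
        return "acknowledged"
    return "unknown"


ok = {'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}
failed = {
    # Abbreviated from the full error response shown earlier
    'error': {'type': 'resource_already_exists_exception'},
    'status': 400,
}
print(describe_result(ok))
print(describe_result(failed))
```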

4. Delete index

from elasticsearch import Elasticsearch

es = Elasticsearch(verify_certs=True)  # Verify SSL certificate
result = es.indices.delete(index="news", ignore=[400, 404])
print(result)

The ignore parameter plays the same role as above.
The returned result is:

{'acknowledged': True}

If we delete the index again, the result is:

{'error': {'root_cause': [{'type': 'index_not_found_exception', 'reason': 'no such index [news]', 'resource.type': 'index_or_alias', 'resource.id': 'news', 'index_uuid': '_na_', 'index': 'news'}], 'type': 'index_not_found_exception', 'reason': 'no such index [news]', 'resource.type': 'index_or_alias', 'resource.id': 'news', 'index_uuid': '_na_', 'index': 'news'}, 'status': 404}

5. Insert data

Elasticsearch is like MongoDB in that you can insert structured dictionary data directly. To insert data, call the create method. For example, let's insert a piece of news here.

from elasticsearch import Elasticsearch

es = Elasticsearch(verify_certs=True)  # Verify SSL certificate
es.indices.create(index="news", ignore=[400, 404])
data = {
    "title": "Brave the wind and waves, live up to your youth, and strive to realize your youth dream",
    "url": "https://view.inews.qq.com/a/EDU2021041600732200"
}
result = es.create(index="news", id=1, body=data)
print(result)

Three parameters are passed in: index is the index name, id is the unique identifier of the document, and body is the document content. The results are as follows:

{'_index': 'news', '_type': '_doc', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}

You can also use the index method to insert data. create requires an id to uniquely identify the document, while index does not: if no id is given, one is generated automatically.

result = es.index(index="news", body=data)

Conceptually, create is an index operation that fails if a document with the same id already exists; in early versions of the Python client, create was implemented as a thin wrapper around index.
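
The difference between the two methods can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above, not the client's implementation: the class ToyIndex is hypothetical, and it raises a plain ValueError where a real cluster would return a conflict error.

```python
import uuid


class ToyIndex:
    """In-memory stand-in for an index, used only to illustrate create vs. index."""

    def __init__(self):
        self.docs = {}

    def create(self, id, body):
        # create refuses to overwrite an existing document
        if id in self.docs:
            raise ValueError(f"document {id!r} already exists")
        self.docs[id] = body
        return {"_id": id, "result": "created"}

    def index(self, body, id=None):
        # index generates an id when none is given and overwrites otherwise
        if id is None:
            id = uuid.uuid4().hex
        result = "updated" if id in self.docs else "created"
        self.docs[id] = body
        return {"_id": id, "result": result}


toy = ToyIndex()
first = toy.create("1", {"title": "a"})
second = toy.index({"title": "b"}, id="1")
auto = toy.index({"title": "c"})
print(first["result"], second["result"], auto["result"])
```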

6. Update data

Updating data is also simple. We specify the id and the new content of the document and call update. The code is as follows:

from elasticsearch import Elasticsearch

es = Elasticsearch()

data = {
    'title': 'Brave the wind and waves, live up to your youth, and strive to realize your youth dream',
    'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    'date': '2021-07-05'
}
result = es.update(index='news', body=data, id=1)
print(result)

With the Elasticsearch version used here, this code raises the error below, probably because the book by Cui Dashen on which it is based targets an older version:

elasticsearch.exceptions.RequestError: RequestError(400, 'x_content_parse_exception', '[1:2] [UpdateRequest] unknown field [title]')

The update API expects the fields to be wrapped in a doc key, for example:

from elasticsearch import Elasticsearch

es = Elasticsearch()

data = {
    "doc":{
        'title': 'Brave the wind and waves, live up to your youth, and strive to realize your youth dream',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
        'date': '2021-07-09'
    }

}
result = es.update(index='news', body=data, id=1)
print(result)

The result is as follows:

{'_index': 'news', '_type': '_doc', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 8, '_primary_term': 1}
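
One way to avoid forgetting the doc wrapper is a tiny helper that builds the request body. This is only a sketch; the name as_partial_update is hypothetical and not part of the elasticsearch package.

```python
def as_partial_update(fields):
    """Wrap the fields to change in the "doc" key expected by the update API."""
    return {"doc": dict(fields)}


body = as_partial_update({"date": "2021-07-09"})
print(body)
# The body could then be sent with es.update(index='news', id=1, body=body)
```

Only the fields listed inside doc are changed; the other fields of the document are left as they are.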

7. Delete data

from elasticsearch import Elasticsearch

es = Elasticsearch()

result = es.delete(index='news', id=1)
print(result)

The operation results are as follows:

{'_index': 'news', '_type': '_doc', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 11, '_primary_term': 1}

The deletion succeeded. Note that _version is now 3, because the version changes with every operation: first the create, then the update, and finally the delete.

8. Query data

The operations above are all basic. The real strength of Elasticsearch lies in its retrieval capabilities.
For Chinese text, we need to install a word segmentation plug-in; here we use elasticsearch-analysis-ik. It is installed with elasticsearch-plugin, another command-line tool that ships with Elasticsearch. The version installed here is 7.13.2; the plug-in version must correspond to your Elasticsearch version, so adjust the version in the URL if yours differs. The command is:

elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.13.2/elasticsearch-analysis-ik-7.13.2.zip

After the installation, restart Elasticsearch.
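
Since the plug-in version has to correspond to the Elasticsearch version, the download URL can be derived from the version string. The sketch below assumes that other releases follow the same URL pattern as the command above, which may not hold for every version; the function name ik_plugin_url is hypothetical.

```python
def ik_plugin_url(es_version):
    """Return the IK release URL for a given Elasticsearch version string.

    Assumes the release URL pattern used in the install command above.
    """
    base = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"
    return f"{base}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"


print(ik_plugin_url("7.13.2"))
```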

First, we rebuild the index and specify the field to be segmented. The corresponding code is as follows:

from elasticsearch import Elasticsearch

es = Elasticsearch()
mapping = {
    'properties': {
        'title': {
            'type': 'text',
            'analyzer': 'ik_max_word',
            'search_analyzer': 'ik_max_word'
        }
    }
}
es.indices.delete(index='news', ignore=[400, 404])
es.indices.create(index='news', ignore=400)
result = es.indices.put_mapping(index='news', body=mapping)
print(result)



Here, we first delete the previous index, then create a new index, and then update its mapping. The mapping specifies the field to be segmented, including the field type, the analyzer used at index time (analyzer), and the analyzer used at search time (search_analyzer). Setting both to ik_max_word means using the Chinese word segmentation plug-in we just installed. If no analyzer is specified, the default standard analyzer is used, which is not designed for Chinese word segmentation.
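
If several text fields need the same analyzer, the mapping body can be generated instead of written by hand. The following sketch builds a dictionary with the same structure as the mapping above; the function name ik_text_mapping is hypothetical.

```python
def ik_text_mapping(fields, analyzer="ik_max_word"):
    """Build a mapping body that applies the same analyzer to each text field."""
    return {
        "properties": {
            field: {
                "type": "text",
                "analyzer": analyzer,
                "search_analyzer": analyzer,
            }
            for field in fields
        }
    }


mapping = ik_text_mapping(["title"])
print(mapping)
# The body could then be passed as es.indices.put_mapping(index='news', body=mapping)
```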
Next, insert a few pieces of data.

from elasticsearch import Elasticsearch

es = Elasticsearch()

datas = [
    {
        'title': 'The outcome of the college entrance examination is very different',
        'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html',
    },
    {
        'title': 'Entering the era of career reshuffle, is the "popular" career still popular?',
        'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html',
    },
    {
        'title': 'Brave the wind and waves, live up to your youth, and strive to realize your youth dream',
        'url': 'http://view.inews.qq.com/a/EDU2021041600732200',
    },
    {
        'title': 'He has lived our ideal',
        'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html',
    }
]

for data in datas:
    es.index(index='news', body=data)

Four documents are defined here, each with a title and a url field, and they are inserted into the news index one by one through the index method.
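
For larger batches, sending one request per document becomes slow, and the client's elasticsearch.helpers.bulk function can send many documents at once. The sketch below only prepares the list of actions so that it runs without a cluster; the function name to_bulk_actions is hypothetical.

```python
def to_bulk_actions(index, docs):
    """Yield one action dict per document, in the shape used by elasticsearch.helpers.bulk."""
    for doc in docs:
        yield {"_index": index, "_source": doc}


docs = [
    {
        "title": "He has lived our ideal",
        "url": "https://new.qq.com/omn/20210821/20210821A020ID00.html",
    },
]
actions = list(to_bulk_actions("news", docs))
print(actions)
# With a running cluster: from elasticsearch import helpers; helpers.bulk(es, actions)
```
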
Next, let's query the data. A search without any conditions returns every document:

result = es.search(index="news")
print(result)

The results are as follows:

{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 4, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'news', '_type': '_doc', '_id': 'PsLjuH4BJ6CKg0kw5yYC', '_score': 1.0, '_source': {'title': 'The outcome of the college entrance examination is very different', 'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html'}}, {'_index': 'news', '_type': '_doc', '_id': 'p8ljuh4bj6ckg0kw5yz8', '_score': 1.0, '_source': {'title': 'Entering the era of career reshuffle, is the "popular" career still popular?', 'url': 'https://new.qq.com/omn/20210828/20210828A025LK00.html'}}, {'_index': 'news', '_type': '_doc', '_id': 'qmljuh4bj6ckg0kw5yae', '_score': 1.0, '_source': {'title': 'Brave the wind and waves, live up to your youth, and strive to realize your youth dream', 'url': 'http://view.inews.qq.com/a/EDU2021041600732200'}}, {'_index': 'news', '_type': '_doc', '_id': 'qcljuh4bj6ckg0kw5yal', '_score': 1.0, '_source': {'title': 'He has lived our ideal', 'url': 'https://new.qq.com/omn/20210821/20210821A020ID00.html'}}]}}

As you can see, all four inserted documents are returned. They appear in the hits field, where total indicates the number of matching results and max_score is the highest matching score.
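
In practice we usually want only a few values out of this nested structure. The sketch below pulls the total count and the (score, title) pairs out of a response dictionary; the function name summarize_hits is hypothetical, and the sample response is an abbreviated copy of the output above.

```python
def summarize_hits(response):
    """Return the total hit count and a list of (score, title) pairs."""
    hits = response["hits"]
    total = hits["total"]["value"]
    rows = [(hit["_score"], hit["_source"]["title"]) for hit in hits["hits"]]
    return total, rows


# Abbreviated response with the same shape as the search output
response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {
                "_id": "PsLjuH4BJ6CKg0kw5yYC",
                "_score": 1.0,
                "_source": {"title": "The outcome of the college entrance examination is very different"},
            },
        ],
    }
}
total, rows = summarize_hits(response)
print(total, rows)
```
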
In addition, we can carry out full-text search, which is where Elasticsearch shows its strength as a search engine:

from elasticsearch import Elasticsearch

dsl = {
    'query': {
        'match': {
            'title': 'Realize the dream of college entrance examination'
        }
    }
}

es = Elasticsearch()
result = es.search(index='news', body=dsl)
print(result)

Here, we query with the DSL supported by Elasticsearch, using match to specify a full-text search. The field searched is title, and the search text is 'Realize the dream of college entrance examination'. The search results are as follows:

{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 2, 'relation': 'eq'}, 'max_score': 1.7796917, 'hits': [{'_index': 'news', '_type': '_doc', '_id': 'PMLiuH4BJ6CKg0kwGSbH', '_score': 1.7796917, '_source': {'title': 'Brave the wind and waves, live up to your youth, and strive to realize your youth dream', 'url': 'http://view.inews.qq.com/a/EDU2021041600732200'}}, {'_index': 'news', '_type': '_doc', '_id': 'osliuh4bj6ckg0kwgsyl', '_score': 0.81085134, '_source': {'title': 'The outcome of the college entrance examination is very different', 'url': 'https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html'}}]}}
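
By default, a match query accepts documents that contain any of the analyzed terms. The match query also takes an operator option that can require every term to be present. The sketch below builds such a query body; the function name match_query is hypothetical, and the resulting hits are not shown because they depend on the analyzer and the indexed data.

```python
def match_query(field, text, operator="or"):
    """Build a match query body; operator="and" requires every analyzed term to match."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


dsl = match_query("title", "Realize the dream of college entrance examination", operator="and")
print(dsl)
# The body could then be passed as es.search(index='news', body=dsl)
```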

You can also print the search result as formatted JSON:

from elasticsearch import Elasticsearch
import json
es = Elasticsearch()


dsl = {
    'query': {
        'match': {
            'title': 'Realize the dream of college entrance examination'
        }
    }
}

result = es.search(index='news', body=dsl)
print(json.dumps(result, indent=2, ensure_ascii=False))

The output is much easier to read:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.7796917,
    "hits": [
      {
        "_index": "news",
        "_type": "_doc",
        "_id": "TMLluH4BJ6CKg0kw9SZ3",
        "_score": 1.7796917,
        "_source": {
          "title": "Brave the wind and waves, live up to your youth, and strive to realize your youth dream",
          "url": "http://view.inews.qq.com/a/EDU2021041600732200"
        }
      },
      {
        "_index": "news",
        "_type": "_doc",
        "_id": "SsLluH4BJ6CKg0kw9Cbi",
        "_score": 0.81085134,
        "_source": {
          "title": "The outcome of the college entrance examination is very different",
          "url": "https://k.sina.com.cn/article_7571064628_1c3454734001011lz9.html"
        }
      }
    ]
  }
}
