Elasticsearch basics and Python operations

Posted by adrianpeyton on Wed, 26 Jan 2022 00:33:39 +0100

1, ES Basics

Official documentation: Elasticsearch: The Definitive Guide | Elastic

1. Terminology compared with MySQL

ElasticSearch    MySQL
Index            Database
Type             Table
Document         Row
Field            Column
Mapping          Schema

Index: an index (plural: indices) is a collection of documents and is roughly equivalent to a database in a relational database.

Type: a type was meant to mimic the multiple tables that live under one database in MySQL. A single index could hold several types, such as a product type and an order type, each with its own data format. Because this made indices confusing, the concept was deprecated in 7.x and removed in 8.0.

Document: the original data stored in an index. For example, each product record is one document, which corresponds to a row in a database table.

Field: an attribute of a document, equivalent to a column in a database table.

Mapping: defines the data type, attributes, whether it is indexed, whether it is stored, and other characteristics of each field. It is equivalent to a schema in a relational database, which defines the tables, the fields of each table, and the relationships between tables and fields.

Still to be expanded:

  • shard: one of the pieces that an index's data is split into
  • replica: a copy of a shard
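
As a small worked example of how these two terms combine: every primary shard gets `number_of_replicas` copies, so an index holds primaries × (1 + replicas) shard copies in total. The helper name below is made up for illustration.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies (primaries plus replicas) for one index."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return number_of_shards * (1 + number_of_replicas)


# 10 primary shards with 1 replica each -> 20 shard copies in total
print(total_shard_copies(10, 1))
```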

Detailed reference: Elasticsearch architecture and term interpretation_ Time is always laughing at i's blog - CSDN blog

2. Operations compared with MySQL

  • The database operations insert, delete, update and select correspond roughly to the HTTP methods PUT, DELETE, POST and GET in ES.
  • GROUP BY, AVG, SUM and similar functions in MySQL correspond to features of aggregations in ES.
  • Deduplication with DISTINCT in MySQL (as in COUNT(DISTINCT ...)) is similar to the cardinality aggregation in ES.
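
To make the aggregation comparison concrete, here is a minimal sketch of the request bodies involved. The function names and the aggregation names (`by_group`, `avg_value`, `distinct_values`) are my own placeholders, and the field names in the usage lines are assumed to exist in your index with suitable types (a keyword field to group on, a numeric field to average).

```python
import json


def group_by_avg_body(group_field: str, value_field: str, size: int = 10) -> dict:
    """Roughly: SELECT group_field, AVG(value_field) ... GROUP BY group_field."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "by_group": {  # placeholder aggregation name
                "terms": {"field": group_field, "size": size},
                "aggs": {"avg_value": {"avg": {"field": value_field}}},
            }
        },
    }


def distinct_count_body(field: str) -> dict:
    """Roughly: SELECT COUNT(DISTINCT field); the ES result is approximate."""
    return {
        "size": 0,
        "aggs": {"distinct_values": {"cardinality": {"field": field}}},
    }


print(json.dumps(group_by_avg_body("user", "price"), indent=2))
print(json.dumps(distinct_count_body("user"), indent=2))
```

Either body can be sent with `GET {index}/_search`. Note that the cardinality aggregation returns an approximate count, so it is not an exact replacement for COUNT(DISTINCT ...) on high-cardinality fields.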

3. Common statements

New index

PUT /my_index
{
    "settings": { ... any settings ... },
    "mappings": {
        "type_one": { ... any mappings ... },
        "type_two": { ... any mappings ... },
        ...
    }
}

Example:
PUT es_index
{
  "settings": {
    "index": {
      "number_of_shards": "10",  // split into 10 primary shards
      "number_of_replicas": "1"  // 1 replica of each shard
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text"  // "string" was replaced by text/keyword in ES 5+
      },
      "price": {
        "type": "keyword"
      },
      "tid": {
        "type": "keyword"
      },
      "user": {
        "type": "keyword"
      }
    }
  }
}
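
The same kind of request body can be assembled in Python before it is sent. This is only a sketch: `build_index_body` is a hypothetical helper, the typeless `mappings.properties` layout assumes Elasticsearch 7 or later, and the field types below (`text`, `keyword`, `float`) are my assumptions, chosen because `text` and `keyword` replaced the older `string` type starting with Elasticsearch 5.0.

```python
import json


def build_index_body(field_types: dict, shards: int = 1, replicas: int = 1) -> dict:
    """Build a create-index body from a {field_name: es_type} dict."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
        "mappings": {
            "properties": {
                name: {"type": es_type} for name, es_type in field_types.items()
            }
        },
    }


body = build_index_body(
    {"message": "text", "price": "float", "tid": "keyword", "user": "keyword"},
    shards=10,
    replicas=1,
)
print(json.dumps(body, indent=2))
```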

For details, please refer to elasticsearch mapping settings_ Time is always laughing at i's blog - CSDN blog_ Elasticsearch mapping time zone

View the mapping

GET /{index}/_mapping

Query index settings

GET /{index}/_settings

New / update data

PUT /{index}/{type}/{id}
{
  "field": "value",
  ...
}

Example:
PUT /weather/d1/1
{
  "age": 1,
  "name": "zs",
  "bri": "2018-08-08"
}

Delete a specific document (no request body is needed)

DELETE /{index}/{type}/{id}

Example:
DELETE  weather/d1/2

Delete index

#Delete an index
DELETE /my_index
#Delete multiple indexes
DELETE /index_one,index_two
DELETE /index_*
#Delete all indexes
DELETE /_all
DELETE /*

Query data

# Query one document
GET {index}/{type}/{id}
# Query all data
GET {index}/_search
# Conditional query
GET {index}/_search?q={field}:{value}
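
When the conditional query is built in code, the `field:value` part has to be URL-encoded. A minimal sketch using only the standard library follows; `uri_search_path` is a hypothetical helper, and it does not escape characters that are special in the query-string syntax itself, so it is only suitable for simple values.

```python
from urllib.parse import quote


def uri_search_path(index: str, field: str, value: str) -> str:
    """Build the path for GET {index}/_search?q={field}:{value}."""
    # Keep ":" readable, percent-encode everything else that needs it
    query = quote(f"{field}:{value}", safe=":")
    return f"/{index}/_search?q={query}"


print(uri_search_path("weather", "name", "zs"))
print(uri_search_path("weather", "name", "zhang san"))
```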

Explanation of the returned query results: Empty search | Elasticsearch: The Definitive Guide | Elastic

JSON query

GET {index}/_search
{
    "query":{
      "match_all": {}
    }
}
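
Beyond `match_all`, the same JSON body carries any query DSL clause. The sketch below builds two common shapes as Python dicts, a full-text `match` and a `bool` query combining `match` with exact `term` filters. The helper names are hypothetical, and the field names in the usage lines are placeholders.

```python
import json


def match_query(field: str, text: str, size: int = 10) -> dict:
    """Full-text match on one field, returning at most `size` hits."""
    return {"query": {"match": {field: text}}, "size": size}


def bool_query(matches: dict, term_filters: dict) -> dict:
    """Combine full-text matches (scored) with exact term filters (not scored)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {f: t}} for f, t in matches.items()],
                "filter": [{"term": {f: v}} for f, v in term_filters.items()],
            }
        }
    }


print(json.dumps(match_query("message", "hello world"), indent=2))
print(json.dumps(bool_query({"message": "hello"}, {"user": "zs"}), indent=2))
```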

More references: In depth search | Elasticsearch: The Definitive Guide | Elastic

 

2, Python operations

Official documentation: Python Elasticsearch Client — Elasticsearch 7.16.3 documentation

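1. Install elasticsearch

pip install elasticsearch

The examples below use parameters such as doc_type, http_auth and port from the 7.x client (the documentation linked above is for 7.16.3). With an 8.x client they would need adjusting, so it may be simplest to pin the version:

pip install "elasticsearch>=7,<8"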

2. Connect

from elasticsearch import Elasticsearch


class ElasticSearchClass(object):

    def __init__(self, host, port, user, password):
        # Basic auth credentials in "user:password" form
        http_auth = user + ":" + password
        self.es = Elasticsearch(
            [host],
            port=port,
            http_auth=http_auth,
            # sniff_on_start=True,  # sniff the cluster nodes before the first request
            sniff_on_connection_fail=True,  # refresh node information when a node connection fails
            sniffer_timeout=60  # refresh node information every 60 seconds
        )

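3. Common operations

Create index

    # @retry is assumed to come from the third-party "retry" package:
    #     from retry import retry
    @retry(tries=3)
    def create_index(self, index_name, mapping):
        # ignore=400 suppresses HTTP 400 errors, e.g. when the index already exists
        self.es.indices.create(index=index_name, ignore=400)
        self.es.indices.put_mapping(index=index_name, body=mapping)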

Insert a document

    def insert(self, index, doc_type, body, id=None):
        '''
        Insert one document (body) into the given index under the given type.
        An id can be specified; if it is not, ES generates one automatically.
        :param index: index to insert into
        :param doc_type: document type
        :param body: data to insert (dict)
        :param id: custom id value
        :return: the response from ES
        '''
        return self.es.index(index=index, doc_type=doc_type, body=body, id=id, request_timeout=30)

Query a document

    def get(self, doc_type, indexname, id):
        # Fetch one specific document from the index
        return self.es.get(index=indexname, doc_type=doc_type, id=id)

Search by index

    # Search all data in an index; by default only the first ten hits are returned
    def searchindex(self, index):
        """
        Find all data in the index
        """
        try:
            return self.es.search(index=index)
        except Exception as err:
            print(err)
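
Because only ten hits come back by default, reading further results needs paging. A minimal sketch with `from`/`size` follows; `page_bodies` is a hypothetical helper, and the 10,000 limit reflects the default result window (`index.max_result_window`), which I am assuming has not been changed on the index.

```python
from typing import Iterator, Optional


def page_bodies(total_hits: int, page_size: int = 10,
                query: Optional[dict] = None,
                max_window: int = 10000) -> Iterator[dict]:
    """Yield one search body per page, using from/size paging."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    query = query if query is not None else {"match_all": {}}
    for start in range(0, total_hits, page_size):
        # from + size may not exceed the result window (10,000 by default)
        if start + page_size > max_window:
            break
        yield {"query": query, "from": start, "size": page_size}


# 25 hits at 10 per page -> pages starting at 0, 10 and 20
print([body["from"] for body in page_bodies(25)])
```

Each yielded body can be passed as `body=` to `self.es.search(index=index, body=body)`. For result sets larger than the window, a different mechanism such as the scroll API is needed.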

Find all matching data under an index

    def searchDoc(self, index=None, doc_type=None, body=None):
        '''
        Find all data under the index that matches the query
        :param index: index to search
        :param doc_type: document type
        :param body: query statement in DSL syntax
        :return: the search response
        '''
        return self.es.search(index=index, doc_type=doc_type, body=body)

Update a specific document

    def update(self, doc_type, indexname, body, id):
        # Update one specific document; body is expected in the form {"doc": {...}}
        return self.es.update(index=indexname, doc_type=doc_type, body=body, id=id)
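
The update API expects the changed fields wrapped under a `doc` key rather than passed as a bare dict. A small sketch of building that body follows; `partial_update_body` is a hypothetical helper, and the field values are placeholders.

```python
def partial_update_body(changes: dict, upsert: bool = False) -> dict:
    """Wrap changed fields in the {"doc": ...} envelope used by the update API."""
    if not changes:
        raise ValueError("changes must contain at least one field")
    body = {"doc": dict(changes)}
    if upsert:
        # Create the document from `doc` if it does not exist yet
        body["doc_as_upsert"] = True
    return body


print(partial_update_body({"age": 2}))
print(partial_update_body({"age": 2}, upsert=True))
```

The result is what would be passed as `body` to the `update` method above.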

Statistical operations

    def count(self, indexname, body=None):
        """
        :param body: optional query that restricts which documents are counted
        :return: count response for the index (the total is under the 'count' key)
        """
        return self.es.count(index=indexname, body=body)

    def proportion_not_null(self, index, field=None):
        """Share of documents in which the field is not empty"""
        a = self.count(index)['count']
        b = self.count(index, {'query': {'bool': {'must': {'exists': {'field': field}}}}})['count']
        # Guard against division by zero on an empty index
        print(field, a, b, b / a if a else 0)

    def aggs_terms(self, index, field, size=15):
        """Term counts for a single field"""
        return self.es.search(index=index, body={
            'aggs': {
                'CUSTOM NAME': {
                    'terms': {
                        'field': field,
                        'size': size,  # raise this if the aggregation does not show all buckets
                    }
                }
            }
        })['aggregations']['CUSTOM NAME']['buckets']
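
The buckets returned by a terms aggregation are a list of dicts with `key` and `doc_count`. The sketch below turns them into a plain dict and computes a non-empty ratio without dividing by zero. Both helper names are hypothetical, and `sample_response` is a made-up response fragment used only to show the shape.

```python
def buckets_to_counts(response: dict, agg_name: str) -> dict:
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def non_null_ratio(total: int, with_field: int) -> float:
    """Share of documents that have the field; 0.0 for an empty index."""
    return with_field / total if total else 0.0


# Made-up response fragment in the shape returned by a terms aggregation
sample_response = {
    "aggregations": {
        "by_user": {
            "buckets": [
                {"key": "zs", "doc_count": 3},
                {"key": "ls", "doc_count": 1},
            ]
        }
    }
}

print(buckets_to_counts(sample_response, "by_user"))
print(non_null_ratio(4, 3))
```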

Delete all data under the index

    def clear_index(self, index):
        # Match every document in the index
        body = {
            "query": {
                "match_all": {}
            }
        }
        self.es.delete_by_query(index=index, body=body)

Delete a specific document in the index

    # Delete a document by id
    def delete(self, indexname, doc_type, id):
        """
        :param indexname: index name
        :param doc_type: document type
        :param id: document id
        :return: the response from deleting one specific document in the index
        """
        return self.es.delete(index=indexname, doc_type=doc_type, id=id)

Delete all indexes

    def delete_index(self, index=None):
        # Delete the given index, or every index when none is given
        if not index:
            for name in self.es.indices.get(index="*"):
                self.es.indices.delete(index=name)
        else:
            self.es.indices.delete(index=index)
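
Deleting everything that matches `*` can also hit internal indexes, whose names conventionally start with a dot (for example `.kibana`). A cautious variant is to filter those out first. This is a sketch only: `user_index_names` is a hypothetical helper, and the name list is sample data.

```python
def user_index_names(all_names):
    """Return index names that do not start with a dot, sorted for stable output."""
    return sorted(name for name in all_names if not name.startswith("."))


# Sample names; in practice these would be the keys returned by indices.get
names = [".kibana", "weather", "es_index", ".tasks"]
print(user_index_names(names))
```

The filtered list can then be looped over and passed one name at a time to `indices.delete`.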

3, Install the requests module and operate Elasticsearch through GET and POST

Reference: python Elasticsearch5.x use - shhnwangjian - blog Park
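
As a rough idea of what this looks like, the sketch below wraps the two calls in a small class that takes a session object. `EsHttp` and `RecordingSession` are names I made up; the recording stand-in replaces `requests.Session` so the example runs without a server, and the `_doc` path and `http://localhost:9200` address are assumptions (typeless endpoints, ES 7 or later, default local port).

```python
class RecordingSession:
    """Stand-in for requests.Session that only records calls (no network)."""

    def __init__(self):
        self.calls = []

    def get(self, url, **kwargs):
        self.calls.append(("GET", url, kwargs))

    def post(self, url, **kwargs):
        self.calls.append(("POST", url, kwargs))


class EsHttp:
    """Thin wrapper that sends GET/POST requests to the Elasticsearch REST API."""

    def __init__(self, session, base_url):
        self.session = session
        self.base_url = base_url.rstrip("/")

    def get_doc(self, index, doc_id):
        # GET /{index}/_doc/{id} fetches a single document
        return self.session.get(f"{self.base_url}/{index}/_doc/{doc_id}")

    def search(self, index, body):
        # _search accepts POST as well as GET; the query DSL goes in the JSON body
        return self.session.post(f"{self.base_url}/{index}/_search", json=body)


# Dry run against the recording stand-in; with the real library this would be
#   import requests
#   es = EsHttp(requests.Session(), "http://localhost:9200")
session = RecordingSession()
es = EsHttp(session, "http://localhost:9200/")
es.get_doc("weather", 1)
es.search("weather", {"query": {"match_all": {}}})
for method, url, _ in session.calls:
    print(method, url)
```

With a real `requests.Session`, each call returns a response object whose `.json()` method gives the parsed result.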

 
