Elasticsearch numeric fields can also store strings, which is interesting~

Posted by flhtc on Mon, 03 Jan 2022 11:53:31 +0100

1, Foreword

Recently, a customer kept asking why a numeric field can also store strings. The field type has been set to float, yet the value actually stored is still a string. How can this be fixed? Today I took some time to sort out the whole process.

2, Hands-on practice

1. Define an index mapping and set the type of price to float (single-precision floating point)

PUT nginxindex
{
  "mappings": {
    "properties": {
      "price":{
        "type": "float"
      }
    }
  }
}

2. Write several documents to see the effect

POST nginxindex/_doc
{
  "price":4.68        //Number type
}

POST nginxindex/_doc
{
  "price": "4.69"      //String type
}

POST nginxindex/_doc
{
  "price": "free for charge"    //String type
}

3. Comparison results

The first document is written successfully.

The second document is also written successfully.

The third document cannot be written and returns an error, as shown in the figure below.

The error essentially says that the string content cannot be parsed as a float. That is easy to understand. But why can the number in string form in the second document be written? This is the focus of this section, and it is the problem the customer needed to solve.
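A numeric range query is one way to confirm that the string "4.69" was really indexed as a number. The following is a minimal sketch. It assumes the first two documents above are in nginxindex, and the range bounds are arbitrary values chosen for illustration:

GET nginxindex/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 4.6,
        "lte": 4.7
      }
    }
  }
}

Because coerce is on by default, both documents should match. The string was converted to a float before it was indexed.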

3, Cause of problem

By default, when users write numbers as strings, Elasticsearch recognizes and indexes them for whatever the numeric type is. As a result, end customers who search from the front end see inconsistent values in the returned results. Why does this happen? Let's look at the official documentation, which reads as follows:

Data is not always clean. Depending on how it is produced, a number might be rendered in the JSON body as a true JSON number, e.g. 5, but it might also be rendered as a string, e.g. "5". Alternatively, a number that should be an integer might instead be rendered as a floating point, e.g. 5.0, or even "5.0".

Coercion attempts to clean up dirty values to fit the data type of a field. For details, see the following link:

https://www.elastic.co/guide/en/elasticsearch/reference/current/coerce.html#coerce

Solution: add the coerce parameter to the field's mapping definition and set it to false (the default is true).
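You may prefer to turn coercion off for every field in an index rather than field by field. The coerce page linked above also describes an index-level default, the index.mapping.coerce setting. Here is a sketch that uses a hypothetical index name, nginxstrictindex:

PUT nginxstrictindex
{
  "settings": {
    "index.mapping.coerce": false
  },
  "mappings": {
    "properties": {
      "price": {
        "type": "float"
      }
    }
  }
}

As I read the documentation, an individual field can still opt back in by setting "coerce": true in its own mapping.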

4, Practice again

PUT nginxnewindex
{
  "mappings": {
    "properties": {
      "price":{
        "type": "float",
        "coerce": false        //Defining mapping
      }
    }
  }
}

// Write a numeric value: succeeds
POST nginxnewindex/_doc
{
  "price":4.68
}

// Write a string value: fails, because coerce is false
POST nginxnewindex/_doc
{
  "price": "4.69"
}

The result shows that the second document fails to write. The screenshot is as follows:

The error message says, in effect, that the field is defined as float but a string was written to it.

This way, users see the write error immediately and can correct the format their front end writes before it causes further damage to the business. That is the purpose of the strict matching enforced by the coerce parameter.

5, How to smoothly fix field type errors in a real production environment?

Once a mapping field type is defined, it cannot be modified. In a real production environment, new data can be handled by creating a new index with the corrected mapping parameters. But how do you smoothly convert the data already stored in the existing index? As most people would guess, the answer is reindex. Let's take the index from earlier in this article as an example and convert its string-form float values into genuine float values.
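The field type of the existing index cannot change, but the coerce parameter can. According to the coerce documentation linked above, it can be updated on an existing field with the update mapping API. Here is a sketch against the source index used in this article. The type must be repeated and must match the current one:

PUT nginxindex/_mapping
{
  "properties": {
    "price": {
      "type": "float",
      "coerce": false
    }
  }
}

This should only affect documents written afterwards. Documents already in the index keep their original _source, so the existing string values still need the reindex described next.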

First, look at the documents in the source index. In the second document, price is stored as a string, and we want to change it to a float.
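A quick way to view the stored documents is to return only their _source. This is a minimal sketch; filter_path just trims the response down:

GET nginxindex/_search?filter_path=hits.hits._source

In the response, the second document should still show "price": "4.69" as a string. _source keeps the original JSON exactly as it was sent, and coercion only affects the value that gets indexed. This is why a float field can appear to "store" strings.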

1. First, create a target index and specify the mapping parameters

PUT nginxnewindex2
{
  "mappings": {
    "properties": {
      "price":{
        "type": "float",
        "coerce": false
      }
    }
  }
}

2. Run the reindex. A problem occurs here, so pay attention:

POST _reindex
{
  "source": {
    "index": "nginxindex"
  },
  "dest": {
    "index": "nginxnewindex2"
  }
}

The returned error is as follows:

This error is easy to understand. The value being written is a string, but the target field is numeric, so parsing fails and the reindex is aborted.

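Therefore, reindex has limits in some scenarios. When the mappings of the two indexes are inconsistent (here, coerce is enabled on the source but disabled on the target), documents that the source accepted can be rejected by the target, and the copy fails.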

Is there a way to convert the field values of the existing index while copying them to the target index? Yes: use an ingest pipeline for preprocessing. During the reindex, the pipeline converts the field in each source document before that document is written to the target index. The source index itself is left unchanged.

6, Reindex + pipeline preprocessing: converting field types while copying an existing index

Requirement: during the reindex, convert price in the source documents to float.

Step 1: create an ingest pipeline with a convert processor

PUT _ingest/pipeline/my-pipeline-id
{
  "description": "converts the content of the price field to a float",
  "processors" : [
    {
      "convert" : {
        "field" : "price",
        "type": "float"
      }
    }
  ]
}
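Before running the reindex, you can try the pipeline on sample documents with the simulate pipeline API. This sketch uses the two price values from earlier in this article:

POST _ingest/pipeline/my-pipeline-id/_simulate
{
  "docs": [
    { "_source": { "price": "4.69" } },
    { "_source": { "price": 4.68 } }
  ]
}

Both documents should come back with price as a JSON number. That means the pipeline's output will be accepted by a target field that has coerce set to false.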

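Step 2: create the target index (if nginxnewindex2 is still present from the failed attempt above, delete it first with DELETE nginxnewindex2)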

PUT nginxnewindex2
{
  "mappings": {
    "properties": {
      "price":{
        "type": "float",
        "coerce": false
      }
    }
  }
}

Step 3: run reindex with the pipeline

POST _reindex
{
  "source": {
    "index": "nginxindex"    //Source index, index to transform
  },
  "dest": {
    "index": "nginxnewindex2",
    "pipeline": "my-pipeline-id"
  }
}

The response is shown below: no error is reported, and the reindex completes successfully.

Then check the target index to confirm that price now holds genuine float values. As shown below, the conversion succeeded.
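To check the documents themselves, and not only the mapping, you can look at _source in the target index, for example:

GET nginxnewindex2/_search?filter_path=hits.hits._source

If the pipeline ran, price should now be a number in every document, including the one whose value was "4.69" in nginxindex.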

At this point the customer's problem is fully solved. In the future, whenever a field needs to be converted from one type to another while an index is being copied, you can use the same approach.

7, Summary

Starting from a real production case, this article analyzed how to enforce exact field value types and how to smoothly convert the field type of an existing index. Because this approach is a classic one, I have written it up in the hope that it helps others who need it.

End.
