Elasticsearch series - Custom mapping

Posted by phpgeek17 on Fri, 27 Dec 2019 01:17:26 +0100

Outline

This article continues from the previous one and covers custom mappings, custom objects, and the underlying structure of arrays and collection types.

Custom mapping

The previous article described Elasticsearch's automatic (dynamic) mapping. You can also specify the mapping explicitly when you create an index. Taking the music index as an example again:

PUT /music
{
  "mappings": {
    "children": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "language": {
          "type": "keyword"
        },
        "length": {
          "type": "long"
        },
        "likes": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

There are five fields: name, content, language, length, and likes. Here the language field is given the type keyword. A keyword field is not analyzed (split into terms), so it can only be matched exactly, against the whole value.
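To make the difference between text and keyword concrete, here is a minimal Python sketch of how the two types turn a value into index terms. The helpers text_terms and keyword_terms are hypothetical names, and the lowercase-and-split rule is only a rough stand-in for the default analyzer, not what Elasticsearch actually runs.

```python
import re


def text_terms(value):
    """Rough stand-in for an analyzed text field: lowercase, then split on non-word characters."""
    return [term for term in re.split(r"\W+", value.lower()) if term]


def keyword_terms(value, ignore_above=256):
    """A keyword field keeps the whole value as a single term.

    Values longer than ignore_above characters are not indexed at all.
    """
    return [value] if len(value) <= ignore_above else []


print(text_terms("Twinkle, twinkle little star"))  # ['twinkle', 'twinkle', 'little', 'star']
print(keyword_terms("english"))                    # ['english']
print(keyword_terms("x" * 300))                    # []
```

This is why name and content carry both forms in the mapping above: the text form supports full-text matching on individual words, while the keyword sub-field supports exact matching on the whole value.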

Modify mapping

We can specify mapping information when we create an index, or later when we add a new field to an existing index. Once a field exists, however, its mapping (for example its type or analyzer) cannot be changed, and trying to do so returns an error. For example, we add an author field to the music index, with type text and with both analyzer and search_analyzer set to english:

PUT /music/_mapping/children
{
  "properties" : {
    "author" : {
      "type" : "text",
      "index": true, 
      "analyzer": "english",
      "search_analyzer": "english"
    }
  }
}

If you try to modify an existing field, such as the name field, the request is rejected with an error. Request:

PUT /music/_mapping/children
{
  "properties" : {
    "name" : {
      "type" : "text",
      "index": true, 
      "analyzer": "english",
      "search_analyzer": "english"
    }
  }
}

Error message:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Mapper for [name] conflicts with existing mapping in other types:\n[mapper [name] has different [analyzer]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Mapper for [name] conflicts with existing mapping in other types:\n[mapper [name] has different [analyzer]]"
  },
  "status": 400
}

Underlying structure of complex data types

The mappings shown so far only use basic data types. Elasticsearch is a distributed, document-oriented store, so in practice we will meet many other data structures: custom objects, arrays and collections, nested objects, and so on. How does ES support and handle these complex types?

Arrays and collections

Collections are serialized as JSON arrays, so they can be treated as arrays. Arrays need no special mapping in Elasticsearch: a field can hold several values, in the same way that an analyzed text field yields several terms.

One thing to note is that all elements in an array should be of the same data type, such as strings, dates, or numbers. Types must not be mixed. Automatic mapping determines the field type from the first element. If the types are mixed, elements that cannot be handled as the type of the first element cause an error at index time.

Null type

The underlying Lucene cannot store null values, so fields with null or empty values like the following are not indexed:

"null_value":               null,
"empty_array":              [],
"array_with_null_value":    [ null ]
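A small Python sketch of that rule, assuming no null_value replacement is configured in the mapping. The helper indexable_values is a hypothetical name that only mimics which values would end up producing index entries.

```python
def indexable_values(value):
    """Return the values of a field that would produce index entries (illustrative only)."""
    values = value if isinstance(value, list) else [value]
    # null entries are skipped, so null, [] and [null] all end up empty
    return [v for v in values if v is not None]


doc = {
    "null_value": None,
    "empty_array": [],
    "array_with_null_value": [None],
    "tags": ["rock", None, "pop"],  # hypothetical extra field with one null entry
}

for field, value in doc.items():
    print(field, indexable_values(value))
```

The first three fields yield an empty list, which is why a search on any of them finds nothing for this document.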

Complex objects

Complex objects typically have one or more levels of nesting: a property of the object is itself an object, or sometimes an array. For example:

{
  "address": {
    "country": "CN",
    "province": "GD",
    "city": "SZ"
  },
  "name": "Herry",
  "age": 28,
  "birth_date": "1992-04-29"
}

Besides name and age, the person record has an address, and the address property is itself an object containing country, province, and city. Its mapping is hierarchical in the same way (trimmed here so that only the address hierarchy is shown):

{
  "person": {
    "mappings": {
      "info": {
        "properties": {
          "address": {
            "properties": {
              "city": {
                "type": "text"
              },
              "country": {
                "type": "text"
              },
              "province": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}

Elasticsearch flattens hierarchical objects when it stores them. A Lucene document is a flat list of key-value pairs, so the data above is stored roughly as follows:

{
  "address.country": "CN",
  "address.province": "GD",
  "address.city": "SZ",
  "name": "Herry",
  "age": 28,
  "birth_date": "1992-04-29"
}
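The flattening step can be sketched in a few lines of Python. The flatten function below is a hypothetical illustration of the idea (joining parent and child names with a dot), not Elasticsearch's own code.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


person = {
    "address": {"country": "CN", "province": "GD", "city": "SZ"},
    "name": "Herry",
    "age": 28,
    "birth_date": "1992-04-29",
}

print(flatten(person))
```

Each inner property ends up under its full path, such as address.country, and that full path is the field name to use in queries.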

So to query on the country under address, refer to the field by its full path:

GET /person/info/_search
{
  "query": {
    "match": {
      "address.country": "CN"
    }
  }
}

Complex objects in arrays

If the elements of an array are of a basic data type, things are simple: all elements just need to be of the same type. But how is an array indexed when its elements are objects? Suppose the array looks like this:

{
    "likes": [
        { "name": "Three Zhang", "datetime": "2019-12-01 08:58:12"},
        { "name": "Four Lee", "datetime": "2019-12-01 09:12:23"},
        { "name": "Five Wang", "datetime": "2019-12-01 09:15:58"}
    ]
}

After flattening, which turns the list of objects into one list of values per field, the data looks like this:

{
    "likes.name": ["Three Zhang", "Four Lee", "Five Wang"],
    "likes.datetime": ["2019-12-01 08:58:12", "2019-12-01 09:12:23", "2019-12-01 09:15:58"]
}

Do you see the problem? After this processing, the association between the fields of each original object is lost, and each field is just an unordered bag of values. If you query for a combination of conditions such as "Three Zhang liked this after 2019-12-01 09:00:00", the document is returned even though Three Zhang's like was recorded at 08:58:12. The name condition and the time condition are satisfied by different objects, which is clearly not the expected result.

Note that when the elements of an array are objects, the field needs to be declared as nested to get the expected query results. This will be described in a later article.
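The mismatch can be reproduced with a short Python simulation, using a cutoff of 2019-12-01 09:00:00. All function names here are hypothetical, and names are compared by plain equality rather than through an analyzer. match_flattened checks each condition against its own list of values, as happens after flattening, while match_per_object requires both conditions to hold inside the same object, which is the behavior a nested declaration is meant to provide.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

likes = [
    {"name": "Three Zhang", "datetime": "2019-12-01 08:58:12"},
    {"name": "Four Lee", "datetime": "2019-12-01 09:12:23"},
    {"name": "Five Wang", "datetime": "2019-12-01 09:15:58"},
]


def flatten_objects(field, objects):
    """Collapse a list of objects into one list of values per dotted field name."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def match_flattened(flat, name, after):
    """Each condition is tested independently against its own value list."""
    name_hit = name in flat["likes.name"]
    time_hit = any(datetime.strptime(t, FMT) > after for t in flat["likes.datetime"])
    return name_hit and time_hit


def match_per_object(objects, name, after):
    """Both conditions must be satisfied by the same object."""
    return any(
        o["name"] == name and datetime.strptime(o["datetime"], FMT) > after
        for o in objects
    )


cutoff = datetime(2019, 12, 1, 9, 0, 0)
flat = flatten_objects("likes", likes)

print(match_flattened(flat, "Three Zhang", cutoff))    # True: a false positive
print(match_per_object(likes, "Three Zhang", cutoff))  # False: the expected answer
```

The flattened check succeeds because the name comes from the first object and the timestamp from the second or third, which is exactly the lost association described above.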

Summary

This article supplements the earlier material on mapping and briefly describes how various complex objects are stored under the hood. The key point to remember is how objects inside an array are handled: the field must be declared as nested to get correct query results.

