outline
This article mainly introduces the data format of Elasticsearch Document, compares the modeling of Java applications and relational databases, describes the basic cluster state query, the basic CRUD operation example of Document and the bulk batch processing example by writing Restful API on Kibana platform.
Document data format
The data model of Java application system is object-oriented, some objects are complex, traditional business system, data needs to fall into relational database. When designing the model in the database field, complex POJO objects will be designed as one-to-one or one-to-many relationships, which will be flattened. When querying, multiple tables are needed to query and restore the format of POJO objects. Elasticsearch is a document database. Document stores data structure that is consistent with POJO and uses JSON format, which makes querying data more convenient.
Document document data example:
{ "fullname" : "Three Zhang", "text" : "hello elasticsearch", "org": { "name": "development", "desc": "all member are lovely" } }
Restful API makes requests easier
The previous article mentioned that Elasticsearch works with Kibana, and the Dev Tools menu on the Kibana interface allows you to send a Reestful request for Elasticsearch.Subsequent Restful API requests, without exception, are executed on the Kibana platform.
Let's try a few requests for cluster information first
- Check cluster health
GET /_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1573626290 14:24:50 hy-application yellow 1 1 1 1 0 0 1 0 - 50.3%
From the above, you can see the number of node s, shard s, etc. There is also a state of cluster, which shows yellow, why is it yellow? Clusters have green, yellow, and red states, which are defined as follows:
- green: primary and replica shares of each index are active
- yellow: the primary share of each index is active, but some replica shares are not active and are not available
- red: not all primary shard s of indexes are active, some indexes have missing data
Our example starts only one elasticsearch instance, only one node, because the index defaults to five primary shards and five replica shards, and primary shards and replica shards under the same node cannot be assigned on one machine (fault tolerant mechanism), all but one primary shard is assigned and started, replica shard does not have a second node to start, becauseIt's the yellow state.If you want to become a green judgment, start another node.
- See what indexes are in the cluster
GET /_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open location 48G_CgE7TiWomlYsyQW1NQ 5 1 3 0 11kb 11kb yellow open company_mem s9DKUeyWTdCj2J8BaFYXRQ 5 1 3 0 15kb 15kb yellow open %{[appname]} ysT9_OibR5eSRu19olrq_w 5 1 32 0 386.5kb 386.5kb yellow open .kibana 4yS67TTOQGOD7l-uMtICOg 1 0 2 0 12kb 12kb yellow open tvs EM-SvQdfSaGAXUADmDFHVg 5 1 8 0 16.3kb 16.3kb yellow open company_org wIOqfx5hScavO13vvyucMg 5 1 3 0 14.6kb 14.6kb yellow open blog n5xmcGSbSamYphzI_LVSYQ 5 1 1 0 4.9kb 4.9kb yellow open website 5zZZB3cbRkywC-iTLCYUNg 5 1 12 0 18.2kb 18.2kb yellow open files _6E1d7BLQmy9-7gJptVp7A 5 1 2 0 7.3kb 7.3kb yellow open files-lock XD7LFToWSKe_6f1EvLNoFw 5 1 1 0 8kb 8kb yellow open music i1RxpIdjRneNA7NfLjB32g 5 1 3 0 15.1kb 15.1kb yellow open book_shop 1CrHN1WmSnuvzkfbVCuOZQ 5 1 4 0 18.1kb 18.1kb yellow open avs BCS2qgfFT_GqO33gajcg_Q 5 1 0 0 1.2kb 1.2kb
- View node information
GET /_cat/nodes?v
Response:
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name 192.168.17.137 38 68 0 0.08 0.03 0.05 mdi * node-1
We can see the name of the node.
- Create Index Command
Create an index named "location"
PUT /location?pretty
Response:
{ "acknowledged": true, "shards_acknowledged": true }
Look at the index and you can see the index location you just created
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open location 48G_CgE7TiWomlYsyQW1NQ 5 1 3 0 11kb 11kb yellow open .kibana 4yS67TTOQGOD7l-uMtICOg 1 0 2 0 12kb 12kb
- Delete Index Command
Delete the index named "location"
DELETE /location?pretty
Looking at the index again, the index location you just created has been deleted
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open .kibana 4yS67TTOQGOD7l-uMtICOg 1 0 2 0 12kb 12kb
Is it easy, is the interface friendly?
Getting Started CRUD Operations
Introduces the basic CRUD operation of document, with English songs for children as background
- New Songs
The structure of the children's song we designed contains four fields: song name, lyrics content, language type, song length in seconds, which is placed in the JSON string. PUT syntax: <REST Verb> /<Index>/<Type>/<ID> REST Verbs can be PUT, POST, DELETE, and the contents behind the backslash are index name, type name, ID, respectively.
The request is as follows:
PUT /music/children/1 { "name": "gymbo", "content": "I hava a friend who loves smile, gymbo is his name", "language": "english", "length": "75" }
Response content contains index name, type name, ID value, version version version number (Optimistic Lock Control), result type (create/update/deleted three), shard information, etc. If the index does not exist when added, it will be automatically created. Index name is the one specified at request, field type inside the document, according to the automatic mapping type defined by elastic search, and each fiElds are indexed upside down so they can be searched.
Why is the data not equal between total and success? When a document is added, documents are written to primary shard and replica shard, respectively, but since there is only one node and replica is not started, the total number of writes is 2. Primary shard succeeds with a quantity of 1.failed records only primary shard write failures.
The response is as follows:
{ "_index": "music", "_type": "children", "_id": "1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 0, "_primary_term": 1 }
- Modify song There are two ways to modify a document: incremental, modifying only the specified field, or replacing the document in its entirety, replacing all the original information
- Incrementally modify the length value, noting that it is a POST request with _update at the end and doc as a fixed write, with the following requests:
POST /music/children/1/_update { "doc": { "length": "76" } }
Response:
{ "_index": "music", "_type": "children", "_id": "1", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 1, "_primary_term": 1 }
- Replace the document in its entirety with the following request:
PUT /music/children/1 { "name": "gymbo", "content": "I hava a friend who loves smile, gymbo is his name", "language": "english", "length": "77" }
Response:
{ "_index": "music", "_type": "children", "_id": "1", "_version": 3, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 2, "_primary_term": 1 }
Careful children's shoes will do. Replacing the grammar of a document in its entirety is the same as creating an index, right!Is the same syntax, whether to create or update depends on whether the ID storage above does not exist, does not exist to create, does exist to update, and successfully updates version+1, but there is a disadvantage of this kind of full replacement, it must take complete attributes, otherwise the attributes are not declared.
Think of a scenario where you want to use this syntax: GET all the attributes first, then update the attributes to be updated, and then call the updated syntax of the full replacement.There are actually two reasons for this: complex operation, query before update, and large message size (relative to incremental update).Therefore, enterprise R&D generally uses incremental methods to make document updates.
- Query Songs
Query statement: GET/music/children/1
_source is the content of the JSON, the result of the query:
{ "_index": "music", "_type": "children", "_id": "1", "_version": 1, "found": true, "_source": { "name": "gymbo", "content": "I hava a friend who loves smile, gymbo is his name", "language": "english", "length": "75" } }
- Delete song
Delete statement: DELETE/music/children/1
Response results:
{ "_index": "music", "_type": "children", "_id": "1", "_version": 4, "result": "deleted", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 3, "_primary_term": 1 }
bulk batch
The additions and deletions mentioned in the previous section are for a single document, and Elasticsearch also has a batch command that can be executed in bulk.
- Basic syntax examples for bulk Or use the above nursery songs as the background of the case
POST /music/children/_bulk {"index":{"_id":"1"}} {"name": "gymbo", "content": "I hava a friend who loves smile, gymbo is his name", "language": "english", "length": "75"} {"create":{"_id":"2"}} {"name": "wake me, shark me", "content": "don't let me sleep too late", "language": "english", "length": "55"} { "update": {"_id": "2", "retry_on_conflict" : 3} } { "doc" : {"content" : "don't let me sleep too late, gonna get up brightly early in the morning"} } { "delete": {"_id": "3" }}
Multiple commands can be executed together. If there are different indexes and types within a bulk request, indexes and types can also be written in body json, with two JSON strings for each operation:
{"action": {"metadata"}} {"data"}
delete exception, it only needs one json string
There are several types of action s:
- index: Normal PUT operation, document is created when ID does not exist, full replacement when ID exists
- Create: Force creation, equivalent to the PUT/index/type/id/_create command
- update: incremental modification operation performed
- Delete: delete document action
-
bulk considerations bulk api has strict syntax requirements, it cannot wrap lines within each json string, and there must be a line wrap between each json string, otherwise syntax errors will be reported. Since bulk is executed in batches with several commands, what should I do when it encounters errors?Will it interrupt? If there is a command execution error in the bulk request, it will skip directly and proceed to the next one. At the same time, the results of each command will be displayed separately in the response message. The correct result will be displayed, and the error will be prompted with the error log accordingly.
-
bulk performance issues Since bulk is batch processing, there must be some relationship between bulk size and optimal performance. The memory requested by bulk is loaded into memory first. The bulk request depends on the number of command bars and the content of each command. An example diagram of the relationship between bulk and performance (expressing concepts, data is not referenced) is as follows:
The goal of bulk performance optimization is to find this inflection point, and you need to try an optimal size repeatedly, depending on the specific business data characteristics and the amount of concurrency. Common settings range from 1000 to 5000, and the size of bulk requests is controlled from 5 to 15MB (for reference only).
Summary
This article gives a brief introduction to the data format of the document, along with an explanation of the criteria for determining the red, yellow and green states of the Elasticsearch cluster. The focus is on the small CRUD cases and the bulk batch processing examples demonstrated on the kibana platform, which are the most basic and can take a little more time to become familiar with.
Focus on Java high-concurrency, distributed architecture, more technology dry goods to share and learn from, please follow Public Number: Java Architecture Community