1. The difference between the three
from size:
- Deep pagination occurs when the page offset is very deep or the size is very large. Elasticsearch's self-protection mechanism is the `max_result_window` setting, which defaults to 10000: when `from + size` exceeds 10000, the query reports an error.
- The implementation principle of this query is similar to `LIMIT` in MySQL. For example, to fetch the 10001st document, the first 10001 documents must be collected and sorted, and everything before the requested page is discarded. (Poor performance at depth, simple to implement, suitable for small amounts of data.)
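At the REST level, a from/size request is just an offset plus a page length. A minimal sketch in Kibana-console style, using the `audit2` index and `operationtime` field from the test code below:

```json
GET /audit2/_search
{
  "from": 0,
  "size": 1000,
  "sort": [
    { "operationtime": { "order": "desc" } }
  ]
}
```

If `from + size` exceeds `index.max_result_window` (10000 by default), the request is rejected.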
search after:
- The disadvantage of search_after is that it cannot jump to an arbitrary page; it can only page forward one page at a time (newly written data can also be queried in real time). At least one unique, non-duplicated field must be specified for sorting (generally `_id` or a time field).
- When using search_after, the from value must be set to 0 or -1.
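In query-DSL terms, each search_after request repeats the same sort and passes in the sort values of the last hit of the previous page. A sketch in Kibana-console style; the values in `search_after` are placeholders, and the `_id` tiebreaker is an assumption added here (it is not part of the original test) to make the sort unique:

```json
GET /audit2/_search
{
  "size": 1000,
  "track_total_hits": true,
  "sort": [
    { "operationtime": { "order": "desc" } },
    { "_id": { "order": "desc" } }
  ],
  "search_after": [1643181625000, "last-doc-id"]
}
```

The two placeholder values are taken from the `sort` array of the last hit on the previous page; the next page begins strictly after them.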
scroll:
- Efficient rolling query. The first request saves a historical snapshot and a cursor (scroll_id) in memory to record where the current query stopped; each subsequent request consumes the next batch from that cursor. (Good performance, but not real-time; generally used for bulk data export or index rebuilding.)
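The two-step flow can be sketched at the REST level (Kibana-console style; the scroll_id shown is a placeholder for the opaque cursor returned by the first response):

```json
GET /audit2/_search?scroll=1m
{
  "size": 1000,
  "sort": [
    { "operationtime": { "order": "desc" } }
  ]
}

GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<scroll_id from the previous response>"
}
```

The `scroll=1m` parameter keeps the search context alive for one minute; each follow-up request renews it.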
2. Code test classes
1. from size
```java
package com.example.es.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @Description from/size usage in es
 * @date 2022/01/26 10:04
 */
public class ESTest_from_size {

    public static final Logger logger = LoggerFactory.getLogger(ESTest_from_size.class);

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        // Create the ES client
        RestHighLevelClient esClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"))
        );
        // 1. Create the SearchRequest
        SearchRequest searchRequest = new SearchRequest("audit2");
        // 2. Specify the query criteria
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // The first page on the front end corresponds to from = 0 in es
        sourceBuilder.from(0);
        // Number of documents per page
        sourceBuilder.size(1000);
        // Sort by a field that positions each document
        sourceBuilder.sort(SortBuilders.fieldSort("operationtime").order(SortOrder.DESC));
        // Attach the sourceBuilder to the search request
        searchRequest.source(sourceBuilder);
        // Send the request
        SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        List<Map<String, Object>> result = new ArrayList<>();
        if (hits != null && hits.length > 0) {
            for (SearchHit hit : hits) {
                // Collect the required data
                result.add(hit.getSourceAsMap());
            }
        }
        logger.info("The number of documents queried is: {}", result.size());
        // Close the client
        esClient.close();
        logger.info("Running time: {}ms", System.currentTimeMillis() - startTime);
    }
}
```
Operation results:
10:08:40.466 [main] INFO com.example.es.test.ESTest_searchAfter - The number of data queried is 1000
10:08:40.474 [main] INFO com.example.es.test.ESTest_searchAfter - Running time: 1506ms
Phenomenon:
If a from/size query asks for data beyond 10000 (`from + size` greater than `max_result_window`), an error is reported.
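If deeper from/size pages are genuinely needed, the window can be raised per index, at the cost of the extra memory and CPU that deep pages consume on every shard. A sketch (the value 20000 is only an example):

```json
PUT /audit2/_settings
{
  "index": {
    "max_result_window": 20000
  }
}
```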
2. search after
```java
package com.example.es.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @Description search_after usage in es
 * @date 2022/01/11 14:04
 */
public class ESTest_searchAfter {

    public static final Logger logger = LoggerFactory.getLogger(ESTest_searchAfter.class);

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        // Create the ES client
        RestHighLevelClient esClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"))
        );
        // 1. Create the SearchRequest
        SearchRequest searchRequest = new SearchRequest("audit2");
        // 2. Specify the query criteria.
        // trackTotalHits must be set, otherwise the reported total is capped at 10000
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().trackTotalHits(true);
        // Number of documents per page
        sourceBuilder.size(1000);
        // Sort by a field that positions each document uniquely
        sourceBuilder.sort(SortBuilders.fieldSort("operationtime").order(SortOrder.DESC));
        // Attach the sourceBuilder to the search request
        searchRequest.source(sourceBuilder);
        // Send the first request
        SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        List<Map<String, Object>> result = new ArrayList<>();
        while (hits != null && hits.length > 0) {
            for (SearchHit hit : hits) {
                // Collect the required data
                result.add(hit.getSourceAsMap());
            }
            // Take the sort values of the last hit; the next page starts after them
            Object[] lastSortValues = hits[hits.length - 1].getSortValues();
            sourceBuilder.searchAfter(lastSortValues);
            searchRequest.source(sourceBuilder);
            // Query the next page
            searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
            hits = searchResponse.getHits().getHits();
        }
        logger.info("The number of documents queried is: {}", result.size());
        // Close the client
        esClient.close();
        logger.info("Running time: {}ms", System.currentTimeMillis() - startTime);
    }
}
```
Operation results:
16:11:44.057 [main] INFO com.example.es.test.ESTest_searchAfter - The number of data queried is 64000
16:11:44.061 [main] INFO com.example.es.test.ESTest_searchAfter - Running time: 20979ms
Phenomenon: the audit2 index contains 69873 documents, and the console prints a log line for every batch of 1000. In the end only 64000 records are returned, so 5873 documents are lost. In addition, if the size exceeds 10000, an error is also reported.
My own question: since search_after cannot jump to an arbitrary page and can only be queried page by page, doesn't the back end still end up returning all the data when the front end calls this interface? If the front end instead issues a query each time the user scrolls down, the back end only has to return as many pages as the user actually scrolls through, which should save query time. At the moment this test still fetches everything in one call: internally it pages through the index, but what is finally returned is all the data. I am unsure how this should be wired up with the front end.
3. scroll
```java
package com.example.es.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @Description scroll (rolling) query with the Java high-level client
 * @date 2021/12/08 14:09
 */
public class ESTest_Scroll {

    public static final Logger logger = LoggerFactory.getLogger(ESTest_Scroll.class);

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        // Create the ES client
        RestHighLevelClient esClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"))
        );
        // 1. Create the SearchRequest
        SearchRequest searchRequest = new SearchRequest("audit2");
        // 2. Specify the scroll keep-alive
        searchRequest.scroll(TimeValue.timeValueMinutes(1L));
        // 3. Specify the query criteria
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.size(1000);
        searchSourceBuilder.sort(SortBuilders.fieldSort("operationtime").order(SortOrder.DESC));
        searchRequest.source(searchSourceBuilder);
        // 4. Send the initial request; it opens the search context and returns the first page plus a scrollId
        SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = searchResponse.getScrollId();
        SearchHit[] searchHits = searchResponse.getHits().getHits();
        List<Map<String, Object>> result = new ArrayList<>();
        for (SearchHit hit : searchHits) {
            result.add(hit.getSourceAsMap());
        }
        // Two kinds of request are needed: the first query returns the home page and a scrollId,
        // and that scrollId is then used to fetch each following page
        while (true) {
            // 5. Create a SearchScrollRequest carrying the scrollId returned last time
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            // 6. Renew the lifetime of the scroll context
            scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
            // 7. Execute the query and get the next page
            SearchResponse scrollResp = esClient.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = scrollResp.getScrollId();
            // 8. Check whether any data was returned
            SearchHit[] hits = scrollResp.getHits().getHits();
            if (hits != null && hits.length > 0) {
                for (SearchHit hit : hits) {
                    result.add(hit.getSourceAsMap());
                }
            } else {
                // 9. No more data: leave the loop
                break;
            }
        }
        // After scrolling is finished, clear the scroll context cached on the server
        // 10. Create the ClearScrollRequest
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        // 11. Specify the scrollId
        clearScrollRequest.addScrollId(scrollId);
        // 12. Delete the scrollId
        ClearScrollResponse clearScrollResponse = esClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        // 13. Output the results
        boolean succeeded = clearScrollResponse.isSucceeded();
        logger.info("delete scrollId: {}", succeeded);
        logger.info("Total number of queries: {}", result.size());
        // Close the client
        esClient.close();
        logger.info("Running time: {}ms", System.currentTimeMillis() - startTime);
    }
}
```
Operation results:
16:20:54.794 [main] INFO com.example.es.test.ESTest_Scroll - delete scrollId: true
16:20:54.795 [main] INFO com.example.es.test.ESTest_Scroll - Total number of queries: 69873
16:20:54.797 [main] INFO com.example.es.test.ESTest_Scroll - Running time: 5716ms
Phenomenon:
The audit2 index contains a total of 69873 documents, and all 69873 records are retrieved; none is lost. In addition, if the size exceeds 10000, an error is also reported. It seems strange that search_after loses data while scroll does not lose a single record. A likely cause is that the search_after test sorts only on operationtime, which is not unique: documents sharing the same sort value can be skipped between pages, so adding a unique tiebreaker field to the sort avoids the loss.