Big data ELK in 2021: Elasticsearch SQL order statistical analysis case

Posted by reyes99 on Mon, 06 Dec 2021 00:49:40 +0100

The most detailed big data ELK article series in the whole network. It is strongly recommended to collect and pay attention!  

The new articles have listed the historical article directory to help you review the previous knowledge focus.

catalogue

Order statistics and analysis cases

1, Case introduction

2, Create index

3, Import test data

4, Count the order quantity of different payment methods

1. It is implemented using JSON DSL

2. SQL implementation based on elasticsearch

5, Count the order quantity of different methods based on JDBC

6, Count the number of orders with different payment methods and sort them in reverse order

7, Only the orders with different payment methods in paid status are counted

8, Count the total amount of orders in different statuses and the maximum and minimum order amounts in different payment methods

9, Elasticsearch SQL currently has some limitations

10, Common problem handling

1,elasticsearch.keystore  AccessDeniedException

Order statistics and analysis cases

1, Case introduction

There are the following data sets:

Order ID

Order status

Payment amount

Payment method ID

User ID

Operation time

Commodity classification

id

status

pay_money

payway

userid

operation_date

category

1

Submitted

4070

1

4944191

2020-04-25 12:09:16

mobile phone;

2

Completed

4350

1

1625615

2020-04-25 12:09:37

Household Electric Appliances;; computer;

3

Submitted

6370

3

3919700

2020-04-25 12:09:39

men's wear; Men's Shoes;

4

Paid

6370

3

3919700

2020-04-25 12:09:44

men's wear; Men's Shoes;

We need to use the aggregation statistics function in Elasticsearch to realize some index statistics based on data by.

2, Create index

PUT /order_idx/
{
    "mappings": {
        "properties": {
            "id": {
                "type": "keyword",
                "store": true
            },
            "status": {
                "type": "keyword",
                "store": true
            },
            "pay_money": {
                "type": "double",
                "store": true
            },
            "payway": {
                "type": "byte",
                "store": true
            },
            "userid": {
                "type": "keyword",
                "store": true
            },
            "operation_date": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss",
                "store": true
            },
            "category": {
                "type": "keyword",
                "store": true
            }
        }
    }
}

3, Import test data

  • Upload order in data_ Data.json data file to Linux
  • bulk import command
curl -H "Content-Type: application/json" -XPOST "node1:9200/order_idx/_bulk?pretty&refresh" --data-binary "@order_data.json"

4, Count the order quantity of different payment methods

1. It is implemented using JSON DSL

In this way, the JSON based DSL supported by Elasticsearch is used to realize aggregation statistics.

GET /order_idx/_search
{
    "size": 0,
    "aggs": {
        "group_by_state": {
            "terms": {
                "field": "payway"
            }
        }
    }
}

Statistical results:

    "aggregations": {
        "group_by_state": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 2,
                    "doc_count": 1496
                },
                {
                    "key": 1,
                    "doc_count": 1438
                },
                {
                    "key": 3,
                    "doc_count": 1183
                },
                {
                    "key": 0,
                    "doc_count": 883
                }
            ]
        }
    }

This way of analysis is more troublesome. If we all write this way to analyze data in the future, it will be unbearable. Therefore, if Elasticsearch wants to enter the field of OLAP, it must support SQL and be able to use SQL for statistics and analysis

 

2. SQL implementation based on elasticsearch

GET /_sql?format=txt
{
    "query": "select payway, count(*) as order_cnt from order_idx group by payway"
}

This way should be more intuitive and concise

V. count the order quantity of different methods based on JDBC

Elastic search also provides a JDBC based way to access data. We can operate Elasticsearch like MySQL. The steps are as follows:

1. Add the following image repository in pom.xml

<repositories>
  <repository>
    <id>elastic.co</id>
    <url>https://artifacts.elastic.co/maven</url>
  </repository>
</repositories>

2. Import Elasticsearch JDBC driver Maven dependency

<dependency>
  <groupId>org.elasticsearch.plugin</groupId>
  <artifactId>x-pack-sql-jdbc</artifactId>
  <version>7.6.1</version>
</dependency>

3. Drive

org.elasticsearch.xpack.sql.jdbc.EsDriver

4,JDBC URL

jdbc:es:// http:// host:port

5. Enable the X-pack advanced function trial. If the trial is not enabled, the following error will be reported

current license is non-compliant for [jdbc]

On node1 node:

curl http://node1:9200/_license/start_trial?acknowledge=true -X POST
{"acknowledged":true,"trial_was_started":true,"type":"trial"}

The probation period is 30 days

Reference code:

/**
 * Accessing Elasticsearch based on JDBC
 */
public class ElasticJdbc {

    public static void main(String[] args) throws Exception {
        Class.forName("org.elasticsearch.xpack.sql.jdbc.EsDriver");

        Connection connection = DriverManager.getConnection("jdbc:es://http://node1:9200");
        PreparedStatement ps = connection.prepareStatement("select payway, count(*) as order_cnt from order_idx group by payway");
        ResultSet resultSet = ps.executeQuery();

        while(resultSet.next()) {
            int payway = resultSet.getInt("payway");
            int order_cnt = resultSet.getInt("order_cnt");
            System.out.println("Payment method: " + payway + " Order quantity: " + order_cnt);
        }

        resultSet.close();
        ps.close();
        connection.close();
    }
}

Note: if you cannot download dependencies in IDEA, please refer to the following operations:

In the file -- > settings of Idea, set Maven's importing and Runner parameters and ignore the certificate check. (the solution principle under Eclipse is similar, set Maven runtime parameters), and try to manually execute Maven compile.

Specific parameters: - Dmaven.multiModuleProjectDirectory=$MAVEN_HOME -Dmaven.wagon.http.ssl.insecure=true -Dmaven.wagon.http.ssl.allowall=true -Dmaven.wagon.http.ssl.ignore.validity.dates=true

 

 

6, Count the number of orders with different payment methods and sort them in reverse order

GET /_sql?format=txt
{
    "query": "select payway, count(*) as order_cnt from order_idx group by payway order by order_cnt desc"

7, Only the orders with different payment methods in paid status are counted

GET /_sql?format=txt
{
    "query": "select payway, count(*) as order_cnt from order_idx where status = 'Paid' group by payway order by order_cnt desc"
}

8, Count the total amount of orders in different statuses and the maximum and minimum order amounts in different payment methods

Count the total amount of orders in different statuses and the maximum and minimum order amounts in different payment methods

GET /_sql?format=txt
{
    "query": "select userid, count(1) as cnt, sum(pay_money) as total_money from order_idx group by userid"
}

9, Elasticsearch SQL currently has some limitations

At present, Elasticsearch SQL still has some limitations. For example, JOIN and complex subqueries are not supported. Therefore, some relatively complex functions have to be realized by means of DSL.

10, Common problem handling

1,​​​​​​​elasticsearch.keystore  AccessDeniedException

Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: java.nio.file.AccessDeniedException: /export/server/es/elasticsearch-7.6.1/config/elasticsearch.keystore
Likely root cause: java.nio.file.AccessDeniedException: /export/server/es/elasticsearch-7.6.1/config/elasticsearch.keystore
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
        at java.base/java.nio.file.Files.newByteChannel(Files.java:374)
        at java.base/java.nio.file.Files.newByteChannel(Files.java:425)
        at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77)
        at org.elasticsearch.common.settings.KeyStoreWrapper.load(KeyStoreWrapper.java:219)
        at org.elasticsearch.bootstrap.Bootstrap.loadSecureSettings(Bootstrap.java:234)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:305)
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
        at org.elasticsearch.cli.Command.main(Command.java:90)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)

Solution:

Set / export/server/es/elasticsearch-7.6.1/config/elasticsearch.keystore owner to lanson

chown lanson /export/server/es/elasticsearch-7.6.1/config/elasticsearch.keystore
  • 📢 Blog home page: https://lansonli.blog.csdn.net
  • 📢 Welcome to praise 👍 Collection ⭐ Leaving a message. 📝 Please correct any errors!
  • 📢 This article was originally written by Lansonli and started on CSDN blog 🙉
  • 📢 Big data series articles will be updated every day. When you stop to rest, don't forget that others are still running. I hope you will seize the time to study and make every effort to go to a better life ✨

Topics: ElasticSearch ELK