Flink (56): Flink advanced features - FlinkSQL integration with Hive

Posted by sd9sd on Wed, 02 Feb 2022 20:39:17 +0100

Contents

0. Links to related articles

1. Introduction to FlinkSQL integration with Hive

2. Basic ways of integrating Hive

2.1. Persistent metadata

2.2. Use Flink to read and write Hive tables

3. Preparation

4. SQL CLI

5. Code demonstration

0. Links to related articles

Flink article summary

1. Introduction to FlinkSQL integration with Hive

Official documentation: Apache Flink 1.12 Documentation: Hive

Zhihu example: Quick start with Flink integrating Hive, using Flink 1.12 as an example - Zhihu

Using Hive to build a data warehouse has become a common solution, and the mainstream big data processing engines are all compatible with Hive. Flink has supported Hive integration since version 1.9, but 1.9 was a beta release and is not recommended for production use. Flink 1.10 marked the completion of the Blink integration, and its Hive integration also reached production quality. Note that different Flink versions integrate with Hive in different ways. Below we take Flink 1.12, the latest version at the time of writing, as an example of integrating Flink with Hive.

2. Basic ways of integrating Hive

2.1. Persistent metadata

Flink uses Hive's Metastore as a persistent catalog. Through HiveCatalog we can store Flink metadata across sessions in the Hive Metastore. For example, we can use HiveCatalog to store the definition of a Kafka source table in the Hive Metastore; the table's metadata is then persisted in the database backing the Hive Metastore and can be reused in subsequent SQL queries.
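
A minimal sketch of how this looks in code. The Kafka topic, broker address (node1:9092) and field names are illustrative assumptions, while the catalog name and Hive conf directory match the setup described in the later sections:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveCatalogPersistDemo {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().build());

        // Register a HiveCatalog: tables created under it are stored in the Hive Metastore
        tableEnv.registerCatalog("myhive",
                new HiveCatalog("myhive", "default", "/export/server/hive/conf"));
        tableEnv.useCatalog("myhive");

        // The DDL of this Kafka source table is persisted in the Metastore,
        // so later sessions can query kafka_person without re-declaring it.
        // Topic and broker address below are assumptions for illustration.
        tableEnv.executeSql(
                "CREATE TABLE kafka_person (" +
                "  id INT," +
                "  name STRING" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'person'," +
                "  'properties.bootstrap.servers' = 'node1:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json'" +
                ")");
    }
}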

2.2. Use Flink to read and write Hive tables

Flink's integration with Hive lets us read and write Hive tables directly, just as we would with SparkSQL or Impala. HiveCatalog is designed for good compatibility with Hive: users can access their existing Hive tables "out of the box", without modifying the existing Hive Metastore or changing the tables' data location or partitioning.
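
A minimal read-side sketch; it assumes the person Hive table used later in this post already exists, and the write side is shown in the code demonstration in section 5:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveReadDemo {
    public static void main(String[] args) {
        // Batch mode is fine for a one-off read of an existing Hive table
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Point Flink at the existing Hive Metastore; nothing in Hive needs to change
        tableEnv.registerCatalog("myhive",
                new HiveCatalog("myhive", "default", "/export/server/hive/conf"));
        tableEnv.useCatalog("myhive");

        // Query the existing Hive table directly and print the result to stdout
        tableEnv.executeSql("SELECT * FROM person").print();
    }
}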

3. Preparation

1. Set HADOOP_CLASSPATH by adding the following configuration to /etc/profile

# Edit the /etc/profile file
vim /etc/profile

# Add the following configuration
export HADOOP_CLASSPATH=`hadoop classpath`

# Refresh configuration
source /etc/profile

2. Download the required jars and upload them to the flink/lib directory. For Flink 1.12 this is typically the bundled flink-sql-connector-hive jar that matches your Hive version (or, alternatively, flink-connector-hive plus a matching hive-exec jar).

Download page: Apache Flink 1.12 Documentation: Hive

3. Modify the Hive configuration

# Edit the following file
vim /export/server/hive/conf/hive-site.xml

# Add the following configuration
<property>
        <name>hive.metastore.uris</name>
        <value>thrift://node3:9083</value>
</property>

4. Restart the Hive Metastore service

nohup /export/server/hive/bin/hive --service metastore &

4. SQL CLI

1. Modify the Flink SQL Client's sql-client-defaults.yaml configuration file and add the following catalog configuration

# Modify the following configuration file
vim /export/server/flink/conf/sql-client-defaults.yaml

# Add the following configuration
catalogs:
   - name: myhive
     type: hive
     hive-conf-dir: /export/server/hive/conf
     default-database: default

2. Start the Flink cluster

/export/server/flink/bin/start-cluster.sh

3. Start the Flink SQL client

/export/server/flink/bin/sql-client.sh embedded

4. Execute the following SQL

show catalogs;

use catalog myhive;

show tables;

select * from person;

5. Code demonstration

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.catalog.hive.HiveCatalog;

/**
 * Desc: use a TableEnvironment with the blink planner and a HiveCatalog to write to a Hive table.
 */
public class HiveDemo {
    public static void main(String[] args) throws Exception {
        EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        String name            = "myhive";
        String defaultDatabase = "default";
        String hiveConfDir     = "./conf"; // directory containing hive-site.xml

        HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
        // Register the catalog
        tableEnv.registerCatalog("myhive", hive);
        // Use the registered catalog
        tableEnv.useCatalog("myhive");

        // Write data to the Hive table (here: append the table's own rows back into it)
        String insertSQL = "insert into person select * from person";
        TableResult result = tableEnv.executeSql(insertSQL);

        // getJobStatus() returns a CompletableFuture<JobStatus>; get() waits for the current status
        System.out.println(result.getJobClient().get().getJobStatus().get());
    }
}
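
Note: to compile and run this demo you also need the Flink Table API and Hive connector dependencies on the classpath (the same Hive connector and hive-exec jars described in section 3); check the official Hive integration page linked above for the exact Maven artifacts matching your Hive version.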

Note: this blog is adapted from a 2020 New Year training video -> Bilibili

Note: links to other related articles are here -> Flink article summary

Topics: Big Data hive flink