III-3. Interaction between HBase and Hive

Posted by abitshort on Sun, 28 Nov 2021 11:10:16 +0100

3.1 Comparison between HBase and Hive

[Hive]

1. Data warehouse: the essence of Hive is a mapping, recorded in the MySQL metastore, between files already stored in HDFS and table definitions, which makes it convenient to manage and query the data with HQL.
2. Used for data analysis and cleaning: Hive is suited to offline, high-latency data analysis and cleaning.
3. Based on HDFS and MapReduce: the data Hive stores still sits on the DataNodes, and the HQL statements you write are ultimately converted into MapReduce jobs to run.

[HBase]

1. Database: a non-relational (NoSQL) database with column-family storage.
2. Used to store structured and unstructured data: suited to storing single-table, non-relational data; not suitable for association queries such as JOIN.
3. Based on HDFS: data is persisted as HFiles stored on DataNodes, and is organized into regions that are managed by RegionServers.
4. Low latency, serves online services: facing large amounts of enterprise data, HBase can store massive data in a single table while still providing efficient access speed.

3.2 Integrated use of HBase and Hive

[Environment preparation]

  • If some data is stored in HBase and we want to analyze it through SQL, integrating HBase with Hive is a good approach. In essence, Hive acts as a client of HBase.
  • Later we will create a table in Hive that also appears in HBase, so Hive must hold HBase's jar package dependencies.

If A wants to use B, A must have B's jar packages. For example, during Hive installation, Hive needs to use a MySQL database, so the JDBC driver jar is put into Hive's lib folder.
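
A minimal sketch of this rule applied to the MySQL example above (the connector jar name is a placeholder; use whatever driver version you actually downloaded):

# copy the JDBC driver into Hive's lib folder so Hive can reach MySQL
cp mysql-connector-java-<version>.jar /opt/module/hive-3.1.2/lib/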

  1. Through a soft link, let Hive's lib directory hold a link to the ${HBASE_HOME}/lib directory (i.e. /opt/module/hbase-2.3.7/lib). First cd to the /opt/module/hive-3.1.2/lib directory, then execute the soft link command
ln -s /opt/module/hbase-2.3.7/lib
  • Run the command ll in the current directory; the new link should appear in the listing, indicating that the soft link was created successfully (see the sketch after this list).
  2. When other components access HBase, the first thing they access is ZooKeeper, and Hive is no exception. Therefore, Hive's hive-site.xml file also needs the ZooKeeper connection configuration for reaching HBase:
<!-- Configure the ZooKeeper address used to access HBase -->
<property>
<name>hive.zookeeper.quorum</name>
<value>bigdata01,bigdata02,bigdata03</value>
<description>The list of ZooKeeper servers to talk to. This is
only needed for read/write locks.</description>
</property>

<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
<description>The port of ZooKeeper servers to talk to. This is
only needed for read/write locks.</description>
</property>
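
For reference, the whole soft-link step and its check can be sketched as below, assuming the installation paths used in this post; the listing in the comment is only illustrative:

# run inside Hive's lib directory
cd /opt/module/hive-3.1.2/lib
ln -s /opt/module/hbase-2.3.7/lib
ls -ld lib    # expected (illustrative): lib -> /opt/module/hbase-2.3.7/lib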

3.3 Case 1: Insert data into Hive and synchronize it to HBase

[Objective]

  • Create a Hive table, associate it with an HBase table, and have data inserted into the Hive table also flow into the HBase table.

[Method]

  1. Create a table in Hive with the HBase-related storage handler and mapping clauses
CREATE TABLE hive_hbase_emp_table(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=
":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno") 
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");
  • After the above command completes in Hive, the table should exist in both Hive and HBase;
  2. Create a temporary intermediate table in Hive to load the data from the file, because data cannot be loaded directly into a Hive table that is associated with HBase
CREATE TABLE emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
row format delimited fields terminated by '\t';
  3. Load the data into the Hive intermediate table
hive> load data local inpath '/opt/module/data/hive-data/input/emp.txt' into table emp;
  4. Use the insert command to import the data from the intermediate table into the Hive table associated with HBase
hive> insert into table hive_hbase_emp_table select * from emp;
  5. Check whether the Hive table and the associated HBase table have been populated synchronously (a further count-based check is sketched after these commands)
#Hive: 
hive> select * from hive_hbase_emp_table;

#HBase: 
hbase> scan 'hbase_emp_table'
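
As a further check of the synchronization (a hedged sketch: these are the standard Hive and HBase shell counting commands, and they should return matching row counts for the tables created above):

#Hive: row count of the associated table
hive> select count(*) from hive_hbase_emp_table;

#HBase: row count of the backing table
hbase> count 'hbase_emp_table'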

3.4 Case 2: Associate a Hive external table with an existing HBase table

Objective: a table named hbase_emp_table is already stored in HBase. Create an external table in Hive and associate it with hbase_emp_table in HBase, so that Hive can be used to analyze the data in HBase.
Note: case 2 follows case 1 closely, so please complete case 1 before starting this one.

  1. Create an external table in Hive
CREATE EXTERNAL TABLE relevance_hbase_emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");
  2. After the association, Hive functions can be used to perform analysis operations on the HBase data
hive (default)> select * from relevance_hbase_emp;
no")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");
  1. After association, you can use Hive function to perform some analysis operations
hive (default)> select * from relevance_hbase_emp;
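
For example, a minimal analysis sketch over the associated table, assuming the schema above (the particular aggregation is only illustrative):

-- headcount and average salary per department, computed by Hive
-- while the rows physically live in the HBase table hbase_emp_table
hive (default)> select deptno, count(*) as emp_cnt, avg(sal) as avg_sal from relevance_hbase_emp group by deptno;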

Topics: Big Data Hadoop HBase