Integrating Phoenix with HBase -- the dark horse

Posted by dotbands on Fri, 18 Feb 2022 01:36:03 +0100

Phoenix

What is Phoenix

  • Phoenix is an open source SQL engine built on HBase. You can use the standard JDBC API, instead of the HBase client API, to create tables, insert data and query your HBase data
  • Phoenix is written entirely in Java and is embedded in HBase as a JDBC driver. The Phoenix query engine compiles a SQL query into one or more HBase scans and orchestrates their execution to produce a standard JDBC result set. Because it uses the HBase API, coprocessors and custom filters directly, performance is on the order of milliseconds for simple queries and seconds for queries over millions of rows
  • Phoenix performance
    • Phoenix is a SQL engine built on HBase
    • Phoenix achieves high-performance operation of HBase in the following ways (a sketch follows this list)
      • Compiling SQL queries into native HBase scans
      • Determining the optimal start and stop row keys for each scan
      • Orchestrating scans so that they execute in parallel
      • Pushing the predicates of the WHERE clause down to server-side filters
      • Executing aggregate queries through server-side hooks (called coprocessors)
      • Implementing secondary indexes to speed up queries on non-rowkey columns
      • Collecting statistics to improve the degree of parallelism and help choose the best plan
      • Using skip-scan filters to optimize IN, LIKE and OR queries
      • Salting the row key to distribute write pressure evenly
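
A minimal sketch of two of these techniques, salting and skip scan (the "metrics" table and its columns are hypothetical, not part of this tutorial):

-- SALT_BUCKETS spreads rows, and therefore write pressure, across regions
create table if not exists "metrics" (
    "host" varchar not null,
    "ts" varchar not null,
    "cf"."val" varchar
    constraint pk primary key ("host", "ts")
) SALT_BUCKETS = 4;

-- An IN predicate on the leading rowkey column is answered with a skip scan
-- rather than a full table scan
select * from "metrics" where "host" in ('node1', 'node2');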

Official website address:

http://phoenix.apache.org/

Installation and deployment of Phoenix

1. Preparation

  • Install the ZooKeeper, Hadoop and HBase clusters in advance

2. Installation package
Download address: https://mirrors.cnnic.cn/apache/phoenix/apache-phoenix-4.14.0-cdh5.14.2/bin/

Materials\installation\apache-phoenix-4.14.0-HBase-1.1-bin.tar.gz

3. Upload and unzip

  • Upload the installation package to a directory on one of the servers in the HBase cluster and unzip it
tar -xvzf apache-phoenix-4.14.0-HBase-1.1-bin.tar.gz -C ../servers/

4. Copy the jar packages Phoenix needs to integrate with HBase

Copy phoenix-4.14.0-HBase-1.1-server.jar (phoenix-4.14.0-cdh5.14.2-server.jar for CDH) and phoenix-core-4.14.0-HBase-1.1.jar (phoenix-core-4.14.0-cdh5.14.2.jar for CDH) from the Phoenix directory into the lib directory of every HBase node

scp phoenix-4.14.0-HBase-1.1-server.jar phoenix-core-4.14.0-HBase-1.1.jar node1:/export/servers/hbase-1.1.1/lib
scp phoenix-4.14.0-HBase-1.1-server.jar phoenix-core-4.14.0-HBase-1.1.jar node2:/export/servers/hbase-1.1.1/lib
scp phoenix-4.14.0-HBase-1.1-server.jar phoenix-core-4.14.0-HBase-1.1.jar node3:/export/servers/hbase-1.1.1/lib

5. Configure HADOOP and HBASE in Phoenix

Replace Phoenix's original configuration files under phoenix/bin/ with the HBase configuration file hbase-site.xml and Hadoop's core-site.xml and hdfs-site.xml from hadoop/etc/hadoop

# Enter the Phoenix bin directory
cd /export/servers/apache-phoenix-4.14.0-HBase-1.1-bin/bin

# Back up Phoenix's bundled hbase-site.xml
mv hbase-site.xml hbase-site.xml.bak

ln -s $HBASE_HOME/conf/hbase-site.xml .
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml .
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml .

6. Restart the HBase cluster so the Phoenix jar packages take effect

7. Verify success

./sqlline.py node1:2181

If output like the following appears, Phoenix started successfully:

Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:node1:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:node1:2181
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/apache-phoenix-4.14.0-HBase-1.1-bin/phoenix-4.14.0-HBase-1.1-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
19/10/18 09:58:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.14)
Driver: PhoenixEmbeddedDriver (version 4.14)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0

8. View the currently available tables

  • Enter !tables to list the available tables
  • User-created tables are listed alongside the Phoenix system tables shown below; the system tables maintain the metadata of the user tables
+--------------+-------------+---------------+-+-----------------+---------------+---------+
| TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | | IMMUTABLE_ROWS  | SALT_BUCKETS  | MULTI_T |
+--------------+-------------+---------------+-+-----------------+---------------+---------+
| SYSTEM       | CATALOG     | SYSTEM TABLE  | | false           | null          | false   |
| SYSTEM       | FUNCTION    | SYSTEM TABLE  | | false           | null          | false   |
| SYSTEM       | LOG         | SYSTEM TABLE  | | true            | 32            | false   |
| SYSTEM       | SEQUENCE    | SYSTEM TABLE  | | false           | null          | false   |
| SYSTEM       | STATS       | SYSTEM TABLE  | | false           | null          | false   |
+--------------+-------------+---------------+-+-----------------+---------------+---------+

9. To exit Phoenix, enter !quit
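
A few other sqlline meta-commands that come in handy (these are sqlline's own commands, not Phoenix SQL):

!help        -- list all available sqlline commands
!describe    -- show the columns of a table
!indexes     -- list the indexes on a table
!quit        -- exit the shell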

Introduction to Phoenix

Requirement 1:

  • Use SQL statements to create a user table in Phoenix with the following columns and rows

+-----+-----------+-----+--------+------------------+
| ID  | Name      | Age | Gender | Address          |
+-----+-----------+-----+--------+------------------+
| 1   | Zhang San | 30  | male   | Xicheng District |
| 2   | Li Si     | 20  | female | Minhang District |
+-----+-----------+-----+--------+------------------+

  • Insert the two rows into the table, query the data, and view the data in HBase (the query is sketched after the reference code below)

Requirement analysis:

  • Create the table structure directly in Phoenix with the create table syntax
  • Because the data is ultimately stored in HBase, the HBase column family name must be specified when creating the table

Reference code:

-- Create table
create table if not exists "user_info"(
    "id" varchar primary key,
    "cf"."name" varchar,
    "cf"."age" integer,
    "cf"."sex" varchar,
    "cf"."address" varchar
);

-- Insert data
upsert into "user_info" values('1', 'Zhang San', 30, 'male', 'Xicheng District');
upsert into "user_info" values('2', 'Li Si', 20, 'female', 'Minhang District, Shanghai');

Requirement 2:

  • Modify the age of the user whose id is 1 to 45
-- Modify data (upsert updates the row when the primary key already exists, and inserts a new row otherwise)
upsert into "user_info"("id", "age") values('1', 45);

Requirement 3:

  • Delete the user data whose id is 2
-- Delete data
delete from "user_info" where "id" = '2';

Establishing a mapping to an HBase table

When a table already exists in HBase, you can establish a mapping to it in Phoenix, so that the HBase table can be operated on through SQL.

Case:

1. In HBase, create the employee table to be mapped (data preparation)

create 'employee','company','family'

put 'employee','row1','company:name','ted'
put 'employee','row1','company:position','worker'
put 'employee','row1','family:tel','13600912345'

put 'employee','row2','company:name','michael'
put 'employee','row2','company:position','manager'
put 'employee','row2','family:tel','1894225698'

scan 'employee'

2. Create the mapping view

  • Since the table already exists in HBase, the mapping must be created in Phoenix with create view

  • Phoenix is case sensitive

  • Unquoted identifiers are folded to uppercase

  • If the table name is not enclosed in double quotation marks, the created table name is uppercase regardless of whether you typed it in uppercase or lowercase

  • To create a table name or field name that contains lowercase or mixed case, enclose the table name or field name in double quotation marks (a quoting sketch follows the notes below)

In the Phoenix command line, run:

CREATE VIEW IF NOT EXISTS "employee" (
    "rowid" VARCHAR NOT NULL PRIMARY KEY, 
    "company"."name" VARCHAR,
    "company"."position" VARCHAR,
    "family"."tel" VARCHAR
);

There are several points to note in this statement

  • IF NOT EXISTS ensures that if the view has already been created, its configuration will not be overwritten

  • The field that maps to the rowkey is marked with PRIMARY KEY

  • Columns are referenced as "columnfamily"."columnname"

  • After the view is established, query the data
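
A quick illustration of the case-sensitivity and quoting rules (the demo table is hypothetical):

-- Unquoted identifiers are folded to uppercase: this creates a table named DEMO
create table if not exists demo (id varchar primary key);

-- Quoted identifiers keep their case: this hits the lowercase view created above
select * from "employee";

-- An unquoted name would be folded to EMPLOYEE, which does not exist
-- select * from employee;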

3. Query all data in the mapped view

0: jdbc:phoenix:node01> SELECT * FROM "employee";
+-------+----------+-----------+--------------+-------+
|  no   |   name   | position  |     tel      |  age  |
+-------+----------+-----------+--------------+-------+
| row1  | ted      | worker    | 13600912345  | null  |
| row2  | michael  | manager   | 1894225698   | null  |
+-------+----------+-----------+--------------+-------+

4. Query all employees whose position is 'worker'

select * from "employee" where "position" = 'worker';

Using Phoenix secondary indexes to speed up queries

  • HBase queries efficiently only through the rowkey; any other query must compare the values of each column row by row, that is, a full table scan
  • For tables with a large amount of data, the performance of a full table scan is very poor
  • When data needs to be queried from multiple angles, a rowkey-only lookup is impossible; a secondary index makes such queries possible
  • Phoenix provides secondary index support for HBase

Configure HBase to support Phoenix secondary indexes

1. Add the following property to hbase-site.xml on every HRegionServer

<property> 
  <name>hbase.regionserver.wal.codec</name> 
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value> 
</property>

2. Restart the HBase cluster to make the configuration effective

Creating secondary indexes using Phoenix

1. Index creation

create local index "idx_tel" on "employee"("family"."tel");

2. View the execution plan to check whether the query uses the secondary index

explain select * from "employee" where "name" = 'ted';
explain select * from "employee" where "tel" = '13600912345';

The first query filters on the unindexed name column, so its plan shows a full scan; the second filters on the indexed tel column, so its plan can use a range scan over the index.

3. Delete index

drop index "idx_tel" on "employee";

4. View all indexes on the table

!indexes "employee"

Create a Phoenix mapping table for order details

Create the mapping table

Implementation steps:

  • Create a mapping view in Phoenix
  • Implement various queries (an aggregation sketch follows the index step below)

1. SQL statement to create the mapping view:

create view "dwd_order_detail"(
    "rowid" varchar primary key,
    "detail"."ogId" varchar,
    "detail"."orderId" varchar,
    "detail"."goodsId" varchar,
    "detail"."goodsNum" varchar,
    "detail"."goodsPrice" varchar,
    "detail"."goodsName" varchar,
    "detail"."shopId" varchar,
    "detail"."goodsThirdCatId" varchar,
    "detail"."goodsThirdCatName" varchar,
    "detail"."goodsSecondCatId" varchar,
    "detail"."goodsSecondCatName" varchar,
    "detail"."goodsFirstCatId" varchar,
    "detail"."goodsFirstCatName" varchar,
    "detail"."areaId" varchar,
    "detail"."shopName" varchar,
    "detail"."shopCompany" varchar,
    "detail"."cityId" varchar,
    "detail"."cityName" varchar,
    "detail"."regionId" varchar,
    "detail"."regionName" varchar
);

2. Create an index on the following columns of the table

+---------------------+-----------------------------+
| Column              | Description                 |
+---------------------+-----------------------------+
| goodsThirdCatName   | Third-level classification  |
| goodsSecondCatName  | Second-level classification |
| goodsFirstCatName   | First-level classification  |
| cityName            | City name                   |
| regionName          | Region name                 |
+---------------------+-----------------------------+

Reference code:

-- Create index
create local index "idx_dwd_order_detail" on "dwd_order_detail"("detail"."goodsThirdCatName", "detail"."goodsSecondCatName", "detail"."goodsFirstCatName", "detail"."cityName", "detail"."regionName");

explain select * from "dwd_order_detail" where "goodsThirdCatName" = 'Other fruits and vegetables' and "cityName" = 'Jingdezhen branch';

Query Phoenix data with a JDBC program

Requirements:

  • Write Java code to query order details

1. Import dependencies

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-core</artifactId>
        <version>4.14.0-HBase-1.1</version>
    </dependency>
</dependencies>

2. Write the JDBC program

Connection URL: jdbc:phoenix:node1:2181

3. Create a new module for the code. Be careful not to inherit from itcast_shop_parent (otherwise it will pull in the CDH dependencies)

Reference code:

Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

Connection connection = DriverManager.getConnection("jdbc:phoenix:node1:2181", "", "");
Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery("select * from \"dwd_order_detail\" limit 10 ");

while(resultSet.next()) {
    String rowid = resultSet.getString("rowid");
    String ogId = resultSet.getString("ogId");
    String orderId = resultSet.getString("orderId");
    String goodsId = resultSet.getString("goodsId");
    String goodsNum = resultSet.getString("goodsNum");
    String goodsPrice = resultSet.getString("goodsPrice");
    String goodsName = resultSet.getString("goodsName");
    String shopId = resultSet.getString("shopId");
    String goodsThirdCatId = resultSet.getString("goodsThirdCatId");
    String goodsThirdCatName = resultSet.getString("goodsThirdCatName");
    String goodsSecondCatId = resultSet.getString("goodsSecondCatId");
    String goodsSecondCatName = resultSet.getString("goodsSecondCatName");
    String goodsFirstCatId = resultSet.getString("goodsFirstCatId");
    String goodsFirstCatName = resultSet.getString("goodsFirstCatName");
    String areaId = resultSet.getString("areaId");
    String shopName = resultSet.getString("shopName");
    String shopCompany = resultSet.getString("shopCompany");
    String cityId = resultSet.getString("cityId");
    String cityName = resultSet.getString("cityName");
    String regionId = resultSet.getString("regionId");
    String regionName = resultSet.getString("regionName");

    System.out.print(rowid);
    System.out.print(ogId);
    System.out.print(orderId);
    System.out.print(goodsId);
    System.out.print(goodsNum);
    System.out.print(goodsPrice);
    System.out.print(goodsName);
    System.out.print(shopId);
    System.out.print(goodsThirdCatId);
    System.out.print(goodsThirdCatName);
    System.out.print(goodsSecondCatId);
    System.out.print(goodsSecondCatName);
    System.out.print(goodsFirstCatId);
    System.out.print(goodsFirstCatName);
    System.out.print(areaId);
    System.out.print(shopName);
    System.out.print(shopCompany);
    System.out.print(cityId);
    System.out.print(cityName);
    System.out.print(regionId);
    System.out.print(regionName);
    System.out.println();
}

resultSet.close();
statement.close();
connection.close();
}

Topics: HBase