Learning notes for HBase

Posted by HoangLong on Thu, 24 Feb 2022 11:50:02 +0100

Learning notes for HBase (2)

Part (1) of these notes briefly covered the theory behind HBase. Part (2) covers the basic HBase shell commands for adding, deleting, modifying and querying data.

Basic HBase shell commands

1. Enter the HBase shell

Connect to the virtual machine, change to the HBase bin directory, and run:

./hbase shell

Alternatively, call the script by its full path. For example, my HBase bin directory is /usr/hdp/current/hbase-client/bin, so I can use the following command:

/usr/hdp/current/hbase-client/bin/hbase shell

Once the shell has started, the prompt looks like this:

hbase(main):001:0> 

2. View all tables

Enter the list command to view all current tables (mine shows two tables that I created earlier):

hbase(main):001:0> list
TABLE
futuresFunctionBar
selfstock
2 row(s)
Took 1.0300 seconds
=> ["futuresFunctionBar","selfstock"]

3. Create table

To create a new table, you must give it a name and define its schema. A table's schema consists of the table attributes and a list of column families.

For example, to create a table named test with a single column family named data, leaving all table and column family properties at their defaults, use the following command. (Note: for plain names like these, the shell accepts single and double quotation marks interchangeably.)

hbase(main):002:0> create 'test','data'
Created table test
Took 5.5800 seconds
=> Hbase::Table - test

Here, "test" is the table name and "data" is the name of the column family. You can define one or more column families when creating a table.

After creating the table, run list to check that it was created successfully:

hbase(main):003:0> list
TABLE
futuresFunctionBar
selfstock
test
3 row(s)
Took 0.0289 seconds
=> ["futuresFunctionBar","selfstock","test"]

4. Insert data

The put command adds a new cell or updates an existing one. Its general form is:

put 'table', 'row key', 'column_family:column', 'value'

hbase(main):004:0> put 'test','row1',"data:1","value1"
Took 0.2251 seconds
hbase(main):005:0> put 'test','row2',"data:2","value2"
Took 0.0130 seconds
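
Because put both adds and updates, writing twice to the same row and column leaves only the newer value visible to a default read. The sketch below models that idea in plain Ruby, the language the HBase shell is built on. ToyTable, put_cell and read_cell are hypothetical names chosen for illustration, not HBase APIs, and the explicit timestamps stand in for the ones HBase normally assigns itself.

```ruby
# Toy in-memory model of one table. Illustration only: ToyTable and its
# methods are hypothetical names, not part of HBase.
class ToyTable
  def initialize
    # {[row, column] => [[timestamp, value], ...]}
    @cells = Hash.new { |hash, key| hash[key] = [] }
  end

  # Record a value for a cell under an explicit timestamp (add or update).
  def put_cell(row, column, value, timestamp)
    @cells[[row, column]] << [timestamp, value]
  end

  # A default read returns only the value with the largest timestamp.
  def read_cell(row, column)
    versions = @cells.fetch([row, column], [])
    newest = versions.max_by { |timestamp, _value| timestamp }
    newest && newest[1]
  end
end

table = ToyTable.new
table.put_cell('row1', 'data:1', 'value1', 100)      # first put adds the cell
table.put_cell('row1', 'data:1', 'value1-new', 200)  # second put updates it
puts table.read_cell('row1', 'data:1')               # prints value1-new
```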

put also accepts further arguments, such as a timestamp. Use the help 'put' command to see them:

hbase(main):010:0> help "put"
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates.  To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:

  hbase> put 'ns1:t1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value', ts1
  hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:

  hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}

5. Read data

To read a single row, for example the row with key row2, use the get command:

hbase(main):006:0> get 'test','row2'
COLUMN           CELL
 data:2           timestamp=1645682739375, value=value2
Took 0.0592 seconds 

get also has several other forms. Use the help 'get' command to see them:

hbase(main):011:0> help "get"
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}

6. View table data

To view all the data in a table, use the scan command:

hbase(main):007:0> scan 'test'
ROW            COLUMN+CELL
 row1          column=data:1, timestamp=1645682714578, value=value1
 row2          column=data:2, timestamp=1645682739375, value=value2
2 row(s)
Took 0.2275 seconds

The command above returns the whole table. To return only one column family, for example "column_family_1", the general form is:

scan 'table', { COLUMN => 'column_family_1' }

This is equivalent to:

scan 'table', { COLUMN => ['column_family_1'] }

To return several column families (such as "column_family_1" and "column_family_2"), list them in a single array:

scan 'table', { COLUMN => ['column_family_1', 'column_family_2'] }

To narrow the scan further to a single column, for example column_1 in column family column_family_1:

scan 'table', { COLUMN => ['column_family_1:column_1'] }

Several columns can be requested in the same way:

scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'] }

To return only rows whose row key is greater than or equal to a given key, add STARTROW:

scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'], STARTROW => 'key_2' }

Similarly, to return only rows whose row key is less than a given key, add STOPROW:

scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'], STOPROW => 'key_2' }

To return the rows between two keys, for example greater than or equal to key_2 and less than key_5, combine the two:

scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'], STARTROW => 'key_2', STOPROW => 'key_5' }

Note that STARTROW and STOPROW behave differently: the row key given to STARTROW is included in the result, while the row key given to STOPROW is excluded. The range is a half-open interval, closed on the left and open on the right: [STARTROW, STOPROW).
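
The inclusive start and exclusive stop can be checked with a small plain-Ruby model. This is only a sketch: range_scan is a hypothetical helper and the hash stands in for a table, so it shows the comparison rule rather than how HBase actually executes a scan.

```ruby
# Toy model of a row-key range scan. Illustration only: range_scan is a
# hypothetical helper, not an HBase API.
# Row keys are compared as strings in sorted order; start_row is included
# and stop_row is excluded.
def range_scan(rows, start_row: nil, stop_row: nil)
  rows.keys.sort.select do |key|
    (start_row.nil? || key >= start_row) && (stop_row.nil? || key < stop_row)
  end
end

rows = {
  'key_1' => 'a', 'key_2' => 'b', 'key_3' => 'c',
  'key_4' => 'd', 'key_5' => 'e', 'key_6' => 'f'
}

p range_scan(rows, start_row: 'key_2', stop_row: 'key_5')
# prints ["key_2", "key_3", "key_4"]
```

key_2 appears in the result and key_5 does not, which matches the half-open behaviour described above.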

When scanning a table with the scan command, we often limit the number of row keys returned:

scan 'table', { LIMIT => 2 }

Although the command says LIMIT => 2, the output may contain more than two cells. The row key is the unique primary key, and LIMIT counts row keys, not cells. LIMIT => 2 therefore returns all the data of at most two row keys.
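
To make the difference between rows and cells concrete, here is a minimal plain-Ruby sketch. limited_scan and the sample triples are hypothetical and exist only for this illustration; they are not HBase APIs or real data.

```ruby
# Toy model of LIMIT. Illustration only: limited_scan is a hypothetical
# helper, not an HBase API.
# cells is a list of [row_key, column, value] triples. The limit caps the
# number of distinct row keys, so every cell of each kept row is returned.
def limited_scan(cells, limit)
  kept_rows = cells.map { |row_key, _column, _value| row_key }.uniq.sort.first(limit)
  cells.select { |row_key, _column, _value| kept_rows.include?(row_key) }
       .sort_by { |row_key, column, _value| [row_key, column] }
end

cells = [
  ['row1', 'data:1', 'value1'],
  ['row1', 'data:2', 'value2'],
  ['row2', 'data:1', 'value3'],
  ['row3', 'data:1', 'value4']
]

result = limited_scan(cells, 2)
puts result.length  # prints 3: two row keys, but row1 contributes two cells
```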

To get two rows in reverse row key order:

scan 'table', { LIMIT => 2, REVERSED => true }

The default is REVERSED => false, which reads rows in ascending row key order. With REVERSED => true, rows are read in descending order.
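
The combined effect of a limit and a reversed read can be sketched in plain Ruby as follows. ordered_row_keys is a hypothetical helper written for this note, not an HBase API.

```ruby
# Toy model of REVERSED. Illustration only: ordered_row_keys is a
# hypothetical helper, not an HBase API.
# Row keys are sorted ascending by default and descending when reversed is
# true; the limit is applied after the ordering.
def ordered_row_keys(row_keys, limit: nil, reversed: false)
  ordered = row_keys.sort
  ordered = ordered.reverse if reversed
  limit ? ordered.first(limit) : ordered
end

p ordered_row_keys(%w[row1 row2 row3], limit: 2, reversed: true)
# prints ["row3", "row2"]
```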

7. View table details

To see the details of a table, use the describe command or its short form desc:

hbase(main):008:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'data', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.0890 seconds

8. Delete data

To remove a single cell, for example one that was inserted by mistake, use the delete command:

delete 'table', 'row key', 'column_family_1:column'

To delete an entire row, use the deleteall command:

hbase(main):009:0> deleteall 'test', 'row1'
Took 0.0187 seconds

After deleting the row, scan the table to view the remaining data:

hbase(main):010:0> scan 'test'
ROW            COLUMN+CELL
 row2          column=data:2, timestamp=1645682739375, value=value2
1 row(s)
Took 0.0148 seconds

To delete all the data in a table while keeping the table structure, use the truncate command:

truncate 'table'
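
The three deletion scopes in this section (one cell, one row, every row) can be contrasted with a small plain-Ruby model. ToyRows and its methods are hypothetical names for illustration only; they mimic what delete, deleteall and truncate leave behind, not how HBase carries them out.

```ruby
# Toy model of a table as {row_key => {column => value}}. Illustration only:
# ToyRows and its methods are hypothetical names, not HBase APIs.
class ToyRows
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Like delete: remove one column of one row.
  def delete_cell(row_key, column)
    row = @rows[row_key]
    return unless row

    row.delete(column)
    @rows.delete(row_key) if row.empty?  # a row with no cells no longer exists
  end

  # Like deleteall with only a row key: remove the whole row.
  def delete_row(row_key)
    @rows.delete(row_key)
  end

  # Like truncate: remove every row; the table object itself remains.
  def clear_rows
    @rows.clear
  end
end

table = ToyRows.new({
  'row1' => { 'data:1' => 'value1', 'data:2' => 'value2' },
  'row2' => { 'data:2' => 'value2' }
})
table.delete_cell('row1', 'data:1')  # row1 keeps data:2
table.delete_row('row2')             # row2 disappears entirely
p table.rows.keys                    # prints ["row1"]
table.clear_rows
p table.rows.empty?                  # prints true
```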

9. Delete table

To delete a table, for example the 'test' table, first disable it and then drop it. Dropping an enabled table directly reports an error.

Deleting a table therefore takes two steps:

  • disable the table
  • drop the table

A newly created table is enabled (in use) by default, so it must be disabled before it can be dropped.

  • Enable a table: enable 'table'
  • Disable a table: disable 'table'

hbase(main):011:0> disable 'test'
Took 3.9249 seconds
hbase(main):012:0> drop 'test'
Took 2.5588 seconds
hbase(main):013:0> list
TABLE
futuresFunctionBar
selfstock
2 row(s)
Took 1.0300 seconds
=> ["futuresFunctionBar","selfstock"]
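
The disable-then-drop rule amounts to a small state check, sketched here in plain Ruby. ToyCatalog, its methods and the error text are hypothetical and written only for this note; the real shell reports its own error message.

```ruby
# Toy model of table states. Illustration only: ToyCatalog is a hypothetical
# class, not an HBase API, and the error text is made up.
class ToyCatalog
  def initialize
    @enabled = {}  # table name => true (enabled) or false (disabled)
  end

  def create_table(name)
    @enabled[name] = true  # new tables start out enabled
  end

  def disable_table(name)
    @enabled[name] = false if @enabled.key?(name)
  end

  # Dropping is only allowed once the table has been disabled.
  def drop_table(name)
    raise ArgumentError, "table #{name} does not exist" unless @enabled.key?(name)
    raise "table #{name} is enabled; disable it first" if @enabled[name]

    @enabled.delete(name)
  end

  def table_names
    @enabled.keys.sort
  end
end

catalog = ToyCatalog.new
catalog.create_table('test')
begin
  catalog.drop_table('test')  # refused: the table is still enabled
rescue RuntimeError => e
  puts "drop refused: #{e.message}"
end
catalog.disable_table('test')
catalog.drop_table('test')    # succeeds after disabling
p catalog.table_names         # prints []
```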

10. Count the rows in a table

To count the rows in a table, use the count command:

count 'table'

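11. Add a column family

To add a column family, use the alter command with the name of the new family. VERSIONS sets the maximum number of versions kept per cell; replace number with an integer:

alter 'table', { NAME => 'new_column_family', VERSIONS => number }

12. Delete a column family

To delete a column family, use the alter command with METHOD => 'delete':

alter 'table', { NAME => 'column_family', METHOD => 'delete' }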

Topics: Big Data HBase