HBase learning notes (2)
Part (1) of these notes briefly covered the theory behind HBase. Part (2) covers the basic HBase shell commands for adding, deleting, modifying and querying data.
Basic HBase shell commands
1. Enter the HBase shell
Connect to the virtual machine, change to HBase's bin directory, and run:
./hbase shell
Alternatively, call the hbase script by its full path. For example, my HBase bin directory is /usr/hdp/current/hbase-client/bin, so I can use the following command:
/usr/hdp/current/hbase-client/bin/hbase shell
Press Enter to run the command. The shell then shows its prompt:
hbase(main):001:0>
2. View all tables
Enter the list command to view all current tables (two tables are shown on my side; they were created earlier):
hbase(main):001:0> list
TABLE
futuresFunctionBar
selfstock
2 row(s)
Took 1.0300 seconds
=> ["futuresFunctionBar","selfstock"]
3. Create table
To create a new table, you must first give it a name and define a schema for it. The schema of a table contains the list of table attributes and column families.
For example, to create a new table named test that contains one column family named data, with the table and column family properties left at their defaults, use the following command (note: for plain names like these, the shell accepts either single or double quotation marks):
hbase(main):002:0> create 'test','data'
Created table test
Took 5.5800 seconds
=> Hbase::Table - test
Here "test" is the table name and "data" is the name of the column family. You can define one or more column families when creating a table.
After creation, you can enter list to check whether the table is created successfully:
hbase(main):003:0> list
TABLE
futuresFunctionBar
selfstock
test
3 row(s)
Took 0.0289 seconds
=> ["futuresFunctionBar","selfstock","test"]
4. Insert data
The put command adds a new cell or updates an existing one. Its general form is:
put 'table', 'row_key', 'column_family:column', 'value'
For example:
hbase(main):004:0> put 'test','row1',"data:1","value1"
Took 0.2251 seconds
hbase(main):005:0> put 'test','row2',"data:2","value2"
Took 0.0130 seconds
put also accepts other argument forms. Use the help 'put' command to list them:
hbase(main):010:0> help "put"
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:

  hbase> put 'ns1:t1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value', ts1
  hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:

  hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
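The "add or update" behaviour of put can be pictured with a small plain-Python model. This is only a sketch and not HBase client code: the dictionary layout and the put function are made up for illustration, and it assumes that only the newest value of each cell is kept, so timestamps and older versions are left out.

```python
# Toy model of a table: row key -> {"family:qualifier": value}.
# Assumption: only the newest value per cell is kept; timestamps are ignored.
table = {}


def put(table, row_key, column, value):
    """Write one cell; a cell with the same coordinates is overwritten."""
    table.setdefault(row_key, {})[column] = value


put(table, "row1", "data:1", "value1")   # adds a new cell
put(table, "row2", "data:2", "value2")   # adds a second row
put(table, "row1", "data:1", "value1b")  # same coordinates: acts as an update

print(table)
```

The third call targets the same table, row and column as the first one, so it replaces the value instead of creating a second cell.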
5. Read data
To read a single row, for example row2, use the get command:
hbase(main):006:0> get 'test','row2'
COLUMN                CELL
 data:2               timestamp=1645682739375, value=value2
Took 0.0592 seconds
get also accepts other argument forms. Use the help 'get' command to list them:
hbase(main):011:0> help "get"
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
6. View table data
To view all the data in a table, use the scan command:
hbase(main):007:0> scan 'test'
ROW                   COLUMN+CELL
 row1                 column=data:1, timestamp=1645682714578, value=value1
 row2                 column=data:2, timestamp=1645682739375, value=value2
2 row(s)
Took 0.2275 seconds
The command above returns the data of the whole table. To get a single column family, such as "column_family_1", the command form is:
scan 'table', { COLUMN => 'column_family_1' }
This is equivalent to:
scan 'table', { COLUMN => ['column_family_1'] }
To get the data of several column families (such as "column_family_1" and "column_family_2"), list them in the same array:
scan 'table', { COLUMN => ['column_family_1', 'column_family_2'] }
To narrow it down further, you can fetch only the data in column "column_1" of column family "column_family_1":
scan 'table', { COLUMN => ['column_family_1:column_1'] }
Similarly, several columns can be fetched as follows:
scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'] }
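How the COLUMN entries pick cells can be sketched in plain Python. This is a simplified model and not HBase code: select_columns and the sample row are hypothetical, and the model only assumes what is described above, namely that an entry is either a whole column family or a single "family:column" name.

```python
def select_columns(row, columns):
    """Keep cells that match a 'family' or a 'family:column' entry."""
    selected = {}
    for name, value in row.items():
        family = name.split(":", 1)[0]
        if name in columns or family in columns:
            selected[name] = value
    return selected


# Hypothetical row with three cells in two column families.
row = {
    "column_family_1:column_1": "a",
    "column_family_1:column_2": "b",
    "column_family_2:column_3": "c",
}

whole_family = select_columns(row, ["column_family_1"])
two_columns = select_columns(
    row, ["column_family_1:column_1", "column_family_2:column_3"]
)
print(sorted(whole_family))
print(sorted(two_columns))
```

Naming only the family returns every column in it, while naming "family:column" returns just that column.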
To get the rows whose row key is greater than or equal to a given key, use STARTROW:
scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'], STARTROW => 'key_2' }
Similarly, to get the rows whose row key is less than a given key, use STOPROW:
scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'], STOPROW => 'key_2' }
To get the rows between two keys, for example greater than or equal to key_2 and less than key_5, combine the two:
scan 'table', { COLUMN => ['column_family_1:column_1', 'column_family_2:column_3'], STARTROW => 'key_2', STOPROW => 'key_5' }
Note that STARTROW and STOPROW behave slightly differently: the row key given to STARTROW is included in the result, while the row key given to STOPROW is not. In other words, the range is closed on the left and open on the right.
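The "closed on the left, open on the right" rule can be checked with a short plain-Python sketch. It is not HBase code: scan_rows and the sample keys are made up for illustration, and Python string ordering stands in for HBase's byte-wise ordering of row keys, which gives the same result for plain ASCII keys like these.

```python
from bisect import bisect_left


def scan_rows(row_keys, startrow=None, stoprow=None):
    """Return the keys k with startrow <= k < stoprow, in sorted order."""
    keys = sorted(row_keys)
    lo = 0 if startrow is None else bisect_left(keys, startrow)
    hi = len(keys) if stoprow is None else bisect_left(keys, stoprow)
    return keys[lo:hi]


# Hypothetical row keys.
keys = ["key_1", "key_2", "key_3", "key_4", "key_5", "key_6"]

selected = scan_rows(keys, startrow="key_2", stoprow="key_5")
print(selected)
```

With STARTROW 'key_2' and STOPROW 'key_5', the model returns key_2, key_3 and key_4: the start key is included and the stop key is not.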
When scanning a table, we often limit the number of rows returned:
scan 'table', { LIMIT => 2 }
Although the command says LIMIT => 2, the number of cells returned is not necessarily 2; it may be greater. The row key is the unique primary key, and the limit applies to row keys, so LIMIT => 2 returns all the data of at most two row keys.
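The difference between limiting row keys and limiting cells is easy to see in a plain-Python sketch. It is a simplified model and not HBase code: scan_limit and the sample table are hypothetical.

```python
def scan_limit(table, limit):
    """Return (row_key, column, value) cells for at most `limit` row keys."""
    cells = []
    for row_key in sorted(table)[:limit]:
        for column, value in sorted(table[row_key].items()):
            cells.append((row_key, column, value))
    return cells


# Hypothetical table: row1 holds two cells, the other rows hold one each.
table = {
    "row1": {"data:1": "a", "data:2": "b"},
    "row2": {"data:1": "c"},
    "row3": {"data:1": "d"},
}

cells = scan_limit(table, 2)
print(len(cells))
```

Here a limit of 2 yields three cells, because row1 contributes two of them; what is capped is the number of distinct row keys.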
To get two rows of data in reverse order:
scan 'table', { LIMIT => 2, REVERSED => true }
The default is REVERSED => false, which reads rows in ascending row-key order. With REVERSED => true, rows are read in descending order.
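How LIMIT and REVERSED combine can be sketched in plain Python. This is only an illustration, not HBase code: scan_keys and the sample row keys are made up, and the model simply orders the keys before applying the limit.

```python
def scan_keys(row_keys, limit=None, is_reversed=False):
    """Order the row keys ascending or descending, then apply the limit."""
    keys = sorted(row_keys, reverse=is_reversed)
    return keys if limit is None else keys[:limit]


# Hypothetical row keys.
keys = ["row1", "row2", "row3", "row4"]

forward = scan_keys(keys, limit=2)
backward = scan_keys(keys, limit=2, is_reversed=True)
print(forward, backward)
```

In this model the forward scan yields the first two row keys, and the reversed scan yields the last two, starting from the largest.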
7. Check table details
To see the details of a table, use the describe command or its short form desc:
hbase(main):008:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'data', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => ..., DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.0890 seconds
8. Delete data
To remove a cell that was inserted by mistake, use the delete command:
delete 'table', 'row_key', 'column_family_1:column'
To delete an entire row, use the deleteall command:
hbase(main):009:0> deleteall 'test', 'row1'
Took 0.0187 seconds
After deleting the row, scan the table to view the remaining data:
hbase(main):010:0> scan 'test'
ROW                   COLUMN+CELL
 row2                 column=data:2, timestamp=1645682739375, value=value2
1 row(s)
Took 0.0148 seconds
To delete all the data in a table while keeping the table structure, use the truncate command:
truncate 'table'
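The scope of the three commands in this section (one cell, one row, all rows) can be compared in a plain-Python sketch. It is a toy model and not HBase code: the dictionary layout and the three functions are made up for illustration, and row1 is given a hypothetical extra cell so that delete and deleteall behave differently.

```python
# Toy table: the column families are the "structure", the rows are the data.
table = {
    "families": ["data"],
    "rows": {
        "row1": {"data:1": "value1", "data:2": "extra"},  # second cell is hypothetical
        "row2": {"data:2": "value2"},
    },
}


def delete(table, row_key, column):
    """Remove one cell; a row left without cells disappears."""
    row = table["rows"].get(row_key, {})
    row.pop(column, None)
    if not row:
        table["rows"].pop(row_key, None)


def deleteall(table, row_key):
    """Remove every cell of one row."""
    table["rows"].pop(row_key, None)


def truncate(table):
    """Remove all rows but keep the column families."""
    table["rows"].clear()


delete(table, "row1", "data:2")
after_delete = sorted(table["rows"]["row1"])
deleteall(table, "row1")
after_deleteall = sorted(table["rows"])
truncate(table)
print(after_delete, after_deleteall, table)
```

After truncate the model still lists its column family, which mirrors the point that the table structure survives while the data does not.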
9. Delete table
To delete a table, for example the 'test' table, first disable it and then drop it. Dropping an enabled table directly reports an error.
So deleting a table takes two steps:
- disable the table
- drop the table
A newly created table is enabled by default, that is, "in use". Before dropping the table, you need to set it to disabled.
- Set the table to enabled: enable 'table'
- Set the table to disabled: disable 'table'
hbase(main):011:0> disable 'test'
Took 3.9249 seconds
hbase(main):012:0> drop 'test'
Took 2.5588 seconds
hbase(main):013:0> list
TABLE
futuresFunctionBar
selfstock
2 row(s)
Took 1.0300 seconds
=> ["futuresFunctionBar","selfstock"]
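The two-step rule (disable, then drop) can be modelled as a tiny state machine in plain Python. This is a sketch and not HBase code: ToyCatalog and its methods are hypothetical, and the RuntimeError only stands in for whatever error the shell reports when an enabled table is dropped.

```python
class ToyCatalog:
    """Plain-Python stand-in for the table states described above."""

    def __init__(self):
        self.enabled = {}  # table name -> True (enabled) / False (disabled)

    def create(self, name):
        self.enabled[name] = True  # new tables start out enabled

    def enable(self, name):
        self.enabled[name] = True

    def disable(self, name):
        self.enabled[name] = False

    def drop(self, name):
        if self.enabled[name]:
            raise RuntimeError(f"table {name!r} is enabled; disable it first")
        del self.enabled[name]

    def list(self):
        return sorted(self.enabled)


catalog = ToyCatalog()
catalog.create("test")

try:
    catalog.drop("test")  # dropping while still enabled is refused
    drop_failed = False
except RuntimeError:
    drop_failed = True

catalog.disable("test")  # step 1
catalog.drop("test")     # step 2
print(drop_failed, catalog.list())
```

The first drop is rejected because the table is still enabled; once it has been disabled, the same call succeeds and the table no longer appears in the list.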
10. Count the rows in a table
To count the number of rows in a table, use the count command:
count 'table'
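11. Add a column family
To add a column family to an existing table, use the alter command (VERSIONS is the number of cell versions to keep):
alter 'table', { NAME => 'new_column_family', VERSIONS => 'number' }
12. Delete a column family
To delete a column family, use alter with METHOD => 'delete':
alter 'table', { NAME => 'column_family', METHOD => 'delete' }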