1. ImportTsv function description
ImportTsv loads text data in TSV format (or CSV; in general, any format where the fields of each row are separated by a delimiter) into an HBase table. It supports two import modes (a sample of the expected input is sketched after this list):
1) Direct import using Put operations
2) Bulk load import using HFiles
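To make the input format concrete, here is a minimal, hypothetical sample: a two-row tab-separated file whose first field will serve as the row key. The file name, HDFS path, and field values are illustrative assumptions, not part of the original procedure.

# Hypothetical sample data: three tab-separated fields per row
# (field 1 -> row key, fields 2 and 3 -> columns in a column family)
printf 'row001\t1001\t192.168.1.10\n' > /tmp/sample.tsv
printf 'row002\t1002\t192.168.1.11\n' >> /tmp/sample.tsv

# Upload the file to HDFS so that importtsv can read it
hdfs dfs -mkdir -p /datas/sample_input
hdfs dfs -put /tmp/sample.tsv /datas/sample_input/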
Use the following command to view the usage instructions for the tool classes built into HBase:
HADOOP_HOME=/export/servers/hadoop
HBASE_HOME=/export/servers/hbase
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-1.2.0-cdh5.14.0.jar
Executing the above command prints the following information:
An example program must be given as the first argument. Valid program names are:
CellCounter: Count cells in HBase table.
WALPlayer: Replay WAL files.
completebulkload: Complete a bulk data load.
copytable: Export a table from local cluster to peer cluster.
export: Write table data to HDFS.
exportsnapshot: Export the specific snapshot to a given FileSystem.
import: Import data written by Export.
importtsv: Import data in TSV format.
rowcounter: Count rows in HBase table.
verifyrep: Compare the data from tables in two different clusters.
importtsv is a tool class for importing data from text files (in CSV, TSV, and similar formats) into an HBase table. Its usage description is as follows:
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

The column names of the TSV data must be specified using the -Dimporttsv.columns option. This option takes the form of comma-separated column names, where each column name is either a simple column family, or a columnfamily:qualifier. The special column name HBASE_ROW_KEY is used to designate that this column should be used as the row key for each imported record.

To instead generate HFiles of data to prepare for a bulk data load, pass the option:
-Dimporttsv.bulk.output=/path/for/output

'-Dimporttsv.separator=|' - eg separate on pipes instead of tabs

For performance consider the following options:
-Dmapreduce.map.speculative=false
-Dmapreduce.reduce.speculative=false
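As the usage text notes, the field separator defaults to a tab but can be overridden with -Dimporttsv.separator. A minimal sketch of importing a comma-separated file follows; the table name tbl_demo, the column layout, and the input path are illustrative assumptions, and HADOOP_HOME, HBASE_HOME, and HADOOP_CLASSPATH are assumed to be set as in the command above.

# Hypothetical CSV import: same tool, comma as the separator
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-1.2.0-cdh5.14.0.jar \
importtsv \
-Dimporttsv.separator=',' \
-Dimporttsv.columns=HBASE_ROW_KEY,detail:log_id,detail:remote_ip \
tbl_demo \
/datas/sample_input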
2. Direct import in Put mode
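The target table must exist before the import runs. Below is a minimal sketch of creating it from the hbase shell, assuming a single column family named detail (the name used by the import command that follows); pre-split regions and other table options are omitted.

# Create the target table with one column family (assumed schema)
echo "create 'tbl_logs', 'detail'" | ${HBASE_HOME}/bin/hbase shell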
HADOOP_HOME=/export/servers/hadoop
HBASE_HOME=/export/servers/hbase
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-1.2.0-cdh5.14.0.jar \
importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,detail:log_id,detail:remote_ip,detail:site_global_ticket,detail:site_global_session,detail:global_user_id,detail:cookie_text,detail:user_agent,detail:ref_url,detail:loc_url,detail:log_time \
tbl_logs \
/user/hive/warehouse/tags_dat.db/tbl_logs
The above command essentially runs a MapReduce application that converts each line of the text file into a Put object and inserts it into the HBase table.
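To spot-check the result, a few rows can be scanned from the hbase shell; this verification step is an assumption, not part of the original procedure.

# Scan the first five rows of the imported table
echo "scan 'tbl_logs', {LIMIT => 5}" | ${HBASE_HOME}/bin/hbase shell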
For a review of loading this source data, see the earlier article: Big Data: Importing MySQL Data into HBase and Hive with Sqoop.
3. Convert to HFiles and load them into the table
# 1. Generate the HFiles
HADOOP_HOME=/export/servers/hadoop
HBASE_HOME=/export/servers/hbase
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-1.2.0-cdh5.14.0.jar \
importtsv \
-Dimporttsv.bulk.output=hdfs://bigdata-cdh01.itcast.cn:8020/datas/output_hfile/tbl_logs \
-Dimporttsv.columns=HBASE_ROW_KEY,detail:log_id,detail:remote_ip,detail:site_global_ticket,detail:site_global_session,detail:global_user_id,detail:cookie_text,detail:user_agent,detail:ref_url,detail:loc_url,detail:log_time \
tbl_logs \
/user/hive/warehouse/tags_dat.db/tbl_logs

# 2. Load the HFiles into the table
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-1.2.0-cdh5.14.0.jar \
completebulkload \
hdfs://bigdata-cdh01.itcast.cn:8020/datas/output_hfile/tbl_logs \
tbl_logs
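Between the two steps, it can be worth confirming that the HFiles were actually written; this sanity check is an assumption, using the output path from step 1.

# List the generated HFiles under the bulk output directory
hdfs dfs -ls -R /datas/output_hfile/tbl_logs

Note that completebulkload moves the generated HFiles into HBase's storage directory rather than copying them, so the output path will be empty after step 2 succeeds.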
Disadvantages:
1) The ROWKEY cannot be a composite key; it can only be a single field (a preprocessing workaround is sketched after this list).
2) When the table has many columns, writing out the -Dimporttsv.columns value is tedious and error-prone.
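A common workaround for the first limitation, sketched here as an assumption rather than a feature of importtsv itself, is to preprocess the input so that the composite key becomes a single leading field. The file names input.tsv and keyed.tsv are hypothetical.

# Hypothetical preprocessing: join fields 1 and 2 into one composite
# row key field, then append the remaining fields unchanged.
awk -F'\t' 'BEGIN{OFS="\t"} {out=$1"_"$2; for(i=3;i<=NF;i++) out=out OFS $i; print out}' \
  input.tsv > keyed.tsv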
Summary: this tool works well for importing small batches of data, but it is not commonly used in production.