Sqoop data import and export

Posted by wolfrock on Wed, 26 Feb 2020 07:30:01 +0100

1. Introduction to Sqoop:
Sqoop is an open-source tool used mainly to transfer data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, Oracle, etc.). It can import data from a relational database into HDFS, and export data from HDFS back into a relational database.
The Sqoop project started in 2009 and first existed as a third-party module of Hadoop. Later, to let users deploy it quickly and to let developers iterate faster, Sqoop became an independent Apache project.
The latest version of Sqoop2 is 1.99.7. Note that Sqoop2 is not compatible with Sqoop1 and its feature set is incomplete; it is not intended for production deployment.
2. The principle of Sqoop:
Sqoop translates the import or export command into a MapReduce program. In the generated MapReduce job, it is mainly the InputFormat and OutputFormat that are customized.
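For example, the number of map tasks in the generated (map-only) job is controlled with -m. The sketch below is only an illustration: it reuses the emp table and connection details from the examples further down, and the target directory and --split-by column are assumptions.

sqoop import --connect jdbc:mysql://HadoopNode1:3306/test --table emp --username root --password 123456 --target-dir /user/sheng/input/sqoop1/emp4 --split-by empno -m 4
#The map-only job writes one output file per mapper: part-m-00000 ... part-m-00003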
3. Import data
In Sqoop, "import" means transferring data from a non-big-data cluster (an RDBMS) into the big data cluster (HDFS, Hive, HBase); it is done with the import keyword.
1 MySQL ----> HDFS:

#Syntax:
sqoop import --connect <JDBC URL> --table <table name> --username <username> --password <password> --target-dir <target HDFS directory> --fields-terminated-by '\t' -m <number of mappers>

#Example:
sqoop import --connect jdbc:mysql://HadoopNode1:3306/test --table emp --username root --password 123456 --target-dir /user/sheng/input/sqoop1/stu --fields-terminated-by '\t' -m 1
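
After the job finishes, the result can be checked directly on HDFS (a quick sanity check using the target directory from the example above):

hdfs dfs -ls /user/sheng/input/sqoop1/stu
hdfs dfs -cat /user/sheng/input/sqoop1/stu/part-m-*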

2 MySQL ----> HDFS, incremental import:

As a MySQL table is used over time, its data keeps growing, but you usually only want to import the newly added records into HDFS.

In practice, re-importing the whole table on every run is time-consuming and wasteful. Instead, each run should pick up only the incremental part: if rows have been added since the last import, only those rows are imported into Hadoop; if nothing has changed, nothing is imported. This is incremental import, controlled by the options below.

  • --incremental append: perform an incremental (append-only) import

  • --check-column: the column used as the incremental criterion (must be specified, so Sqoop knows which column identifies new rows)

  • --last-value: the largest value of the check column from the previous import (must be specified, otherwise the whole table is imported again)


    #Syntax:
    sqoop import --connect <JDBC URL> --table <table name> --username <username> --password <password> --target-dir <target HDFS directory> --fields-terminated-by '\t' -m <number of mappers> --incremental append --check-column <column that identifies new rows> --last-value <largest value already imported>
    
    #Example:
    sqoop import --connect jdbc:mysql://HoodpNode3:3306/test --table student --username root --password 123456 --target-dir /user/sheng/input/sqoop1/stu --fields-terminated-by '\t' -m 1 --incremental append --check-column empno --last-value 7934
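
Tracking --last-value by hand is error-prone. A saved Sqoop job remembers the last imported value and updates it after every run; the sketch below reuses the connection details from the example above (the job name student_incr is just an illustration):

sqoop job --create student_incr -- import --connect jdbc:mysql://HoodpNode3:3306/test --table student --username root --password 123456 --target-dir /user/sheng/input/sqoop1/stu --fields-terminated-by '\t' -m 1 --incremental append --check-column empno --last-value 7934

#Each execution imports only rows whose empno is greater than the stored last value
sqoop job --exec student_incr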



3 MySQL ----> Hive:

#Syntax:
sqoop import --connect <JDBC URL> --table <table name> --username <username> --password <password> --hive-import --create-hive-table --hive-table <database>.<table name> --fields-terminated-by '\t' -m 1

#Example:
sqoop import --connect jdbc:mysql://HadoopNode1:3306/test --table emp --username root --password 123456 --hive-import --create-hive-table --hive-table default.emp --fields-terminated-by '\t' -m 1
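
Note that --create-hive-table makes the job fail if default.emp already exists in Hive. Once the import completes, the table can be queried as a quick check:

hive -e "SELECT * FROM default.emp LIMIT 5;"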

4 HDFS ----> MySQL:

#Syntax:
sqoop export --connect <JDBC URL> --username <username> --password <password> --table <table name> --export-dir <HDFS path> --input-fields-terminated-by '\t'


#Example:
sqoop export --connect jdbc:mysql://HadoopNode1:3306/test --table hdfs_mysql --username root --password 123456 --export-dir /user/sheng/input/sqoop1/stu/part-m-00000 --input-fields-terminated-by '\t' -m 1
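
Sqoop export does not create the target table, so hdfs_mysql must already exist in MySQL with columns that match the fields of the exported file. A minimal sketch; the column list here is only an assumption for illustration and has to be adjusted to the actual data:

#Hypothetical table layout -- replace the columns with the real ones from the HDFS file
mysql -u root -p123456 test -e "CREATE TABLE hdfs_mysql (empno INT, ename VARCHAR(50), sal DECIMAL(10,2))"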

5 Hive ----> MySQL:

#Syntax:
sqoop export --connect <JDBC URL> --username <username> --password <password> --table <table name> --export-dir <path of the table in the Hive warehouse> --input-fields-terminated-by '\t'

#Example:
sqoop export --connect jdbc:mysql://HadoopNode1:3306/test --username root --password 123456 --table hive_mysql --export-dir /user/hive/warehouse/emp/emp.txt --input-fields-terminated-by '\t'

--table hive_mysql specifies the target MySQL table, which must already exist. Adding --direct tells Sqoop to use MySQL's native utilities (mysqlimport for exports) instead of plain JDBC, which is usually faster for large exports.
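
After the export, the rows can be checked on the MySQL side (a quick sanity check using the credentials from the example):

mysql -u root -p123456 test -e "SELECT * FROM hive_mysql LIMIT 5;"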

Topics: MySQL hive JDBC Hadoop