Recently, I needed to export MySQL data to HDFS, so I looked into Sqoop2. Compared with Sqoop1, Sqoop2 has the advantage that a program can connect directly to the Sqoop server on the cluster and drive it remotely. The workflow is: first create links, which you can think of as the objects to be operated on; for example, one link is HDFS and the other link is MySQL. Once the links exist, you create a job that ties the two links together, setting the from and to relationship, and then execute the job.
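In code, that workflow looks roughly like the sketch below, using the Sqoop2 Java client. This is only a minimal sketch under assumptions about my setup: the server URL, link and job names, database coordinates, and paths are placeholders, and the exact config keys can vary between Sqoop2 versions.

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MJob;
import org.apache.sqoop.model.MLink;

public class Mysql2HdfsSketch {
    public static void main(String[] args) {
        // Connect to the Sqoop2 server's REST endpoint (adjust host/port to your setup)
        SqoopClient client = new SqoopClient("http://localhost:12000/sqoop/");

        // Link 1: the MySQL side, via the generic JDBC connector
        MLink mysqlLink = client.createLink("generic-jdbc-connector");
        mysqlLink.setName("mysql-link");
        mysqlLink.getConnectorLinkConfig().getStringInput("linkConfig.connectionString")
                 .setValue("jdbc:mysql://localhost:3306/mydb");
        mysqlLink.getConnectorLinkConfig().getStringInput("linkConfig.jdbcDriver")
                 .setValue("com.mysql.jdbc.Driver");
        client.saveLink(mysqlLink);

        // Link 2: the HDFS side
        MLink hdfsLink = client.createLink("hdfs-connector");
        hdfsLink.setName("hdfs-link");
        hdfsLink.getConnectorLinkConfig().getStringInput("linkConfig.uri")
                .setValue("hdfs://localhost:9000");
        client.saveLink(hdfsLink);

        // Job: from the MySQL link to the HDFS link
        MJob job = client.createJob("mysql-link", "hdfs-link");
        job.setName("mysql-to-hdfs");
        job.getFromJobConfig().getStringInput("fromJobConfig.tableName").setValue("my_table");
        job.getToJobConfig().getStringInput("toJobConfig.outputDirectory").setValue("/user/sqoop2/out");
        client.saveJob(job);

        // Execute the job
        client.startJob("mysql-to-hdfs");
    }
}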
Installation:
Installation turned out to be a real headache; problems kept popping up one after another, and it took me a whole night to get through it. Here are the pitfalls I hit along the way.
First, I installed the latest version, 1.99.7 (download address).
The official documentation is also available: Apache Sqoop2
1, Hadoop installation
For the details of the Hadoop installation, see section 5 onward of this post: https://www.cnblogs.com/bjwu/p/9863634.html
Be careful ⚠️: when configuring core-site.xml, you need to add the following two properties:
<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
Also, in the configuration file container-executor.cfg, remember to add:
allowed.system.users=sqoop2
2, Third party jars
For my project I only need the MySQL Connector/J driver. Download it here, extract the jar file, and run the following commands:
# Create directory for extra jars
mkdir -p /var/lib/sqoop2/
# Copy all your JDBC drivers to this directory
cp mysql-jdbc*.jar /var/lib/sqoop2/
3, Environment variables
Add the environment variables to .bash_profile:
export SQOOP_HOME=/usr/lib/sqoop
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/
export PATH=$PATH:$SQOOP_HOME/bin
4, Configure server
This is where the problems start. The official documentation says:
Second configuration file called sqoop.properties contains remaining configuration properties that can affect Sqoop server. The configuration file is very well documented, so check if all configuration properties fits your environment. Default or very little tweaking should be sufficient in most common cases.
However, the default configuration alone is really not enough:
Open sqoop.properties, change the first line below to your own Hadoop configuration directory, and add the other three lines:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=$HADOOP_HOME/etc/hadoop
org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true

The official documentation only mentions the first property, the MapReduce configuration directory, but I then hit an authentication exception at runtime. Digging into the security section of the Sqoop documentation, I found that Sqoop supports Hadoop's simple and Kerberos authentication mechanisms, so configuring simple authentication made the exception go away.
Of course, along the way you may run into a few other problems, such as:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
If so, you can try copying the Hadoop jars into Sqoop's server lib directory:
cp -R $HADOOP_HOME/share/hadoop/common/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/common/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/lib/* $SQOOP_HOME/server/lib/
5, Start
After configuration, you need to initialize the repository before the first startup:
$ sqoop2-tool upgrade
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
  Version: 1.99.7
  Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2019-01-10 22:31:06,509 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.
Looking good! After that, you can check whether everything is configured correctly:
$ sqoop2-tool verify
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
  Version: 1.99.7
  Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2019-01-10 22:31:42,317 INFO [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:31:42,326 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
Start server:
$ sqoop2-server start
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Starting the Sqoop2 server...
2019-01-10 22:37:22,806 INFO [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:37:22,816 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Sqoop2 server started.
6, Changing my mind
Well, after all that, I decided to switch back to Sqoop1. It's not that Sqoop2 has bugs; it's just a bit complicated and the learning cost is a little high.
There are plenty of online tutorials for installing Sqoop1, so I'll only note that when calling Sqoop1 from a program you need to import quite a few Maven dependencies (see the sketch after the dependency list).
Anyway, I ended up pulling in this many libraries because of various exceptions 😢:
<dependency>
  <groupId>org.apache.sqoop</groupId>
  <artifactId>sqoop</artifactId>
  <version>1.4.7</version>
  <scope>system</scope>
  <systemPath>${basedir}/lib/sqoop-1.4.7.jar</systemPath>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.5</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.6.5</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
  <version>2.6.5</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.2</version>
</dependency>
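For context on how those dependencies get used: a "Sqoop1 program" usually just drives Sqoop's tool entry point from Java. Below is a minimal, hypothetical sketch; the connection string, credentials, table, and target directory are placeholders for your own environment.

import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;

public class Sqoop1ImportSketch {
    public static void main(String[] args) {
        // Arguments mirror the sqoop CLI: import a MySQL table into HDFS
        String[] sqoopArgs = new String[] {
            "import",
            "--connect", "jdbc:mysql://localhost:3306/mydb",
            "--username", "root",
            "--password", "secret",
            "--table", "my_table",
            "--target-dir", "/user/hadoop/my_table",
            "-m", "1"
        };
        Configuration conf = new Configuration();
        // Sqoop.runTool parses the CLI-style arguments and runs the matching tool
        int exitCode = Sqoop.runTool(sqoopArgs, conf);
        System.exit(exitCode);
    }
}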