Installation and configuration of sqoop

Posted by canadian_angel on Sun, 16 Jan 2022 17:47:04 +0100

Recently I needed to export MySQL data to HDFS, so I looked into sqoop2. Compared with sqoop1, sqoop2 has the advantage that a program can connect directly to the sqoop server on the cluster and operate it remotely. The workflow starts by creating links; a link can be thought of as an object to operate on. For example, one link is HDFS and the other is MySQL. Once you have the links, you create a job: you specify the two links that will interact, set which is the from side and which is the to side, and then execute the job.

Installation:

Installation really was a big problem, with issues cropping up one after another; it took me a whole night to finish. Here are the pitfalls I ran into along the way.

First, I installed the latest version, 1.99.7 (download address).

The official documentation is available: Apache Sqoop2

1, Hadoop installation

For the specific Hadoop installation steps, see section 5 onward of this blog post: https://www.cnblogs.com/bjwu/p/9863634.html

Note ⚠️: when configuring core-site.xml, you need to add the following two properties:

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>

Also, in the configuration file container-executor.cfg, remember to add:

allowed.system.users=sqoop2
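The two Hadoop-side settings above are easy to miss, so here is a minimal sketch of a check for them. The function names are hypothetical helpers, not Hadoop or Sqoop commands, and the check is a plain text match with grep and sed rather than a real XML parse.

```shell
# Sketch only: these helpers are hypothetical, not part of Hadoop or Sqoop.

# Succeeds if the XML config file contains a <name> element for the property.
# This is a literal text match, not an XML parse.
has_xml_property() {
  grep -qF "<name>$2</name>" "$1"
}

# Succeeds if container-executor.cfg lists the user in allowed.system.users.
allows_system_user() {
  sed -n 's/^allowed\.system\.users=//p' "$1" | tr ',' '\n' | grep -qxF "$2"
}

# Prints one line per missing setting; returns non-zero if anything is missing.
# Arguments: core-site.xml path, container-executor.cfg path, user (default sqoop2).
check_hadoop_side() {
  core_site="$1"
  container_cfg="$2"
  user="${3:-sqoop2}"
  status=0
  for suffix in hosts groups; do
    if ! has_xml_property "$core_site" "hadoop.proxyuser.$user.$suffix"; then
      echo "missing in core-site.xml: hadoop.proxyuser.$user.$suffix"
      status=1
    fi
  done
  if ! allows_system_user "$container_cfg" "$user"; then
    echo "missing in container-executor.cfg: allowed.system.users=$user"
    status=1
  fi
  return "$status"
}
```

Assuming both files live under $HADOOP_HOME/etc/hadoop (an assumption about your layout), it could be called as: check_hadoop_side "$HADOOP_HOME/etc/hadoop/core-site.xml" "$HADOOP_HOME/etc/hadoop/container-executor.cfg"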

2, Third party jars

For my project, the only third-party jar I needed was MySQL Connector/J (the MySQL JDBC driver). Download it, extract the jar file, and run the following commands:

# Create directory for extra jars
mkdir -p /var/lib/sqoop2/

# Copy all your JDBC drivers to this directory
# (the MySQL Connector/J jar is typically named mysql-connector-java-<version>.jar)
cp mysql-connector-java-*.jar /var/lib/sqoop2/

3, Environmental variables

Add the environment variables to .bash_profile:

export SQOOP_HOME=/usr/lib/sqoop 
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/
export PATH=$PATH:$SQOOP_HOME/bin
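After reloading the profile, it can help to confirm that the variables point where you expect. Below is a minimal sketch; check_sqoop_env is a hypothetical helper, not a sqoop command, and it only assumes that bin/sqoop2-server exists under SQOOP_HOME and that the JDBC jars were copied into SQOOP_SERVER_EXTRA_LIB as in the previous step.

```shell
# Sketch only: check_sqoop_env is a hypothetical helper, not part of Sqoop.
# Reports whether SQOOP_HOME looks like a Sqoop2 install and whether the
# extra-lib directory contains at least one jar.
check_sqoop_env() {
  status=0
  if [ ! -x "${SQOOP_HOME:-}/bin/sqoop2-server" ]; then
    echo "SQOOP_HOME does not contain bin/sqoop2-server: ${SQOOP_HOME:-unset}"
    status=1
  fi
  jar_count=0
  if [ -n "${SQOOP_SERVER_EXTRA_LIB:-}" ]; then
    for jar in "$SQOOP_SERVER_EXTRA_LIB"/*.jar; do
      # The glob stays unexpanded when nothing matches, so test for existence.
      if [ -e "$jar" ]; then
        jar_count=$((jar_count + 1))
      fi
    done
  fi
  if [ "$jar_count" -eq 0 ]; then
    echo "no jars found in SQOOP_SERVER_EXTRA_LIB: ${SQOOP_SERVER_EXTRA_LIB:-unset}"
    status=1
  else
    echo "ok: $jar_count jar(s) in $SQOOP_SERVER_EXTRA_LIB"
  fi
  return "$status"
}
```

Running check_sqoop_env in a fresh login shell shows whether the exports from .bash_profile were actually picked up.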

4, Configure server

This is where the trouble starts. The official website says:

Second configuration file called sqoop.properties contains remaining configuration properties that can affect Sqoop server. The configuration file is very well documented, so check if all configuration properties fits your environment. Default or very little tweaking should be sufficient in most common cases.

However, the default configuration alone is really not enough.

Open sqoop.properties, change the first line below to your own Hadoop configuration directory, and add the other three lines.

The official documentation only mentions the first item, the MapReduce configuration directory, but with that alone an authentication exception occurs at runtime. The security section of the sqoop documentation says that sqoop supports Hadoop's simple and Kerberos authentication mechanisms, so I configured simple authentication to get rid of the exception:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=$HADOOP_HOME/etc/hadoop

org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true
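To double-check what actually ended up in the file, a small reader for key=value lines can help. This is a minimal sketch: get_prop is a hypothetical helper, not part of sqoop, and it only handles plain key=value lines (no line continuations or escapes). It strips trailing whitespace from the value, which makes stray spaces at the end of a line easy to rule out.

```shell
# Sketch only: get_prop is a hypothetical helper, not part of Sqoop.
# Prints the value of the last uncommented "key=value" line for the key,
# with trailing whitespace removed. Prints an empty line if the key is absent.
# Arguments: properties file path, property key.
get_prop() {
  awk -v key="$2" '
    index($0, key "=") == 1 {
      value = substr($0, length(key) + 2)
      sub(/[ \t\r]+$/, "", value)
    }
    END { print value }
  ' "$1"
}

# Example (commented out; assumes the conf directory shown in the sqoop2-tool output):
# conf_dir=$(get_prop "$SQOOP_HOME/conf/sqoop.properties" \
#   org.apache.sqoop.submission.engine.mapreduce.configuration.directory)
# [ -f "$conf_dir/core-site.xml" ] || echo "no core-site.xml under: $conf_dir"
```

The commented example checks that the configured MapReduce configuration directory really contains core-site.xml, which is a quick way to catch a wrong or unedited path.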

Of course, you may run into several problems along the way, such as:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

If so, you can try copying the Hadoop libraries into sqoop's server lib directory:

cp -R $HADOOP_HOME/share/hadoop/common/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/common/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/lib/* $SQOOP_HOME/server/lib/
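The eight commands above can also be written as a loop. The sketch below is a variation rather than an exact equivalent: copy_hadoop_jars is a hypothetical helper, and it copies only the *.jar files from each directory instead of everything that cp -R would pick up.

```shell
# Sketch only: copy_hadoop_jars is a hypothetical helper, not part of Hadoop or Sqoop.
# Copies the jar files from the common, hdfs, mapreduce and yarn modules
# (and their lib subdirectories) into a destination directory.
# Arguments: Hadoop home directory, destination directory.
copy_hadoop_jars() {
  hadoop_home="$1"
  dest="$2"
  for module in common hdfs mapreduce yarn; do
    for dir in "$hadoop_home/share/hadoop/$module" "$hadoop_home/share/hadoop/$module/lib"; do
      if [ -d "$dir" ]; then
        for jar in "$dir"/*.jar; do
          # Skip the literal pattern when a directory has no jars.
          if [ -e "$jar" ]; then
            cp "$jar" "$dest/"
          fi
        done
      fi
    done
  done
}
```

With the variables from this post it would be called as: copy_hadoop_jars "$HADOOP_HOME" "$SQOOP_HOME/server/lib"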

5, Start

After configuration, the repository has to be initialized before the first startup:

$ sqoop2-tool upgrade
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2019-01-10 22:31:06,509 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

Nice! After that, you can check whether everything is configured correctly:

$ sqoop2-tool verify 
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2019-01-10 22:31:42,317 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:31:42,326 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

Start the server:

$ sqoop2-server start
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Starting the Sqoop2 server...
2019-01-10 22:37:22,806 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:37:22,816 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Sqoop2 server started.

6, A change of mind

Well, having said all that, I decided to switch to sqoop1 after all, because sqoop2 has quite a few bugs, is a little complicated, and has a somewhat high learning cost.

There are many online tutorials for installing sqoop1, so I will only note that running a sqoop1 program requires importing quite a few Maven dependencies. I ended up adding all of these libraries because of various exceptions 😢:

<dependency>
	<groupId>org.apache.sqoop</groupId>
	<artifactId>sqoop</artifactId>
	<version>1.4.7</version>
	<scope>system</scope>
	<systemPath>${basedir}/lib/sqoop-1.4.7.jar</systemPath>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-common</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-mapreduce-client-core</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.avro</groupId>
	<artifactId>avro</artifactId>
	<version>1.8.2</version>
</dependency>

Reference:

  1. https://stackoverflow.com/questions/41405072/sqoop-integration-with-hadoop-throw-classnotfoundexception
  2. https://sqoop.apache.org/docs/1.99.7/admin/Installation.html
  3. http://brianoneill.blogspot.com/2014/10/sqoop-1993-w-hadoop-2-installation.html
  4. https://www.yiibai.com/sqoop/sqoop_installation.html
