Nanny-level, super-detailed tutorial: DolphinScheduler standalone (local) deployment and run test

Posted by MikeyNoedel on Fri, 03 Dec 2021 14:20:53 +0100

DolphinScheduler architecture diagram

Contents

1, Supplement

2, Deployment process

1. Download binary tar.gz package

2. Create deployment user and grant directory operation permission

3. Passwordless SSH configuration

4. Database initialization

5. Modify operating parameters

6. One-click deployment

7. Log in to the system

If you are not familiar with DolphinScheduler, you can refer to:

1, Supplement

Install psmisc:

apt-get install psmisc

2, Deployment process

1. Download binary tar.gz package

Download the latest back-end installation package to the deployment directory on the server. For example, create /opt/dolphinscheduler as the deployment directory. The download address is: https://dlcdn.apache.org/dolphinscheduler/1.3.8/apache-dolphinscheduler-1.3.8-src.tar.gz After downloading, upload the tar package to this directory.

Note: that address points to the source (src) package. The package to download and deploy here is the binary one, apache-dolphinscheduler-1.3.8-bin.tar.gz, which is the file the commands below expect.

Extract the package:

# Create the deployment directory. Do not use high-privilege directories such as /root or /home
mkdir -p /opt/dolphinscheduler
cd /opt/dolphinscheduler

# Extract
tar -zxvf apache-dolphinscheduler-1.3.8-bin.tar.gz

# Rename
mv apache-dolphinscheduler-1.3.8-bin  dolphinscheduler-bin

The author instead created the dolphinscheduler folder under /usr/local, uploaded the installation package there, extracted it, and renamed the extracted directory to dolphinscheduler-bin.

cd /usr/local
mkdir dolphinscheduler
cd dolphinscheduler

# Extract into the current directory
tar -zxvf apache-dolphinscheduler-1.3.8-bin.tar.gz

# Rename
mv apache-dolphinscheduler-1.3.8-bin  dolphinscheduler-bin

Appendix:

View the virtual machine's IP address:

ifconfig

The value following inet is the IP address.

Then connect to the virtual machine with FileZilla and transfer the files.

For the host, enter the IP address just found. Select SFTP as the protocol instead of the default FTP, and log in with the user name of the virtual machine's system account rather than the user shown in the Linux terminal.

After the connection succeeds, the Linux file tree is visible and files can be transferred by dragging them.

The author ran into a permission problem here.

FileZilla reported an error:

command:    put "C:\Users\86136\Desktop\apache-dolphinscheduler-1.3.8-src.tar.gz" "apache-dolphinscheduler-1.3.8-src.tar.gz"
error:    /usr/local/dolphinscheduler/apache-dolphinscheduler-1.3.8-src.tar.gz: open for write: permission denied
 error:    File transfer failed

Grant write permission on the virtual machine:

sudo chmod 777 /usr/local/dolphinscheduler

Transfer the file again and the problem is solved.

In fact, once FileZilla is installed, dragging files directly into the virtual machine window can also transfer them, but in the author's experience this only occasionally succeeds and often crashes the system.

2. Create deployment user and grant directory operation permission

  • Create the deployment user and be sure to configure passwordless sudo. The following takes creating a dolphinscheduler user as an example
# Log in as root to create the user
useradd dolphinscheduler

# Set the password (--stdin is only available on some distributions such as CentOS)
echo "dolphinscheduler" | passwd --stdin dolphinscheduler

# Configure passwordless sudo
sed -i '$adolphinscheduler  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

# Change the directory ownership so that the deployment user can operate on the dolphinscheduler-bin directory
chown -R dolphinscheduler:dolphinscheduler dolphinscheduler-bin

Notes:

  • Because the task execution service runs jobs for multiple tenants by switching between Linux users with sudo -u {linux user}, the deployment user needs passwordless sudo permission. Beginners who do not understand this can ignore it for now
  • If you find the line "Defaults requiretty" in the /etc/sudoers file, comment it out as well
  • If resource upload is used, the deployment user must also be granted permission to operate on the local file system, HDFS, or MinIO

Pitfall:

echo "dolphinscheduler" | passwd --stdin dolphinscheduler

This step reports an error on Ubuntu, because the passwd command there does not support the --stdin option.

Use this instead:

echo "dolphinscheduler:123456" | chpasswd

This sets the password of the dolphinscheduler user to 123456.

In addition, none of the five commands prints a success message the first time it runs successfully.

3. Passwordless SSH configuration

  • Switch to the deployment user and configure passwordless SSH login to the local machine
su dolphinscheduler

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Note: once this is set up correctly, the dolphinscheduler user does not need to enter a password when running ssh localhost.

Pitfall: the error Could not create directory '/home/dolphinscheduler/.ssh' is reported at this step, because useradd as run above did not create a home directory for the user.

As root, create the dolphinscheduler directory under /home, and then create the .ssh directory inside it.

Then switch back to the dolphinscheduler user and run the commands again. You may also need to grant that user permissions on these directories.
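
The directory fix described above can be sketched as a small helper. This is only a sketch: prepare_ssh_dir is a hypothetical name, the 755 and 700 modes are an assumption based on what sshd normally accepts, and the chown step is left as a comment because it needs root and an existing user.

```shell
# Create a home directory and its .ssh subdirectory with restrictive modes.
# prepare_ssh_dir is a hypothetical helper, not part of DolphinScheduler.
prepare_ssh_dir() {
    home_dir=$1
    mkdir -p "$home_dir/.ssh" || return 1
    chmod 755 "$home_dir"        # home must not be writable by group/others
    chmod 700 "$home_dir/.ssh"   # only the owner may enter .ssh
}

# Usage as root (not executed here):
#   prepare_ssh_dir /home/dolphinscheduler
#   chown -R dolphinscheduler:dolphinscheduler /home/dolphinscheduler
```

After the chown, switching to the dolphinscheduler user and rerunning ssh-keygen should no longer hit the missing-directory error.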

4. Database initialization

  • The default database is PostgreSQL. If you choose MySQL, you need to add the mysql-connector-java driver package to the lib directory of DolphinScheduler

mysql-connector-java download page: MySQL :: Download Connector/J

Make sure to switch the download page to the Linux build, and pay attention to the matching version number.

The version to match here is mainly the Ubuntu release rather than the Linux kernel version.

To view the Ubuntu version:

cat /proc/version

The results are as follows:

The author uses version 18.04.

After downloading, extract it into the lib directory of DolphinScheduler.

However, the author ran into a problem here: the downloaded file is not an ordinary tarball, and it cannot simply be unpacked with tar.

The workaround is to extract it on Windows and open the extracted package. The mysql-connector-java-8.0.26.jar we need is in ./usr/share/java. Take it out and transfer it to the lib folder on Linux.

Then run the following commands to enter the database:

# Start the MySQL service
service mysql start

# Log in
mysql -uroot -p
  • Another problem came up here: running service mysql start reports the error "Failed to start mysql.service: Unit mysql.service not found"
  • Solution:
# Check whether mysql exists under /etc/init.d/. No output means it does not exist
ll /etc/init.d/ | grep mysql

# Find the location of mysql.server
find / -name mysql.server

# Copy mysql.server, where /usr/local/mysql/ is the author's MySQL installation directory
cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysql
  • After entering the database command line, run the database initialization commands and set the access account and password. Note: {user} and {password} must be replaced with the actual database user name and password
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> flush privileges;
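
On MySQL 8.0 the GRANT ... IDENTIFIED BY form above is rejected, since that version expects the account to be created with CREATE USER first. The sketch below prints equivalent statements for that case. ds_init_sql and the sample credentials are hypothetical, and nothing is sent to a server unless you pipe the output into mysql yourself.

```shell
# Print initialization SQL in the CREATE USER + GRANT form.
# ds_init_sql is a hypothetical helper; passwords containing a single
# quote are not handled by this sketch.
ds_init_sql() {
    db_user=$1
    db_pass=$2
    cat <<EOF
CREATE DATABASE IF NOT EXISTS dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE USER IF NOT EXISTS '${db_user}'@'%' IDENTIFIED BY '${db_pass}';
CREATE USER IF NOT EXISTS '${db_user}'@'localhost' IDENTIFIED BY '${db_pass}';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '${db_user}'@'%';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '${db_user}'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Preview the statements for a made-up account
ds_init_sql ds_user ds_pass

# To apply them (not executed here):
#   ds_init_sql ds_user ds_pass | mysql -uroot -p
```

Printing the SQL first makes it easy to review before anything touches the database.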

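Note: to list the users in MySQL, you can use the following statement (the password is stored only as a hash; on MySQL 5.7 and later the Password column is named authentication_string)

SELECT User, Host, Password FROM mysql.user;

Create tables and import basic data

Exit the MySQL client first (type exit; Ctrl+c also quits on some client versions)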

Next, modify the following configuration in datasource.properties under the conf directory:

cd /usr/local/dolphinscheduler/dolphinscheduler-bin

vi conf/datasource.properties
  • If MySQL is selected, comment out the PostgreSQL-related configuration with '#' (and vice versa). You also need to manually add the mysql-connector-java driver jar to the lib directory; mysql-connector-java-5.1.47.jar is downloaded here. Then configure the database connection information correctly
  • Tip: switch to the English input method, press i to enter insert mode, use the arrow keys to move the cursor, and press Esc to leave insert mode when finished. Then type :wq to save and exit, or :q! to exit without saving.
  # postgre
  # spring.datasource.driver-class-name=org.postgresql.Driver
  # spring.datasource.url=jdbc:postgresql://localhost:5432/dolphinscheduler
  # mysql
  spring.datasource.driver-class-name=com.mysql.jdbc.Driver
  # Replace xxx in the URL with the database IP; localhost is fine for a local database
  spring.datasource.url=jdbc:mysql://xxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
  # Change to the {user} value set above
  spring.datasource.username=xxx
  # Change to the {password} value set above
  spring.datasource.password=xxx
  • First comment out the PostgreSQL-related lines, then be sure to replace each xxx here
  • After modifying and saving, run the script under the script directory that creates the tables and imports the basic data
sh script/create-dolphinscheduler.sh
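
For repeat installs, the datasource.properties changes could also be scripted instead of edited in vi. A minimal sketch, assuming GNU sed: set_prop is a hypothetical helper, and the demo works on a temporary stand-in file with placeholder values rather than the real conf/datasource.properties. Note that it overwrites the PostgreSQL values instead of commenting them out.

```shell
# Set key=value in a properties file, replacing an existing line or appending.
# set_prop is a hypothetical helper; the dots in the key are treated as
# regex wildcards, which is good enough for this sketch.
set_prop() {
    file=$1
    key=$2
    value=$3
    # Escape characters that are special in a sed replacement (\, &, |)
    esc=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    if grep -q "^[[:space:]]*${key}=" "$file"; then
        sed -i "s|^[[:space:]]*${key}=.*|${key}=${esc}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demo on a temporary stand-in file (contents are placeholders)
conf=$(mktemp)
cat > "$conf" <<'EOF'
spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://localhost:5432/dolphinscheduler
spring.datasource.username=test
spring.datasource.password=test
EOF

set_prop "$conf" spring.datasource.driver-class-name com.mysql.jdbc.Driver
set_prop "$conf" spring.datasource.url 'jdbc:mysql://localhost:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true'
set_prop "$conf" spring.datasource.username ds_user
set_prop "$conf" spring.datasource.password ds_pass
cat "$conf"
```

The escaping step matters because the JDBC URL contains &, which sed would otherwise expand to the matched text.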

Note: if create-dolphinscheduler.sh reports the error "/bin/java: No such file or directory", configure the JAVA_HOME and PATH variables in /etc/profile.

The author ran into a similar problem here:

script/create-dolphinscheduler.sh: 37: script/create-dolphinscheduler.sh: /bin/java: not found

Following the hint, open the profile file in the /etc folder for editing:

vi /etc/profile

Add the following code at the bottom:

JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH

Of course, this may need to be adjusted to the location and version of your own JDK. If you installed the JDK with apt, it is most likely under the /usr/lib/jvm directory. The java binary we need is at /usr/lib/jvm/java-8-openjdk-amd64/bin/java (the path may differ slightly on your machine).

Type :wq to save and exit.

Then run the following commands:

source /etc/profile   # Make the changes take effect immediately
echo $JAVA_HOME       # View the value of JAVA_HOME
javac -version

The first command applies the settings; the second and third verify that the configuration succeeded.
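
If you are unsure which path to use for JAVA_HOME, it can be derived from the java binary already on the PATH. A sketch, assuming GNU readlink is available; java_home_from_bin is a hypothetical helper that only does string trimming.

```shell
# Derive a JAVA_HOME candidate from the resolved path of the java binary.
# java_home_from_bin is a hypothetical helper.
java_home_from_bin() {
    p=${1%/bin/java}   # drop the trailing /bin/java
    p=${p%/jre}        # JDK 8 packages often resolve to .../jre/bin/java
    printf '%s\n' "$p"
}

# Only runs if a java binary is already on the PATH
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

The printed directory is what would go on the JAVA_HOME line in /etc/profile.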

5. Modify operating parameters

  • Modify the environment variables in dolphinscheduler_env.sh under the /usr/local/dolphinscheduler/dolphinscheduler-bin/conf/env directory (taking software installed under /opt/soft as an example)
vi /usr/local/dolphinscheduler/dolphinscheduler-bin/conf/env/dolphinscheduler_env.sh

Comment out export SPARK_HOME1=/opt/soft/spark1

Add export DATAX_HOME=/opt/soft/datax/bin/datax.py

export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
# export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME:$PATH

Note: this step is very important. JAVA_HOME and PATH, for example, must be configured; entries that are not used can be ignored or commented out. If you cannot find dolphinscheduler_env.sh, run ls -a

  • Link the JDK to /usr/bin/java (still taking JAVA_HOME=/opt/soft/java as an example)
sudo ln -s /opt/soft/java/bin/java /usr/bin/java
  • Modify the parameters in the one-click deployment configuration file conf/config/install_config.conf, paying special attention to the following parameters
vim /usr/local/dolphinscheduler/dolphinscheduler-bin/conf/config/install_config.conf
# Fill in mysql or postgresql here
dbtype="mysql"

# Database connection address
dbhost="localhost:3306"

# Database name
dbname="dolphinscheduler"

# The database user name needs to be modified to the specific value of {user} set above
username="xxx"    

# If there are special characters in the database password, please use \ escape. It needs to be modified to the specific value of {password} set above
password="xxx"

# Zookeeper address: localhost:2181. Remember to bring the 2181 port
zkQuorum="localhost:2181"

# Directory to install DS into, e.g. /opt/soft/dolphinscheduler. It must be different from the current directory
installPath="/opt/soft/dolphinscheduler"

# User to deploy with; use the user created in step 2
deployUser="dolphinscheduler"

# Mail configuration, taking qq mailbox as an example
# Mail protocol
mailProtocol="SMTP"

# Mail service address
mailServerHost="smtp.qq.com"

# Mail service port
mailServerPort="25"

# mailSender and mailUser can be configured the same
# sender 
mailSender="xxx@qq.com"

# Sending user
mailUser="xxx@qq.com"

# Mailbox password
mailPassword="xxx"

# Set to true if the mailbox uses the TLS protocol, otherwise false
starttlsEnable="true"

# The mailbox with SSL protocol enabled is configured as true, otherwise it is false. Note: starttlsEnable and sslEnable cannot be true at the same time
sslEnable="false"

# For the mail service address value, refer to mailServerHost above
sslTrust="smtp.qq.com"

# Where to upload resource files such as the SQL used by jobs. Options: HDFS, S3, NONE. To use the local file system on a single machine, set this to HDFS, because HDFS also supports the local file system. If the resource upload function is not needed, select NONE. Important: using the local file system does not require deploying Hadoop
resourceStorageType="HDFS"

# Here, take saving to the local file system as an example
# Note: if you want to upload to HDFS and HA is enabled for the NameNode, put the Hadoop configuration files core-site.xml and hdfs-site.xml into the conf directory (/opt/dolphinscheduler/conf in this example) and configure the NameNode cluster name. If the NameNode is not HA, change this to the specific IP or host name
defaultFS="file:///data/dolphinscheduler"    #hdfs://{specific ip/hostname}:8020

# If Yarn is not used, keep the following default values; if the ResourceManager is HA, configure it as the primary and standby ip or hostname of the ResourceManager node, such as "192.168.xx.xx,192.168.xx.xx"; if it is a single ResourceManager, configure yarnHaIps = ""
# Note: it depends on the tasks executed by yarn. In order to ensure the successful judgment of execution results, it is necessary to ensure the correct configuration of yarn information
yarnHaIps="192.168.xx.xx,192.168.xx.xx"

# If the ResourceManager is HA or does not use Yarn, keep the default value; if it is a single ResourceManager, configure the real ResourceManager host name or ip
singleYarnIp="yarnIp1"

# The resource upload root path supports HDFS and S3. Since HDFS supports the local file system, it is necessary to ensure that the local folder exists and has read-write permissions
resourceUploadPath="/data/dolphinscheduler"

# User with permission to create resourceUploadPath
hdfsRootUser="hdfs"
    
# Configure api server port
apiServerPort="12345"

# Machines on which to deploy DS services. Use localhost for a local deployment
ips="localhost"

# ssh port, default 22
sshPort="22"

# On which machine is the master service deployed
masters="localhost"

# The machine on which the worker service is deployed, and specify which worker group the worker belongs to. The default in the following example is the group name
workers="localhost:default"

# On which machine is the alarm service deployed
alertServer="localhost"

# On which machine is the backend api service deployed
apiServers="localhost"

Note: if you intend to use the resource center function, run the following commands:

sudo mkdir -p /data/dolphinscheduler
sudo chown -R dolphinscheduler:dolphinscheduler /data/dolphinscheduler

6. One-click deployment

  • Switch to the deployment user and run the one-click deployment script
sh install.sh

Note: on the first deployment, the following message appears 5 times during step 3 of the run. It can be ignored

sh: bin/dolphinscheduler-daemon.sh: No such file or directory
  • Run error:
install.sh: 22: install.sh: source: not found
1.replace file
install.sh: 28: install.sh: [[: not found
install.sh: 34: [: ==: unexpected operator
2.create directory
3.scp resources
/usr/local/dolphinscheduler/dolphinscheduler-bin/script/scp-hosts.sh: 21: /usr/local/dolphinscheduler/dolphinscheduler-bin/script/scp-hosts.sh: source: not found
/usr/local/dolphinscheduler/dolphinscheduler-bin/script/scp-hosts.sh: 24: /usr/local/dolphinscheduler/dolphinscheduler-bin/script/scp-hosts.sh: [[: not found
/usr/local/dolphinscheduler/dolphinscheduler-bin/script/scp-hosts.sh: 29: /usr/local/dolphinscheduler/dolphinscheduler-bin/script/scp-hosts.sh: Syntax error: "(" unexpected
scp copy failed to exit

Reason: on this system sh points to dash by default, but the scripts rely on bash features such as source and [[ ]].

Solution:

dpkg-reconfigure dash    # requires root permissions

Select No in the dialog that appears.

Afterwards, running ls -l /bin/sh should show /bin/sh -> bash.
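
Before rerunning the installer, it may help to confirm which shell sh actually resolves to. The check above can be sketched as follows, assuming GNU readlink; sh_flavor is a hypothetical helper name.

```shell
# Report which shell a given sh path (default /bin/sh) resolves to.
# sh_flavor is a hypothetical helper.
sh_flavor() {
    target=$(readlink -f "${1:-/bin/sh}")
    case "$(basename "$target")" in
        bash) echo bash ;;
        dash) echo dash ;;
        *)    echo "other ($target)" ;;
    esac
}

echo "/bin/sh resolves to: $(sh_flavor)"
```

If this still prints dash, the source and [[ errors shown earlier are likely to come back.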
  • Run sh install.sh again
  • After the script completes, the following five services are started. Use the jps command (it ships with the JDK) to check whether they are running
jps

  • The output should contain:
MasterServer         ----- master service
WorkerServer         ----- worker service
LoggerServer         ----- logger service
ApiApplicationServer ----- api service
AlertServer          ----- alert service

If the above services have started normally, the automatic deployment is successful.
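
The jps check above can be sketched as a small filter that lists whichever of the five services are absent. missing_ds_services is a hypothetical helper; it reads jps-style output on standard input, so it can be tried without a running deployment.

```shell
# Print the name of each expected service that does not appear in the
# jps output supplied on stdin. missing_ds_services is a hypothetical helper.
missing_ds_services() {
    jps_output=$(cat)
    for svc in MasterServer WorkerServer LoggerServer ApiApplicationServer AlertServer; do
        if ! printf '%s\n' "$jps_output" | grep -qw "$svc"; then
            echo "$svc"
        fi
    done
}

# Demo with made-up process IDs: only two services are "running"
printf '101 MasterServer\n102 WorkerServer\n103 Jps\n' | missing_ds_services

# Real usage (not executed here):
#   jps | missing_ds_services
```

Empty output means all five services were found.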

After the deployment succeeds, you can view the logs, which are stored in the logs folder:

/usr/dolphinscheduler/logs

logs/
    ├── dolphinscheduler-alert-server.log
    ├── dolphinscheduler-master-server.log
    ├── dolphinscheduler-worker-server.log
    ├── dolphinscheduler-api-server.log
    └── dolphinscheduler-logger-server.log

7. Log in to the system

Open the front-end page in a browser, replacing the IP address with that of your own server: http://192.168.xx.xx:12345/dolphinscheduler

The deployment is complete and the interface opens successfully.

Recommended reference documents:

Official website:

https://dolphinscheduler.apache.org/zh-cn/

Official website tutorial: https://www.bilibili.com/video/BV1d64y1s7eZ

You can refer to the original link here: https://blog.csdn.net/qq_50740678/article/details/120615253