Introduction to Azkaban: deployment, principles, and usage

Posted by ravi181229 on Thu, 10 Feb 2022 18:43:13 +0100

Introduction to Azkaban

Azkaban is a simple task scheduling service made up of three parts: a web server, a database server, and an executor server.
Azkaban is an open-source Java project from LinkedIn. It is a batch workflow scheduler used to run a set of jobs and processes in a specific order within a workflow.
Azkaban defines a key-value (KV) file format for declaring dependencies between tasks, and provides an easy-to-use web interface for maintaining and tracking workflows.

Project website: https://azkaban.github.io/

Functional features of Azkaban

1. Web User Interface
2. Easy upload of workflows
3. Easy setup of dependencies between tasks
4. Workflow Scheduling
5. Authentication/Authorization
6. Ability to kill and restart workflows
7. Modular, pluggable plugin mechanism
8. Project Workspace
9. Logging and auditing of workflows and tasks

Azkaban installation and deployment

Preparation:

Installation and deployment require three packages:

azkaban-executor-server-2.5.0.tar.gz
azkaban-sql-script-2.5.0.tar.gz
azkaban-web-server-2.5.0.tar.gz

Baidu Netdisk share link: https://pan.baidu.com/s/1mMuIuVv9Ji6yO2A2b8Ibrg
Extraction code: seld
[Note:] Deploy the mysql service ahead of time; installing mysql is not covered here.

Install components:

# Upload installation package
wangting@ops01:/opt/software/azkaban >ll
total 22612
-rw-r--r-- 1 root root 11157302 May 16 10:45 azkaban-executor-server-2.5.0.tar.gz
-rw-r--r-- 1 root root     1928 May 16 10:45 azkaban-sql-script-2.5.0.tar.gz
-rw-r--r-- 1 root root 11989669 May 16 10:45 azkaban-web-server-2.5.0.tar.gz
# Create an application directory so that all components are extracted under one directory for easier management
wangting@ops01:/opt/software/azkaban >mkdir /opt/module/azkaban
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >ls /opt/module/azkaban/
azkaban-2.5.0  azkaban-executor-2.5.0  azkaban-web-2.5.0
wangting@ops01:/opt/software/azkaban >
wangting@ops01:/opt/software/azkaban >cd /opt/module/azkaban/
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 azkaban-executor-2.5.0
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 azkaban-web-2.5.0
# Rename the directories so they are easier to manage and switch between
wangting@ops01:/opt/module/azkaban >mv azkaban-executor-2.5.0 executor
wangting@ops01:/opt/module/azkaban >mv azkaban-web-2.5.0 server
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 executor
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 server
wangting@ops01:/opt/module/azkaban >
# The azkaban-2.5.0 directory contains the SQL files used later to initialize the Azkaban database
wangting@ops01:/opt/module/azkaban >ls azkaban-2.5.0/
create.active_executing_flows.sql  create.execution_flows.sql  create.project_events.sql  create.project_permissions.sql  create.project_versions.sql  create.triggers.sql     update-all-sql-2.2.sql
create.active_sla.sql              create.execution_jobs.sql   create.project_files.sql   create.project_properties.sql   create.properties.sql        database.properties     update.execution_logs.2.1.sql
create-all-sql-2.5.0.sql           create.execution_logs.sql   create.project_flows.sql   create.projects.sql             create.schedules.sql         update-all-sql-2.1.sql  update.project_properties.2.1.sql
# Check the local IP address and confirm that the mysql service is listening
wangting@ops01:/opt/module/azkaban >ifconfig eth0 |grep "inet "
        inet 11.8.37.50  netmask 255.255.255.0  broadcast 11.8.37.255
wangting@ops01:/opt/module/azkaban >netstat -tnlpu|grep 3306
tcp6       0      0 :::3306                 :::*                    LISTEN      -                   
# Log in to mysql
wangting@ops01:/opt/module/azkaban >mysql -uroot -pwangting
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 37069
Server version: 5.7.26 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the azkaban database
mysql> create database azkaban;
Query OK, 1 row affected (0.00 sec)

mysql> use azkaban;
Database changed
mysql> show tables;
Empty set (0.00 sec)
# Initialize the tables
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
mysql> show tables;
+------------------------+
| Tables_in_azkaban      |
+------------------------+
| active_executing_flows |
| active_sla             |
| execution_flows        |
| execution_jobs         |
| execution_logs         |
| project_events         |
| project_files          |
| project_flows          |
| project_permissions    |
| project_properties     |
| project_versions       |
| projects               |
| properties             |
| schedules              |
| triggers               |
+------------------------+
15 rows in set (0.00 sec)
# Done; exit mysql
mysql> exit
Bye
wangting@ops01:/opt/module/azkaban >
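
The interactive session above can also be run non-interactively. A minimal sketch, assuming the same paths and the root account used above; the function name init_azkaban_db and the AZKABAN_SQL variable are hypothetical, and -p makes mysql prompt for the password on each call instead of taking it on the command line:

```shell
#!/usr/bin/env bash
# Path to the bundled schema script (same location as in the session above).
AZKABAN_SQL=${AZKABAN_SQL:-/opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql}

# Hypothetical helper: create the azkaban database and load the schema.
init_azkaban_db() {
    local sql_file=$1
    mysql -uroot -p -e "CREATE DATABASE IF NOT EXISTS azkaban;" || return 1
    mysql -uroot -p azkaban < "$sql_file" || return 1
    # List the tables so the result can be compared with the output above.
    mysql -uroot -p -e "SHOW TABLES;" azkaban
}

# Only run when the mysql client and the schema file are both present.
if command -v mysql >/dev/null 2>&1 && [ -f "$AZKABAN_SQL" ]; then
    init_azkaban_db "$AZKABAN_SQL"
else
    echo "skipped: mysql client or $AZKABAN_SQL not found"
fi
```

The schema script is meant for a fresh database, so this sketch is intended to be run once.
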
wangting@ops01:/opt/module/azkaban >cd server
wangting@ops01:/opt/module/azkaban/server >pwd
/opt/module/azkaban/server

# Generate the keystore used for SSL; the alias jetty corresponds to the jetty.* entries in the configuration file
wangting@ops01:/opt/module/azkaban/server >keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password:  			# wangting is used here; choose your own password
Re-enter new password: 				# Repeat the password
What is your first and last name?		# Enter
  [Unknown]:  
What is the name of your organizational unit?	# Enter
  [Unknown]:  
What is the name of your organization?		# Enter
  [Unknown]:  
What is the name of your City or Locality?	# Enter
  [Unknown]:  
What is the name of your State or Province?	# Enter
  [Unknown]:  
What is the two-letter country code for this unit?	# Enter
  [Unknown]:  
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?		# y
  [no]:  y

Enter key password for <jetty>
	(RETURN if same as keystore password):  	# Enter 

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore keystore -destkeystore keystore -deststoretype pkcs12".
wangting@ops01:/opt/module/azkaban/server >
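
The same keystore can be generated without answering prompts by passing the answers as options. A sketch, assuming keytool from the JDK is on the PATH; the KEYSTORE_PASS variable and the scratch output directory are placeholders for illustration only:

```shell
#!/usr/bin/env bash
# Placeholder password; keytool requires at least 6 characters.
KEYSTORE_PASS=${KEYSTORE_PASS:-change-me-123}
# Write to a scratch directory here; in the deployment above the file
# lives in /opt/module/azkaban/server.
out_dir=$(mktemp -d)

if command -v keytool >/dev/null 2>&1; then
    # -dname supplies the answers that the interactive run left as Unknown.
    keytool -genkeypair \
        -keystore "$out_dir/keystore" \
        -storetype JKS \
        -alias jetty \
        -keyalg RSA \
        -storepass "$KEYSTORE_PASS" \
        -keypass "$KEYSTORE_PASS" \
        -dname "CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown"
    # List the entry to confirm the alias was stored.
    keytool -list -keystore "$out_dir/keystore" -storepass "$KEYSTORE_PASS"
else
    echo "skipped: keytool not found on PATH"
fi
```

Whatever password is chosen must match the jetty.password, jetty.keypassword, and jetty.trustpassword values in azkaban.properties.
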
# Check the time zone
wangting@ops01:/opt/module/azkaban/server >cat /etc/localtime 
(binary tzfile data omitted)
CST-8

# The last line should be CST-8 (UTC+8, China Standard Time). If it is not, adjust the system time zone.
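
Because /etc/localtime is a binary file, printing it with cat yields mostly unreadable bytes. A sketch of a more readable check using date, which should work on any system with a standard date command:

```shell
#!/usr/bin/env bash
# Read the system's current UTC offset and zone abbreviation.
offset=$(date +%z)   # numeric offset, for example +0800
abbr=$(date +%Z)     # abbreviation, for example CST
echo "time zone: ${abbr} (UTC${offset})"

# China Standard Time corresponds to an offset of +0800.
if [ "$offset" = "+0800" ]; then
    echo "OK: system time zone is UTC+8"
else
    echo "WARN: system time zone is not UTC+8"
fi
```
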

wangting@ops01:/opt/module/azkaban/server >cd conf/
# Edit the web server configuration
wangting@ops01:/opt/module/azkaban/server/conf >ls
azkaban.properties  azkaban-users.xml
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban.properties 
# Change to Asia/Shanghai
default.timezone.id=Asia/Shanghai

database.type=mysql
mysql.port=3306
# IP address of the host where mysql is deployed
mysql.host=11.8.37.50
# The azkaban database you just created
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100

# Azkaban Jetty server properties.
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
# Keystore file generated by the keytool command above
jetty.keystore=keystore
# Change the passwords to the one you just set
jetty.password=wangting
jetty.keypassword=wangting
jetty.truststore=keystore
jetty.trustpassword=wangting

# Add a user (the equivalent of registering an account)
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban-users.xml 

<azkaban-users>
        <user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
        <user username="metrics" password="metrics" roles="metrics"/>
        <!-- Custom username and password used to log in to the web interface -->
        <user username="wangting" password="wangting" roles="admin, metrics"/>

        <role name="admin" permissions="ADMIN" />
        <role name="metrics" permissions="METRICS"/>
</azkaban-users>

# Edit the executor configuration
wangting@ops01:/opt/module/azkaban/server/conf >cd /opt/module/azkaban/executor/conf/
wangting@ops01:/opt/module/azkaban/executor/conf >ls
azkaban.private.properties  azkaban.properties  global.properties
wangting@ops01:/opt/module/azkaban/executor/conf >vim azkaban.properties 

#Azkaban
# Change to Asia/Shanghai
default.timezone.id=Asia/Shanghai

# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes

#Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects

# Database settings to change
database.type=mysql
mysql.port=3306
mysql.host=11.8.37.50
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100

# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
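
In Java .properties files, a # starts a comment only at the beginning of a line, so an annotation written after a value is read as part of that value. A small check that flags such lines is sketched below; the function name check_inline_comments is hypothetical, and the demo runs against a temporary file rather than the real conf/azkaban.properties:

```shell
#!/usr/bin/env bash
# Print every "key=value <whitespace> # ..." line, with its line number.
# Lines that start with # or ! are real comments and are skipped.
check_inline_comments() {
    local file=$1
    grep -nE '^[^#!][^=]*=.*[[:space:]]+#' "$file"
}

# Demo file: one annotated value, one real comment, one clean value.
demo=$(mktemp)
printf 'default.timezone.id=Asia/Shanghai\t# Change to Asia/Shanghai\n' > "$demo"
printf '# Azkaban Jetty server properties.\n' >> "$demo"
printf 'jetty.port=8081\n' >> "$demo"

if check_inline_comments "$demo"; then
    echo "Move the annotations listed above onto their own lines."
else
    echo "No inline annotations found."
fi
```

To check the real files, call check_inline_comments with the path to server/conf/azkaban.properties or executor/conf/azkaban.properties.
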

Start the service:

wangting@ops01:/opt/module/azkaban/executor/conf >cd /opt/module/azkaban/server/
wangting@ops01:/opt/module/azkaban/server >bin/azkaban-web-start.sh 
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
2021/05/16 11:26:42.425 +0800 INFO [log] [Azkaban] Started SslSocketConnector@0.0.0.0:8443
2021/05/16 11:26:42.425 +0800 INFO [AzkabanWebServer] [Azkaban] Server running on ssl port 8443.

wangting@ops01:/opt/module/azkaban/server >cd /opt/module/azkaban/executor/
wangting@ops01:/opt/module/azkaban/executor >bin/azkaban-executor-start.sh 
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
Starting AzkabanExecutorServer on port 12321 ...
2021/05/16 11:29:20.076 +0800 INFO [log] [Azkaban] Started SocketConnector@0.0.0.0:12321
2021/05/16 11:29:20.076 +0800 INFO [AzkabanExecutorServer] [Azkaban] Azkaban Executor Server started on port 12321
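
Once both start scripts have reported their ports, a quick way to confirm the listeners is to try connecting to them. A sketch, assuming bash (for /dev/tcp) and the coreutils timeout command; AZKABAN_HOST is a hypothetical variable that defaults to the local machine:

```shell
#!/usr/bin/env bash
AZKABAN_HOST=${AZKABAN_HOST:-127.0.0.1}

# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
port_open() {
    local host=$1 port=$2
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# 8443 is the web server SSL port and 12321 the executor port (from the logs above).
report=$(
    for port in 8443 12321; do
        if port_open "$AZKABAN_HOST" "$port"; then
            echo "port ${port}: listening"
        else
            echo "port ${port}: not reachable"
        fi
    done
)
echo "$report"
```
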

Page Access

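Open https://11.8.37.50:8443 in a browser (the SSL port configured in azkaban.properties) and log in with an account defined in azkaban-users.xml. A successful login means the deployment is complete.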

Introduction to using Azkaban

Projects: the most important page. Create a project here; all flows run inside a project.

Scheduling: shows scheduled tasks.

Executing: shows currently running tasks.

History: shows previously run tasks.

Single independent task

1. Create a project

Create a project named project_1 and enter a description.

2. Define a job

How a task executes and what it does is defined in a job file

# Create a new command.job file locally; its contents must have no trailing spaces:

# command.job
type=command
command=mkdir /opt/module/ztdata_0516


3. Package the job definition file into a zip package

After editing the command.job file, use a compression tool to package it into a zip file, such as command.zip.
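
On a Linux machine the same step can be scripted. A sketch, assuming the Info-ZIP zip command is installed; the scratch directory is only for illustration, and the job contents are the ones shown in step 2:

```shell
#!/usr/bin/env bash
# Work in a scratch directory (an assumption; any directory will do).
workdir=$(mktemp -d)

# Write the job file; printf emits each line exactly, with no trailing spaces.
printf '%s\n' \
    '# command.job' \
    'type=command' \
    'command=mkdir /opt/module/ztdata_0516' > "$workdir/command.job"

# Count lines that end in whitespace; the job file should have none.
trailing=$(grep -cE '[[:space:]]+$' "$workdir/command.job")
echo "lines with trailing whitespace: $trailing"

# Package the job file; -j stores it without the directory path.
if command -v zip >/dev/null 2>&1; then
    zip -q -j "$workdir/command.zip" "$workdir/command.job"
    echo "created $workdir/command.zip"
else
    echo "zip not found; package command.job with any archive tool"
fi
```
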

4. Upload the zip package to the project

After uploading, you can check what the job contains by viewing the parsed task content under the command job.

5. View and execute tasks

Click the command task in Flows to open its detail page; the Execute Flow button runs the task.

[Note:] Because uploading is done through the web interface, the job files can be created, edited, and zipped directly on a local Windows computer.

6. Historical Task Records

After the task has run, you can view its record on the History page.

7. Verify execution results

wangting@ops01:/opt/module >ll
total 52
drwxrwxr-x  5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x  2 wangting wangting 4096 Apr  4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r--  1 wangting wangting   30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x  8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr  2 15:14 hive
drwxr-xr-x  7 wangting wangting 4096 Apr 29 11:07 kafka
drwxr-xr-x  5 wangting wangting 4096 Jun 27  2018 phoenix
drwxrwxr-x  3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x  5 wangting wangting 4096 Apr  2 15:03 tez-0.9.2_bak0410
drwxr-xr-x  8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x  2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >

When the task is complete, verify that the new ztdata_0516 directory has been created under /opt/module/, which indicates that the task was scheduled and executed successfully.

Multiple Task Workflow

[Note:] The following experiments no longer include screenshots; the process is the same as in example 1.

Create project

Create a project named project_2 and enter a description.

Define job tasks

Create 2 job files locally

one.job

# one.job
type=command
command=mkdir /opt/module/one

two.job

# two.job
type=command
dependencies=one
command=touch /opt/module/one/two.txt

[Note:] dependencies=one means that the two.job task depends on one. With this parameter defined, one executes first, and two starts only after one has completed.
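
To see at a glance which job waits on which, the dependencies lines can be read straight from the job files. A minimal sketch that recreates the two files above in a scratch directory and prints each job with its dependency; the list_deps function is hypothetical and is not part of Azkaban:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Recreate the two job files from the text.
printf '%s\n' '# one.job' 'type=command' \
    'command=mkdir /opt/module/one' > "$workdir/one.job"
printf '%s\n' '# two.job' 'type=command' 'dependencies=one' \
    'command=touch /opt/module/one/two.txt' > "$workdir/two.job"

# Print "job <- dependencies" for every .job file in a directory.
list_deps() {
    local dir=$1 f job deps
    for f in "$dir"/*.job; do
        job=$(basename "$f" .job)
        # Extract the value of the dependencies= line, if there is one.
        deps=$(sed -n 's/^dependencies=//p' "$f")
        echo "${job} <- ${deps:-(none)}"
    done
}

list_deps "$workdir"
```

Here one has no dependency and two lists one, which is why two is the job to execute in the Flows view: running it pulls in one first.
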

Package the job definition file into a zip package

The zip file name is arbitrary.

Upload the zip package to the project

On the home page, click Projects in the navigation bar at the top, open project_2, and click Upload in the upper right corner; then upload the zip file.

Execute Tasks

Click Flows, click the main task two, and execute it after entering the task page.

Verification results

wangting@ops01:/opt/module >ll
total 56
drwxrwxr-x  5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x  2 wangting wangting 4096 Apr  4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r--  1 wangting wangting   30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x  8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr  2 15:14 hive
drwxr-xr-x  7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x  2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x  5 wangting wangting 4096 Jun 27  2018 phoenix
drwxrwxr-x  3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x  5 wangting wangting 4096 Apr  2 15:03 tez-0.9.2_bak0410
drwxr-xr-x  8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x  2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >cd one/
wangting@ops01:/opt/module/one >ls
two.txt
wangting@ops01:/opt/module/one >

When the task is complete, verify that the new directory one has been created under /opt/module/, which indicates that task one was scheduled and executed successfully.

Enter the one directory and confirm that two.txt exists, which indicates that task two was also scheduled and executed successfully.

Call a task script to execute a task

On the server, write a script that simulates a more complex process, such as a business script that calls hive, hdfs, and so on:

wangting@ops01:/opt/module/test >vim test_azkaban.sh 

#!/bin/bash
echo "123"
echo "123123"
echo "123123123"
ls -l /opt/module/ >> /opt/module/test/shell_log_0516.log
hdfs dfs -ls / >> /opt/module/test/shell_log_0516.log
NOW=`date|awk -F" " '{print $4}'`
echo "current time: $NOW"

wangting@ops01:/opt/module/test >chmod +x test_azkaban.sh 
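
The date | awk line in the script depends on the layout of the default date output, which can vary with the locale. A sketch of an alternative that asks date for the time directly through a format string:

```shell
#!/usr/bin/env bash
# %H:%M:%S always yields a 24-hour HH:MM:SS string, regardless of locale.
NOW=$(date +%H:%M:%S)
echo "current time: $NOW"
```
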

Create project

Create a project named project_3 and enter a description.

Define job tasks

# run_bash.job
type=command
command=bash /opt/module/test/test_azkaban.sh

Package the job definition file into a zip package

Same as in the previous examples.

Upload the zip package to the project

On the home page, click Projects in the navigation bar at the top, open project_3, and click Upload in the upper right corner; then upload the zip file.

Execute Tasks

Click Flows, click the main task run_bash, and execute it after entering the task page.

Verification results

wangting@ops01:/opt/module/test >ll
total 8
-rw-rw-r-- 1 wangting wangting 1801 May 16 12:49 shell_log_0516.log
-rwxrwxr-x 1 wangting wangting  226 May 16 12:44 test_azkaban.sh
wangting@ops01:/opt/module/test >
# Check that the log contains the /opt/module directory listing and the contents of the HDFS root directory
wangting@ops01:/opt/module/test >cat shell_log_0516.log 
total 60
drwxrwxr-x  5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x  2 wangting wangting 4096 Apr  4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r--  1 wangting wangting   30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x  8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr  2 15:14 hive
drwxr-xr-x  7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x  2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x  5 wangting wangting 4096 Jun 27  2018 phoenix
drwxrwxr-x  2 wangting wangting 4096 May 16 12:49 test
drwxrwxr-x  3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x  5 wangting wangting 4096 Apr  2 15:03 tez-0.9.2_bak0410
drwxr-xr-x  8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x  2 wangting wangting 4096 May 16 11:51 ztdata_0516
2021-05-16 12:49:16,801 INFO  [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
Found 10 items
drwxr-xr-x   - wangting supergroup          0 2021-03-17 11:44 /20210317
drwxr-xr-x   - wangting supergroup          0 2021-03-19 10:51 /20210319
drwxr-xr-x   - wangting supergroup          0 2021-04-24 17:05 /flume
-rw-r--r--   3 wangting supergroup  338075860 2021-03-12 11:50 /hadoop-3.1.3.tar.gz
drwxr-xr-x   - wangting supergroup          0 2021-05-13 15:31 /hbase
drwxr-xr-x   - wangting supergroup          0 2021-04-04 11:07 /test.db
drwxr-xr-x   - wangting supergroup          0 2021-03-19 11:14 /testgetmerge
drwxr-xr-x   - wangting supergroup          0 2021-04-10 16:23 /tez
drwx------   - wangting supergroup          0 2021-04-02 15:14 /tmp
drwxr-xr-x   - wangting supergroup          0 2021-04-02 15:25 /user
wangting@ops01:/opt/module/test >

When the task is complete, verify that the shell_log_0516.log file has been created in the /opt/module/test directory, which indicates that the run_bash task was scheduled and executed successfully.
