Azkaban: introduction, deployment, principles and usage
Introduction to Azkaban
Azkaban is a simple task scheduling service made up of three parts: a web server, a database server (MySQL in this article), and an executor server.
Azkaban is a batch workflow job scheduler written in Java and open-sourced by LinkedIn. It runs a set of jobs and processes in a specific order within a workflow.
Azkaban uses a key-value (KV) file format to declare dependencies between tasks, and provides an easy-to-use web UI for maintaining and tracking your workflows.
Project website: https://azkaban.github.io/
Features of Azkaban
1. Web user interface
2. Easy workflow upload
3. Easy setup of dependencies between tasks
4. Workflow scheduling
5. Authentication/authorization
6. Ability to kill and restart workflows
7. Modular, pluggable plug-in mechanism
8. Project workspaces
9. Logging and auditing of workflows and tasks
Azkaban installation and deployment
Preparation:
Installation requires three packages:
azkaban-executor-server-2.5.0.tar.gz
azkaban-sql-script-2.5.0.tar.gz
azkaban-web-server-2.5.0.tar.gz
Netdisk share link: https://pan.baidu.com/s/1mMuIuVv9Ji6yO2A2b8Ibrg
Extraction code: seld
[Note:] MySQL must be deployed beforehand; its installation is not covered here.
Install the components:
# Upload the installation packages
wangting@ops01:/opt/software/azkaban >ll
total 22612
-rw-r--r-- 1 root root 11157302 May 16 10:45 azkaban-executor-server-2.5.0.tar.gz
-rw-r--r-- 1 root root 1928 May 16 10:45 azkaban-sql-script-2.5.0.tar.gz
-rw-r--r-- 1 root root 11989669 May 16 10:45 azkaban-web-server-2.5.0.tar.gz
# Create one application directory and extract all components into it
wangting@ops01:/opt/software/azkaban >mkdir /opt/module/azkaban
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >ls /opt/module/azkaban/
azkaban-2.5.0  azkaban-executor-2.5.0  azkaban-web-2.5.0
wangting@ops01:/opt/software/azkaban >
wangting@ops01:/opt/software/azkaban >cd /opt/module/azkaban/
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 azkaban-executor-2.5.0
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 azkaban-web-2.5.0
# Rename the directories so they are easier to manage and switch between
wangting@ops01:/opt/module/azkaban >mv azkaban-executor-2.5.0 executor
wangting@ops01:/opt/module/azkaban >mv azkaban-web-2.5.0 server
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 executor
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 server
wangting@ops01:/opt/module/azkaban >
# The azkaban-2.5.0 directory holds the SQL scripts used later to initialize the Azkaban database
wangting@ops01:/opt/module/azkaban >ls azkaban-2.5.0/
create.active_executing_flows.sql  create.execution_flows.sql  create.project_events.sql  create.project_permissions.sql  create.project_versions.sql
create.triggers.sql  update-all-sql-2.2.sql  create.active_sla.sql  create.execution_jobs.sql  create.project_files.sql  create.project_properties.sql
create.properties.sql  database.properties  update.execution_logs.2.1.sql  create-all-sql-2.5.0.sql  create.execution_logs.sql  create.project_flows.sql
create.projects.sql  create.schedules.sql  update-all-sql-2.1.sql  update.project_properties.2.1.sql
# Check the local IP address and confirm that MySQL is listening
wangting@ops01:/opt/module/azkaban >ifconfig eth0 |grep "inet "
inet 11.8.37.50 netmask 255.255.255.0 broadcast 11.8.37.255
wangting@ops01:/opt/module/azkaban >netstat -tnlpu|grep 3306
tcp6 0 0 :::3306 :::* LISTEN -
# Log in to MySQL
wangting@ops01:/opt/module/azkaban >mysql -uroot -pwangting
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 37069
Server version: 5.7.26 MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the azkaban database
mysql> create database azkaban;
Query OK, 1 row affected (0.00 sec)
mysql> use azkaban;
Database changed
mysql> show tables;
Empty set (0.00 sec)
# Initialize the schema
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
mysql> show tables;
+------------------------+
| Tables_in_azkaban      |
+------------------------+
| active_executing_flows |
| active_sla             |
| execution_flows        |
| execution_jobs         |
| execution_logs         |
| project_events         |
| project_files          |
| project_flows          |
| project_permissions    |
| project_properties     |
| project_versions       |
| projects               |
| properties             |
| schedules              |
| triggers               |
+------------------------+
15 rows in set (0.00 sec)
# Done; exit
mysql> exit
Bye
wangting@ops01:/opt/module/azkaban >
wangting@ops01:/opt/module/azkaban >cd server
wangting@ops01:/opt/module/azkaban/server >pwd
/opt/module/azkaban/server
# Generate the SSL keystore (alias jetty); the file name keystore is what jetty.keystore refers to in the configuration file
wangting@ops01:/opt/module/azkaban/server >keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password:  # choose any password (wangting here)
Re-enter new password:  # repeat the password
What is your first and last name?
  [Unknown]:  # press Enter
What is the name of your organizational unit?
  [Unknown]:  # press Enter
What is the name of your organization?
  [Unknown]:  # press Enter
What is the name of your City or Locality?
  [Unknown]:  # press Enter
What is the name of your State or Province?
  [Unknown]:  # press Enter
What is the two-letter country code for this unit?
  [Unknown]:  # press Enter
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
  [no]:  y
Enter key password for <wangting>
  (RETURN if same as keystore password):  # press Enter
Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore keystore -destkeystore keystore -deststoretype pkcs12".
wangting@ops01:/opt/module/azkaban/server >
# Check the time zone
wangting@ops01:/opt/module/azkaban/server >cat /etc/localtime
[binary TZif data omitted]
CST-8
# The last line must read CST-8, POSIX notation for UTC+8 (China Standard Time). If it does not, switch the system to the UTC+8 time zone first.
wangting@ops01:/opt/module/azkaban/server >cd conf/
# Edit the web server configuration
wangting@ops01:/opt/module/azkaban/server/conf >ls
azkaban.properties  azkaban-users.xml
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban.properties
# In a .properties file, # starts a comment only at the beginning of a line, so keep comments on their own lines
# Change to Asia/Shanghai
default.timezone.id=Asia/Shanghai
database.type=mysql
mysql.port=3306
# IP of the host where MySQL is deployed
mysql.host=11.8.37.50
# The azkaban database created above
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100
# Azkaban Jetty server properties.
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
# Keystore file generated by keytool above
jetty.keystore=keystore
# Use the password set during keytool
jetty.password=wangting
jetty.keypassword=wangting
jetty.truststore=keystore
jetty.trustpassword=wangting
# Add a web UI user (this is how accounts are registered)
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban-users.xml
<azkaban-users>
  <user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
  <user username="metrics" password="metrics" roles="metrics"/>
  <!-- Custom username and password for logging in to the web UI -->
  <user username="wangting" password="wangting" roles="admin, metrics"/>
  <role name="admin" permissions="ADMIN" />
  <role name="metrics" permissions="METRICS"/>
</azkaban-users>
# Edit the executor configuration
wangting@ops01:/opt/module/azkaban/server/conf >cd /opt/module/azkaban/executor/conf/
wangting@ops01:/opt/module/azkaban/executor/conf >ls
azkaban.private.properties  azkaban.properties  global.properties
wangting@ops01:/opt/module/azkaban/executor/conf >vim azkaban.properties
#Azkaban
# Change to Asia/Shanghai
default.timezone.id=Asia/Shanghai
# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes
#Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Database settings
database.type=mysql
mysql.port=3306
mysql.host=11.8.37.50
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
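The interactive parts of this setup, the keytool dialogue and the vim edits, can also be scripted. The sketch below is not part of Azkaban: make_keystore and set_prop are hypothetical helper names, it assumes keytool is on PATH (from a JDK/JRE) and a POSIX awk, and the password is this article's placeholder. set_prop always writes a bare KEY=VALUE line, which avoids the inline-comment problem noted in the configuration above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Azkaban) that script the setup step above.

# make_keystore FILE PASSWORD
# Non-interactive equivalent of the keytool dialogue: alias jetty, RSA key,
# all distinguished-name fields "Unknown", JKS format (as in the warning above),
# and a key password equal to the keystore password.
make_keystore() {
  local file=$1 pass=$2
  keytool -genkeypair -keystore "$file" -storetype JKS -alias jetty -keyalg RSA \
    -dname "CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown" \
    -storepass "$pass" -keypass "$pass"
}

# set_prop FILE KEY VALUE
# Replaces every "KEY=..." line in a .properties file, or appends "KEY=VALUE"
# if the key is missing. KEY and VALUE are passed through the environment so
# awk does not reinterpret backslashes (e.g. in passwords).
set_prop() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  K=$2 V=$3 awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back instead of mv so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example usage on this article's host (paths and values from the steps above):
#   cd /opt/module/azkaban/server && make_keystore keystore wangting
#   set_prop conf/azkaban.properties default.timezone.id Asia/Shanghai
#   set_prop conf/azkaban.properties mysql.host 11.8.37.50
```

Rewriting through a temporary file keeps the edit atomic from the reader's point of view: a failed awk run leaves the original file untouched.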
Start the services:
wangting@ops01:/opt/module/azkaban/executor/conf >cd /opt/module/azkaban/server/
wangting@ops01:/opt/module/azkaban/server >bin/azkaban-web-start.sh
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
2021/05/16 11:26:42.425 +0800 INFO [log] [Azkaban] Started SslSocketConnector@0.0.0.0:8443
2021/05/16 11:26:42.425 +0800 INFO [AzkabanWebServer] [Azkaban] Server running on ssl port 8443.
wangting@ops01:/opt/module/azkaban/server >cd /opt/module/azkaban/executor/
wangting@ops01:/opt/module/azkaban/executor >bin/azkaban-executor-start.sh
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
Starting AzkabanExecutorServer on port 12321 ...
2021/05/16 11:29:20.076 +0800 INFO [log] [Azkaban] Started SocketConnector@0.0.0.0:12321
2021/05/16 11:29:20.076 +0800 INFO [AzkabanExecutorServer] [Azkaban] Azkaban Executor Server started on port 12321
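To confirm that both services are reachable without opening a browser, a small check can probe the two ports from the startup logs (8443 for the web server, 12321 for the executor) and print the host's UTC offset. This is a sketch under assumptions: port_open, utc_offset and check_azkaban are hypothetical helper names, it relies on bash's /dev/tcp redirection and the coreutils timeout command, and 11.8.37.50 is this article's host IP.

```bash
#!/usr/bin/env bash
# Hypothetical post-start check; not part of Azkaban.

# port_open HOST PORT: succeeds if a TCP connection opens within 3 seconds.
# Uses bash's built-in /dev/tcp pseudo-device, so no nc or telnet is needed.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# utc_offset: current UTC offset as +hhmm. Asia/Shanghai, and the POSIX
# TZ string CST-8 at the end of /etc/localtime, both give +0800.
utc_offset() {
  date +%z
}

# check_azkaban [HOST]: reports both ports and the offset; non-zero if a port is closed.
check_azkaban() {
  local host=${1:-11.8.37.50} entry name port rc=0
  for entry in web:8443 executor:12321; do
    name=${entry%%:*}
    port=${entry##*:}
    if port_open "$host" "$port"; then
      echo "$name: port $port is open"
    else
      echo "$name: port $port is NOT reachable" >&2
      rc=1
    fi
  done
  echo "UTC offset: $(utc_offset)"
  return "$rc"
}
```

Running check_azkaban 11.8.37.50 after both start scripts should report both ports open and an offset of +0800, matching the +0800 timestamps in the logs above.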
Page Access
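Open https://11.8.37.50:8443 in a browser (the web server listens on SSL port 8443 using the self-signed keystore, so expect a certificate warning) and log in with an account from azkaban-users.xml, such as wangting / wangting. A successful login means the deployment is complete.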
Using Azkaban
projects: the most important page. Every flow runs inside a project, so working with Azkaban starts with creating one here.
scheduling: shows scheduled (timed) tasks
executing: shows currently running tasks
history: shows previously executed tasks
Single independent task
1. Create a project
Create a project named project_1 and give it a short description.
2. Define a job
A job file defines what a task does and how it runs.
# Create a file named command.job locally; make sure no line ends with trailing spaces. Contents:
# command.job
type=command
command=mkdir /opt/module/ztdata_0516
3. Package the job file into a zip archive
After editing command.job, compress it into a zip file, for example command.zip.
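The job file and archive can also be produced from a shell instead of an editor and a compression tool. The sketch below writes command.job with printf, so no line can pick up trailing whitespace, refuses to package a file that has any, and then zips it. It assumes the Info-ZIP zip command is installed, and make_job_zip is a hypothetical helper name. The trailing-space rule is presumably there because Java-style properties parsing keeps trailing whitespace in values, so "type=command " would name a job type that does not exist (an assumption, not verified against Azkaban's source).

```bash
#!/usr/bin/env bash
# Hypothetical helper: build command.zip for the single-job example.
# Assumes the Info-ZIP "zip" command is installed.

# make_job_zip DIR: writes DIR/command.job and packages it as DIR/command.zip.
make_job_zip() {
  local dir=$1
  mkdir -p "$dir" || return 1
  # printf emits exactly these lines: no trailing spaces, Unix line endings.
  printf '%s\n' 'type=command' 'command=mkdir /opt/module/ztdata_0516' > "$dir/command.job"
  # Refuse to package a job file with whitespace at the end of any line
  # (grep -n prints the offending lines with their numbers).
  if grep -n '[[:space:]]$' "$dir/command.job"; then
    echo "trailing whitespace found in command.job" >&2
    return 1
  fi
  # Zip from inside DIR so the archive stores command.job without a path prefix.
  (cd "$dir" && zip -q command.zip command.job)
}
```

For example, make_job_zip ~/azkaban_jobs leaves command.zip ready for the upload step that follows.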
4. Upload the zip archive to the project
After uploading, you can open the command job to view its parsed contents.
5. View and execute tasks
Under Flows, click the command task to open its page, then click Execute Flow to run it.
[Note:] Because uploading happens through the web UI, the job files can be created, edited, and zipped directly on a local Windows computer.
6. Task history
After the task runs, its record appears on the history page.
7. Verify execution results
wangting@ops01:/opt/module >ll
total 52
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >
After the task completes, the new directory ztdata_0516 exists under /opt/module/, which shows that the task was scheduled and executed successfully.
Multiple Task Workflow
[Note:] The following examples are no longer illustrated with screenshots; the UI steps are the same as in example 1.
Create project
Create a project named project_2 and give it a short description.
Define job tasks
Create 2 job files locally
one.job
# one.job
type=command
command=mkdir /opt/module/one
two.job
# two.job
type=command
dependencies=one
command=touch /opt/module/one/two.txt
[Note:] dependencies=one means that job two depends on job one: Azkaban runs one first, and two starts only after one has completed successfully.
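The ordering rule can be checked locally before uploading. The sketch below turns each dependencies= entry into a "dependency job" pair and lets tsort print a valid execution order (one, then two, for this example); tsort also reports an error if the dependencies form a cycle. It assumes job names equal the file names without .job and that multiple dependencies are comma-separated. job_order is a hypothetical helper that only mimics the ordering; it is not Azkaban's scheduler.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an order in which the jobs in a directory can run,
# based only on their dependencies= lines.
job_order() {
  local dir=$1 f name dep
  for f in "$dir"/*.job; do
    [ -e "$f" ] || continue            # no .job files: skip the unexpanded glob
    name=$(basename "$f" .job)
    # A pair of identical names registers the job even if it has no edges.
    echo "$name $name"
    # "dep name" tells tsort that dep must come before name.
    # tr strips Windows carriage returns and turns commas into spaces.
    for dep in $(sed -n 's/^dependencies=//p' "$f" | tr -d '\r' | tr ',' ' '); do
      echo "$dep $name"
    done
  done | tsort
}
```

Running job_order on a directory holding one.job and two.job prints one followed by two, the same order Azkaban uses for this flow.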
Package the job files into a zip archive
The zip file name can be anything.
Upload the zip archive to the project
On the home page, click Projects in the top navigation bar, open project_2, click Upload in the upper-right corner, and upload the zip file.
Execute Tasks
Click Flows, then click the flow two (it carries the name of the last job in the chain) and execute it from its page.
Verification results
wangting@ops01:/opt/module >ll
total 56
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x 2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >cd one/
wangting@ops01:/opt/module/one >ls
two.txt
wangting@ops01:/opt/module/one >
After the tasks complete, the new directory one exists under /opt/module/, which shows that task one was scheduled and executed successfully.
The file two.txt inside the one directory shows that task two was scheduled and executed successfully as well.
Run a script from a job
Real business tasks are usually more complex, for example scripts that call Hive or HDFS. To simulate one, write a business script on the server:
wangting@ops01:/opt/module/test >vim test_azkaban.sh
#!/bin/bash
echo "123"
echo "123123"
echo "123123123"
ls -l /opt/module/ >> /opt/module/test/shell_log_0516.log
hdfs dfs -ls / >> /opt/module/test/shell_log_0516.log
NOW=`date|awk -F" " '{print $4}'`
echo "current time: $NOW"
wangting@ops01:/opt/module/test >chmod +x test_azkaban.sh
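As written, test_azkaban.sh exits with the status of its last echo, so a failing hdfs call would still be reported as a success. Azkaban's command job type treats a non-zero exit status as a failure, so a variant that propagates errors makes real failures visible in history. The sketch keeps the original steps; run_steps and its two path parameters are hypothetical, added so the logic can be tried outside /opt/module.

```bash
#!/usr/bin/env bash
# Sketch of test_azkaban.sh that returns a non-zero status when a step fails.

# run_steps LOG_FILE LOCAL_DIR
run_steps() {
  local log=$1 dir=$2
  echo "123"
  echo "123123"
  echo "123123123"
  ls -l "$dir" >> "$log" || return 1
  # 2>&1 also captures anything hdfs prints on stderr.
  hdfs dfs -ls / >> "$log" 2>&1 || return 1
  # date +%T prints HH:MM:SS, the field the original pulls out with awk
  # when date runs in an English locale.
  echo "current time: $(date +%T)"
}

# In the real script, finish with:
#   run_steps /opt/module/test/shell_log_0516.log /opt/module/ || exit 1
```

Explicit `|| return 1` checks are used instead of `set -e` because `set -e` is ignored inside functions whose status is being tested, which would silently hide the failure.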
Create project
Create a project named project_3 and give it a short description.
Define job tasks
# run_bash.job
type=command
command=bash /opt/module/test/test_azkaban.sh
Package the job definition file into a zip package
Same as in the previous example.
Upload the zip archive to the project
On the home page, click Projects in the top navigation bar, open project_3, click Upload in the upper-right corner, and upload the zip file.
Execute Tasks
Click Flows, click the flow run_bash, and execute it from its page.
Verification results
wangting@ops01:/opt/module/test >ll
total 8
-rw-rw-r-- 1 wangting wangting 1801 May 16 12:49 shell_log_0516.log
-rwxrwxr-x 1 wangting wangting 226 May 16 12:44 test_azkaban.sh
wangting@ops01:/opt/module/test >
# Check that the log contains the local directory listing and the HDFS root listing
wangting@ops01:/opt/module/test >cat shell_log_0516.log
total 60
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x 2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 2 wangting wangting 4096 May 16 12:49 test
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
2021-05-16 12:49:16,801 INFO [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
Found 10 items
drwxr-xr-x - wangting supergroup 0 2021-03-17 11:44 /20210317
drwxr-xr-x - wangting supergroup 0 2021-03-19 10:51 /20210319
drwxr-xr-x - wangting supergroup 0 2021-04-24 17:05 /flume
-rw-r--r-- 3 wangting supergroup 338075860 2021-03-12 11:50 /hadoop-3.1.3.tar.gz
drwxr-xr-x - wangting supergroup 0 2021-05-13 15:31 /hbase
drwxr-xr-x - wangting supergroup 0 2021-04-04 11:07 /test.db
drwxr-xr-x - wangting supergroup 0 2021-03-19 11:14 /testgetmerge
drwxr-xr-x - wangting supergroup 0 2021-04-10 16:23 /tez
drwx------ - wangting supergroup 0 2021-04-02 15:14 /tmp
drwxr-xr-x - wangting supergroup 0 2021-04-02 15:25 /user
wangting@ops01:/opt/module/test >
After the task completes, the file shell_log_0516.log exists in /opt/module/test and contains both listings, which shows that the job run_bash was scheduled and executed successfully.