1, Introduction to azkaban
1) Official website
https://azkaban.github.io/
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the execution order through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.
2) Generation background
1. A complete big data analysis system usually consists of a large number of task units: shell scripts, MapReduce programs, Hive scripts, Spark programs, and so on.
2. The task units have ordering and dependency relationships among them: priority, dependency, and scheduled execution.
3. To organize such a complex execution plan well, a workflow scheduling system is needed to schedule the execution.
3) Characteristics of azkaban
- Compatible with any version of Hadoop
- Easy-to-use web UI
- Simple web and HTTP workflow uploads
- Project workspaces
- Scheduling of workflows
- Modular and pluginable
- Authentication and authorization
- Tracking of user actions
- Email alerts on failure and success
- SLA alerting and automatic killing
- Retrying of failed jobs
4) Comparison between azkaban and oozie
The two are roughly the same in function, but they differ in the following ways:
- Job submission: Oozie submits Hadoop and Spark jobs through the interfaces encapsulated in org.apache.hadoop, whereas Azkaban can run shell statements directly. In terms of security, Oozie is probably the better of the two.
- Workflow definition: Oozie workflows are defined in XML, Azkaban workflows in properties files.
- Deployment: Oozie is relatively difficult to deploy, and it pulls task logs from YARN.
- Failure detection: in Azkaban, a task is reported as successful as long as its process runs to completion, even if the task itself failed, which is a bug. Oozie can reliably detect task success and failure.
- Operating workflows: Azkaban is operated through the web UI. Oozie supports web, REST API, and Java API operation.
- Permission control: Oozie has basically no permission control, while Azkaban has fairly complete control over which users may read and write workflows.
- Where actions run: Oozie actions mainly run in Hadoop, while Azkaban actions run in the Azkaban server.
- Recording workflow state: Azkaban keeps the state of executing workflows in memory, while Oozie saves it in MySQL.
- Recovery after failure: Azkaban loses all workflows, while Oozie can continue running from a failed workflow.
5) Common scheduling systems
- Simple task scheduling: use Linux crontab directly to schedule shell and Python scripts
- Off-the-shelf open source task scheduling: Oozie, Azkaban, Airflow, etc.
- Complex task scheduling: a self-developed scheduling platform
2, System architecture of azkaban
Azkaban consists of three components:
1. web server: provides the web UI, receives jobs submitted by clients, and distributes them to the exec servers.
2. exec server: receives the jobs distributed by the web server and executes them.
3. mysql: handles data sharing and partial state synchronization between the web server and the exec servers.
3, Installation mode of azkaban
There are three installation modes: source installation, solo mode, and multi exec server mode.
1. Source installation: see the reference documentation.
2. Solo mode: a stand-alone mode in which all Azkaban processes run on one machine and there is only one exec server.
3. Multi exec server mode: there are multiple exec servers, distributed across different machine nodes.
3.1 Solo Server installation
3.1.1 Introduction to Solo Server
The Solo Server is a stand-alone version of Azkaban, that is, a single instance. It is simple to install and easy to learn. Its advantages are as follows:
- Simple installation: no MySQL instance is needed, because it uses the built-in H2 database for storage.
- Easy to start: the web server and the executor server run in the same process.
- Fully functional: it contains all Azkaban features. You can use Azkaban in the usual way and install plugins for it.
3.1.2 Installation steps
1) Upload, unzip, and rename
[root@xxx01 ~]# tar -zxvf azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz -C /usr/local/
[root@xxx01 ~]# cd /usr/local/
[root@xxx01 local]# mv azkaban-solo-server-0.1.0-SNAPSHOT/ azkaban-solo
2) Configure environment variables
[root@xxx01 ~]# vim /etc/profile
#azkaban environment
export AZKABAN_HOME=/usr/local/azkaban-solo
export PATH=$AZKABAN_HOME/bin:$PATH
[root@xxx01 ~]# source /etc/profile
3) Add user
[root@xxx01 ~]# vim $AZKABAN_HOME/conf/azkaban-users.xml
Add the following on the fourth line:
<user password="admin" roles="metrics,admin" username="admin"/>
Solo mode is now installed.
4) Start Azkaban. Note: the startup script must be run from Azkaban's home directory.
[root@xxx01 azkaban-solo]# ./bin/start-solo.sh
5) Open browser
Enter ip:8081 in the address bar. If the page opens, the installation was successful.
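Before opening the browser, it can help to confirm that something is actually listening on port 8081. The following is a minimal Python sketch, not part of Azkaban: the helper name `port_is_open` and the host `127.0.0.1` are assumptions for illustration, so substitute the ip of your own server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; 8081 is the port used in this article.
    print(port_is_open("127.0.0.1", 8081))
```

A `True` result only shows that the port accepts connections. It does not prove that the listener is Azkaban, so the browser check is still the real test.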
3.2 Multi exec server installation
3.2.1 Node layout
xxx01 webserver
xxx02 execserver
xxx03 execserver
3.2.2 Configure mysql
Step 1) Find the create-all-sql-0.1.0-SNAPSHOT.sql script
Method 1: upload azkaban-db-0.1.0-SNAPSHOT.tar.gz from the installation package, unzip it on Linux, and look for the script inside.
Method 2: unzip the package on Windows and look for the script inside.
Step 2) Log in to mysql and create an azkaban database
create database azkaban;
Step 3) execute the script
use azkaban;
source /root/create-all-sql-0.1.0-SNAPSHOT.sql
Step 4) Make sure remote access to the database is authorized
grant all privileges on *.* to root@'%' identified by '@Mmforu45';
Step 5) modify the mysql configuration
(This change is recommended. If the service reports an error when restarting, undo it.)
[root@xxx03 azkaban]# vi /etc/my.cnf
Add the following under [mysqld]:
max_allowed_packet=1024M
[root@xxx03 ~]# systemctl restart mysqld
3.2.3 Configure web server
Step 1) upload, unzip and rename
[root@xxx01 ~]# tar -zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz -C /usr/local/
[root@xxx01 ~]# cd /usr/local/
[root@xxx01 local]# mv azkaban-web-server-0.1.0-SNAPSHOT/ azkaban-web
Step 2) Configure environment variables (optional)
Step 3) Import the mysql driver package
Go into the azkaban-web directory, create an extlib directory, and upload the mysql driver jar into extlib.
[root@xxx01 local]# cd azkaban-web
[root@xxx01 azkaban-web]# mkdir extlib
Step 4) generate secret key
[root@xxx01 azkaban-web]# keytool -keystore keystore -alias jetty -genkey -keyalg RSA
You are asked to enter the keystore password and then to confirm it; use 123456 both times. Press Enter through the remaining prompts until the "is it correct" question appears, then answer y.
Step 5) Configure azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=/usr/local/azkaban-web/web
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=/usr/local/azkaban-web/conf/azkaban-users.xml
# Loader for projects
executor.global.properties=/usr/local/azkaban-exec/conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
jetty.keystore=keystore
jetty.password=123456
jetty.keypassword=123456
jetty.truststore=keystore
jetty.trustpassword=123456
# Azkaban Executor settings
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=xxx03
mysql.database=azkaban
mysql.user=root
mysql.password=@Mmforu45
mysql.numconnections=100
#Multiple Executor
azkaban.use.multiple.executors=true
#azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
Step 6) Configure azkaban-users.xml
Add the admin user
<azkaban-users>
  <user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
  <user password="metrics" roles="metrics" username="metrics"/>
  <user password="admin" roles="metrics,admin" username="admin"/>
  <role name="admin" permissions="ADMIN"/>
  <role name="metrics" permissions="METRICS"/>
</azkaban-users>
3.2.4 Configure exec server
Step 1) Upload, unzip and rename
[root@xxx02 ~]# tar -zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz -C /usr/local/
[root@xxx02 ~]# cd /usr/local/
[root@xxx02 local]# mv azkaban-exec-server-0.1.0-SNAPSHOT/ azkaban-exec
Step 2) Go into the azkaban-exec directory, create an extlib directory, and put the mysql driver package into it
[root@xxx02 local]# cd azkaban-exec
[root@xxx02 azkaban-exec]# mkdir extlib
Step 3) Modify azkaban.properties
[root@xxx02 azkaban-exec]# vi conf/azkaban.properties
Change it to the following (make sure the paths and passwords match your own machines)
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=/usr/local/azkaban-web/web
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=/usr/local/azkaban-web/conf/azkaban-users.xml
# Loader for projects
executor.global.properties=/usr/local/azkaban-exec/conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Where the Azkaban web server is located
azkaban.webserver.url=http://xxx01:8081
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=/usr/local/azkaban-exec/plugins/jobtypes/
# Azkaban mysql settings by default. Users should configure their own username and password.
#azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus
database.type=mysql
mysql.port=3306
mysql.host=xxx03
mysql.database=azkaban
mysql.user=root
mysql.password=@Mmforu45
mysql.numconnections=100
# Azkaban Executor settings
executor.port=12321
executor.maxThreads=50
executor.flow.threads=30
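The web server and the exec servers must point at the same MySQL database, so a mismatch between the two azkaban.properties files is an easy mistake to make. The following Python sketch compares the database settings of two property texts. It is only an illustration: the helper names `parse_properties` and `mysql_mismatches` are hypothetical, and the parser handles only plain `key=value` lines and `#` comments, not the full Java properties syntax.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as #FF3601 stay intact.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def mysql_mismatches(web: dict, executor: dict) -> list:
    """Return the database keys whose values differ between the two configs."""
    keys = sorted(
        key
        for key in set(web) | set(executor)
        if key.startswith("mysql.") or key == "database.type"
    )
    return [key for key in keys if web.get(key) != executor.get(key)]


# Usage idea (paths taken from this article):
#   web = parse_properties(open("/usr/local/azkaban-web/conf/azkaban.properties").read())
#   exe = parse_properties(open("/usr/local/azkaban-exec/conf/azkaban.properties").read())
#   print(mysql_mismatches(web, exe))
```

An empty list means that both files agree on every `mysql.*` key and on `database.type`.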
5) Modify the plugin file
[root@xxx02 azkaban-exec]# vi ./plugins/jobtypes/commonprivate.properties
# set execute-as-user
execute.as.user=false
# Added: turn off the memory check, otherwise an error is reported when less than 3G of memory is free
memCheck.enabled=false
The exec server on xxx02 is now configured. xxx03 needs almost the same setup, so we can scp the directory to the other machine.
[root@xxx02 azkaban-exec]# cd ..
[root@xxx02 local]# scp -r azkaban-exec xxx03:/usr/local/
6) Start and test (restarting the virtual machines first is recommended)
Azkaban starts in the following order: start the executors first, then the web server. Otherwise, the web server fails to start because it cannot find an executor.
Start the two exec servers first
[root@xxx02 ~]# cd /usr/local/azkaban-exec
[root@xxx02 azkaban-exec]# ./bin/start-exec.sh
[root@xxx03 ~]# cd /usr/local/azkaban-exec
[root@xxx03 azkaban-exec]# ./bin/start-exec.sh
Then look at the executors metadata table
Log in to mysql and check whether the active column is 1 for both rows of the executors table. If not, change it to 1.
Then start the web server
[root@xxx01 ~]# cd /usr/local/azkaban-web
[root@xxx01 azkaban-web]# ./bin/start-web.sh
Then happily open the web UI at xxxxx:8081
4, Application of azkaban
Azkaban currently has two workflow mechanisms: the older job-based flow and the newer flow-based one.
The job-based flow is called Flow 1.0 and the flow-based one is called Flow 2.0.
4.1 Flow 1.0 job flows
4.1.1 Description
1. An Azkaban job file has the suffix .job and must contain a type attribute with a value assigned. The value can be one of command, java, pig.
2. Jobs must be packaged before Azkaban can run them, and the package must be in zip format.
3. Formatting rules for the job file:
1) Make sure there are no spaces at the end of any line.
2) Use the UTF-8 character set. If this really does not work on Windows, upload the files to Linux, zip them there, download the zip to Windows, and upload it to Azkaban.
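The formatting rules above (UTF-8, no trailing spaces, zip packaging) can also be met with a small script instead of an editor and a zip tool. The following Python sketch is one possible way to do it: the helper names `write_job` and `package` are hypothetical and are not part of Azkaban.

```python
import zipfile
from pathlib import Path


def write_job(path, props: dict) -> None:
    """Write a .job file as UTF-8 with no trailing spaces on any line."""
    lines = [f"{key}={value}".rstrip() for key, value in props.items()]
    # Writing bytes keeps the encoding and the \n line endings identical on every OS.
    Path(path).write_bytes(("\n".join(lines) + "\n").encode("utf-8"))


def package(zip_path, files) -> None:
    """Put the given files at the top level of a zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in files:
            archive.write(name, arcname=Path(name).name)


# Usage idea:
#   write_job("helloworld.job", {"type": "command", "command": 'echo "hello world"'})
#   package("helloworld.zip", ["helloworld.job"])
```

The resulting zip can then be uploaded through the web UI in the same way as a hand-made one.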
4.1.2 Case 1: print hello world
1) Create a file named helloworld.job with the following contents:
type=command
command=echo "hello world"
Note: the file must be saved in the UTF-8 character set
2) Compress it into a zip package
3) Upload to azkaban
1. First create the project
2. Upload the zip to the project
3. Click the run job button
4. On the preparation screen, click execute to run it
Note:
Gray: not run yet
Green: ran successfully
Red: run failed
Blue: running
4.1.3 Case 2: calling a shell script
1) Write a shell script, calculate.sh
#!/usr/bin/bash
sum=0
for i in $(seq 1 100)
do
  sum=$(( $sum + $i ))
done
echo $sum >> /root/sum.log
2) Write the job file a2.job, which calls the shell script
type=command
command=/usr/bin/bash calculate.sh
3) Package, upload, test
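To know what to look for in /root/sum.log after the run, the expected value can be worked out independently. A minimal Python sketch follows; `expected_sum` is a hypothetical helper that mirrors the loop over `seq 1 100` in the script.

```python
def expected_sum(first: int = 1, last: int = 100) -> int:
    """Sum of the integers first..last, the same range the shell loop covers."""
    return sum(range(first, last + 1))


print(expected_sum())  # prints 5050
```

Because the script appends with `>>`, each successful run should add one more line containing 5050 to the log.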
4.1.4 Case 3: running an MR program
1) Write the job file a3.job
type=command
command=/usr/local/hadoop/bin/hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.13.2.jar wordcount /input /output
2) Place hadoop-mapreduce-examples-2.6.0-cdh5.13.2.jar in the same location as a3.job
3) Package, upload, test
Note: HDFS and YARN must be running, the /input directory must exist, and the files to be counted must be uploaded to it
4.1.5 Case 4: workflow demonstration
1) Create b.sh
#!/bin/bash
echo hello_bbb >/root/b.log
sleep 30s
2) Create jobB.job
type=command
command=/bin/bash b.sh
3) Create a.sh
#!/bin/bash
echo hello_aaa >/root/a.log
4) Create jobA.job
type=command
dependencies=jobB
command=/bin/bash a.sh
5) Package, upload, test
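The `dependencies` line is what turns the two job files into a workflow: jobA waits for jobB. The following Python sketch derives a run order from job file texts. It only illustrates the dependency idea and is not Azkaban's scheduler; the helper names `parse_job` and `run_order` are hypothetical, and the code assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def parse_job(text: str) -> dict:
    """Parse the key=value lines of a .job file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def run_order(jobs: dict) -> list:
    """Map job name -> file text, and return the names in a valid run order."""
    graph = {}
    for name, text in jobs.items():
        deps = parse_job(text).get("dependencies", "")
        # dependencies is a comma-separated list of job names.
        graph[name] = {dep.strip() for dep in deps.split(",") if dep.strip()}
    return list(TopologicalSorter(graph).static_order())


jobs = {
    "jobA": "type=command\ndependencies=jobB\ncommand=/bin/bash a.sh",
    "jobB": "type=command\ncommand=/bin/bash b.sh",
}
print(run_order(jobs))  # prints ['jobB', 'jobA']
```

In the web UI the same relationship shows up as jobB running first, sleeping for 30 seconds, and jobA starting only after it finishes.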
4.1.6 azkaban scheduling hive script
1) Create a HQL script: create_table.hql
create database mydb3;
use mydb3;
create table if not exists test1(
  sid int,
  sname string
)
row format delimited
fields terminated by ',';
2) Create a job file: create_table.job
type=command
command=/usr/local/hive/bin/beeline -u jdbc:hive2://qianfeng02:10000 -n root -f create_table.hql
Note: the hiveserver2 service must be running on qianfeng02.
3) Package, upload and execute, and then view it
4.1.7 Scheduling timed tasks with azkaban
1) Create a shell script: testcrond.sh
#!/bin/bash
echo "aaaaa" >>/root/crond.log
2) Create a job file: testcrond.job
type=command
command=/usr/bin/bash testcrond.sh
3) Package and upload to azkaban
4) Click run job or execute flow to open the execution screen. Do not click execute right away; click schedule to set up the scheduled task
5) When the settings are done, click the schedule button below them, then continue on to execute
6) On the new screen, click the job name in the FLOW column to open the execution plan screen
7) Then click Schedule/execute Flow to reach the final screen and click execute
4.2 Flow 2.0 flows
Azkaban currently supports both Flow 1.0 and Flow 2.0, but the official documentation recommends Flow 2.0 because Flow 1.0 will be removed in a future version. The main design idea of Flow 2.0 is to provide the flow-level definition that 1.0 lacks. Users can merge all the job / properties files that belong to a given flow into a single flow definition file written in YAML syntax. It also supports defining flows inside flows, which are called embedded flows or subflows.
4.2.1 Basic structure
The project zip contains multiple flow YAML files, one project YAML file, and optional libraries and source code. The basic structure of a flow YAML file is as follows:
1. The whole workflow is written in one file;
2. The file is named after the flow and has the suffix .flow, such as my-flow-name.flow;
3. It contains all the nodes of the DAG;
4. Each node can be of a different type, such as flow, hive, hadoopjava, pig, noop, command;
5. Each node can have name, type, config, dependsOn and nodes sections, among other attributes;
6. Dependencies are specified by listing them under dependsOn;
7. It contains other configuration related to the flow;
8. The properties of Flow 1.0 are migrated under config, where they are written as key-value pairs.
Note: you need to write a separate xxxx.project file that tells Azkaban to use Flow 2.0:
azkaban-flow-version: 2.0
4.3 YAML syntax
To use Flow 2.0 for workflow configuration, you first need to understand YAML. YAML is a concise non-markup language with strict format requirements. If the formatting of your file is wrong, a parsing exception is thrown when you upload it to Azkaban.
4.3.1 Basic rules
1. It is case sensitive;
2. Indentation represents hierarchy;
3. There is no limit on the indent length. Elements that are aligned belong to the same level;
4. # marks a comment;
5. Strings are written without quotation marks by default, but both single and double quotation marks can be used. Inside double quotation marks, escape sequences such as \n are interpreted; inside single quotation marks they are kept literally;
6. YAML provides a variety of scalar types, including integer, floating point number, string, NULL, date, Boolean and time.
4.3.2 Writing objects
# There must be a space between the value and the : symbol
key: value
4.3.3 Writing maps
# All key-value pairs written at the same indent belong to one map
key:
  key1: value1
  key2: value2
# Writing method 2
{key1: value1, key2: value2}
4.3.4 Writing arrays
# Writing method 1: a dash followed by a space marks an array item
- a
- b
- c
# Writing method 2
[a,b,c]
4.3.5 Single and double quotation marks
s1: 'content\n string'
s2: "content\n string"
After conversion:
{ s1: 'content\\n string', s2: "content\n string" }
4.3.6 Special symbols
A YAML file can contain multiple documents, separated by `---`.
4.3.7 Configuration references
Flow 2.0 recommends defining common parameters under `config` and referencing them with `${}`.
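To get a feel for how a `${}` reference is filled in from `config`, the substitution can be imitated in a few lines of Python. This is only an analogy built on the standard library's `string.Template`, not Azkaban's own resolver, whose rules may differ; the helper name `resolve` is hypothetical.

```python
from string import Template


def resolve(command: str, config: dict) -> str:
    """Replace every ${name} in command with the matching value from config."""
    # substitute() raises KeyError when a referenced name is missing from config.
    return Template(command).substitute(config)


config = {"prop": "value"}
print(resolve('echo "This is job B ${prop}"', config))
# prints: echo "This is job B value"
```

Defining a parameter once under `config` and referencing it everywhere means a later change is made in a single place.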
4.4 case introduction
4.4.1 Simple case scheduling
1) Write a xxxx.flow file, such as simple.flow (mind the character set, and do not indent with the TAB key)
nodes:
  - name: jobA
    type: command
    config:
      command: echo "this is a simple test"
2) Prepare the version file xxx.project with the following content
azkaban-flow-version: 2.0
3) Package into a xxx.zip file, upload, test
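The three steps above can be scripted so that the flow file, the project file and the zip are always produced together. The following Python sketch is one way to do it: `build_flow_project` is a hypothetical helper, not an Azkaban tool, and its TAB check is deliberately conservative, since it rejects any TAB in the flow text, including ones that YAML would tolerate inside a quoted string.

```python
import zipfile
from pathlib import Path

# Content of the version file, as given in this article.
PROJECT_TEXT = "azkaban-flow-version: 2.0\n"


def build_flow_project(out_dir, name: str, flow_text: str) -> Path:
    """Write name.flow and name.project into out_dir and zip them together."""
    if "\t" in flow_text:
        raise ValueError("flow text contains a TAB; YAML indentation must use spaces")
    out = Path(out_dir)
    flow = out / f"{name}.flow"
    project = out / f"{name}.project"
    # Encode explicitly so the files are UTF-8 on every platform.
    flow.write_bytes(flow_text.encode("utf-8"))
    project.write_bytes(PROJECT_TEXT.encode("utf-8"))
    archive_path = out / f"{name}.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(flow, arcname=flow.name)
        archive.write(project, arcname=project.name)
    return archive_path


SIMPLE_FLOW = """nodes:
  - name: jobA
    type: command
    config:
      command: echo "this is a simple test"
"""

# Usage idea: build_flow_project(".", "simple", SIMPLE_FLOW) creates simple.zip.
```

The returned path points at the zip to upload through the web UI.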
4.4.2 Multitask scheduling
1) Write a xxxx.flow file, such as multi.flow (mind the character set, and do not indent with the TAB key)
nodes:
  - name: jobE
    type: command
    config:
      command: echo "This is job E"
    # jobE depends on jobD
    dependsOn:
      - jobD

  - name: jobD
    type: command
    config:
      command: echo "This is job D"
    # jobD depends on jobA,jobB,jobC
    dependsOn:
      - jobA
      - jobB
      - jobC

  - name: jobA
    type: command
    config:
      command: echo "This is job A"

  - name: jobB
    type: command
    config:
      command: echo "This is job B"

  - name: jobC
    type: command
    config:
      command: echo "This is job C"
2) Prepare the version file xxx.project with the following content
azkaban-flow-version: 2.0
3) Package into a xxx.zip file, upload, test
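The `dependsOn` lists in this flow form a DAG: jobA, jobB and jobC have no dependencies, jobD waits for all three, and jobE waits for jobD. The following Python sketch groups the jobs into waves that could run side by side. It is an illustration of the dependency structure, not Azkaban's executor logic; the helper name `waves` is hypothetical, the graph is typed in by hand rather than parsed from the flow file, and Python 3.9 or later is assumed for `graphlib`.

```python
from graphlib import TopologicalSorter


def waves(graph: dict) -> list:
    """Group nodes into successive waves; each wave depends only on earlier ones."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        # get_ready() returns every node whose dependencies are all done.
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


# Node name -> names listed under its dependsOn (copied from the flow above).
multi = {
    "jobE": ["jobD"],
    "jobD": ["jobA", "jobB", "jobC"],
    "jobA": [],
    "jobB": [],
    "jobC": [],
}
print(waves(multi))  # prints [['jobA', 'jobB', 'jobC'], ['jobD'], ['jobE']]
```

The order in which nodes appear in the flow file does not matter, which is why jobE can be written first even though it runs last.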
4.4.3 Embedded flow scheduling
1) Write a xxxx.flow file, such as embedded.flow (mind the character set, and do not indent with the TAB key)
nodes:
  - name: jobC
    type: command
    config:
      command: echo "This is job C"
    dependsOn:
      - embedded_flow

  - name: embedded_flow
    type: flow
    config:
      prop: value
    nodes:
      - name: jobB
        type: command
        config:
          command: echo "This is job B ${prop}"
        dependsOn:
          - jobA

      - name: jobA
        type: command
        config:
          command: echo "This is job A"
2) Prepare the version file xxx.project with the following content
azkaban-flow-version: 2.0
3) Package into a xxx.zip file, upload, test
5, Email alerts for azkaban
1) Register a mailbox
Sina, NetEase, etc. are recommended
2) Enable the mailbox's third-party client protocols pop3/smtp/imap
Enabling them requires sending a text message from your mobile phone. You are given a password that you need to remember; back it up on your computer so that you do not forget it.
3) Configure azkaban as the mail client: conf/azkaban.properties
mail.sender=your mailbox
mail.host=smtp.sina.cn
mail.user=your mailbox
mail.password=the password you received when enabling pop3/smtp/imap
# The following two properties are optional. They no longer take effect after Azkaban 3.0
job.failure.email=mmforu@sina.cn
job.success.email=mmforu@sina.cn
4) Restart azkaban's service
5) Case test
1. Upload a case
2. Open the execution screen and click Notification
3. Configure the mailboxes to notify on failure and on success
4. Execute
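If no mail arrives, it is useful to test the mailbox settings outside Azkaban first. The following Python sketch builds a test message from the `mail.*` properties and can send it over SMTP. Several things here are assumptions rather than facts from this article: the helper names `build_test_mail` and `send_test_mail` are hypothetical, and the sketch assumes your provider accepts SMTP over SSL on port 465, which you should check against the provider's documentation.

```python
import smtplib
from email.message import EmailMessage


def build_test_mail(props: dict, recipient: str) -> EmailMessage:
    """Build a short test message using the mail.sender property."""
    message = EmailMessage()
    message["From"] = props["mail.sender"]
    message["To"] = recipient
    message["Subject"] = "Azkaban mail settings test"
    message.set_content("If you can read this, the SMTP settings work.")
    return message


def send_test_mail(props: dict, recipient: str, port: int = 465) -> None:
    """Send the test message; port 465 with SSL is an assumption, adjust as needed."""
    with smtplib.SMTP_SSL(props["mail.host"], port, timeout=10) as server:
        server.login(props["mail.user"], props["mail.password"])
        server.send_message(build_test_mail(props, recipient))


# Usage idea, with the values from conf/azkaban.properties:
#   props = {"mail.sender": "...", "mail.host": "smtp.sina.cn",
#            "mail.user": "...", "mail.password": "..."}
#   send_test_mail(props, "...")
```

A login error at this stage usually points at the mailbox password rather than at Azkaban.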
6, Phone alerts for azkaban
1) Register a Ruixiang Cloud account, preferably verified by email
2) In the CA navigation, open the integration screen and select email
3) Fill in the required information, such as the application name and the mailbox (the Ruixiang Cloud mailbox), then click the get AppKey button
4) Under configuration, click the notification policy, configure the corresponding status information, and click save
5) Then look up the mailbox that Ruixiang Cloud generated and copy it
6) Test azkaban
1. Upload a case
2. Open the execution screen and click Notification
3. Configure the mailbox to notify on failure and on success: the Ruixiang Cloud mailbox
4. Execute
(One day left of the three-day New Year's Day holiday. Let's go!)