APScheduler, a third-party Python module for scheduled tasks

Posted by magic123 on Sat, 19 Feb 2022 13:23:48 +0100

Introduction

APScheduler stands for Advanced Python Scheduler. It is a lightweight framework for scheduling Python tasks. It supports three kinds of schedule: a fixed time interval, a fixed point in time (date), and Linux crontab-style expressions. Tasks can also be executed asynchronously or in the background.

APScheduler was inspired by Quartz, the Java scheduler, and offers most of Quartz's major features in a form that is convenient to use from Python.

Installation

pip install apscheduler

Official documentation

https://apscheduler.readthedocs.io/en/latest/userguide.html#starting-the-scheduler

Basic concepts

1. Four components of APScheduler

  • Triggers: set the conditions under which a task fires

  • Job stores: hold the scheduled tasks, in memory or in a database

  • Executors: run the tasks, using a thread pool or a process pool

  • Schedulers: tie the other three components together; you work with them through a scheduler instance

1.1 Triggers

Triggers contain the scheduling logic. Each task has its own trigger, which determines when the task should run next. Apart from their initial configuration, triggers are completely stateless.
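To make the "stateless" point concrete, here is a minimal sketch of an interval-style trigger written with the standard library only. It is not APScheduler's implementation: the class name IntervalTriggerSketch and its fields are hypothetical. The only idea taken from the text is that the next run time is a pure function of the initial configuration plus the times passed in.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)  # frozen: nothing changes after the initial configuration
class IntervalTriggerSketch:
    """Hypothetical stand-in for a trigger; not part of APScheduler."""
    interval: timedelta
    start: datetime

    def get_next_fire_time(self, previous: Optional[datetime], now: datetime) -> datetime:
        # Everything needed is passed in, so the trigger keeps no history itself.
        if previous is None:
            candidate = self.start
        else:
            candidate = previous + self.interval
        # Skip ahead if the candidate is already in the past.
        while candidate < now:
            candidate += self.interval
        return candidate


trigger = IntervalTriggerSketch(timedelta(hours=2), datetime(2022, 2, 19, 12, 0))
first = trigger.get_next_fire_time(None, datetime(2022, 2, 19, 13, 0))
second = trigger.get_next_fire_time(first, first)
print(first, second)
```

Because the object never mutates, calling it twice with the same arguments always gives the same answer, which is what lets a scheduler recompute run times after a restart.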

1.2 Job stores

By default, tasks are stored in memory. A job store can also be configured to keep them in one of several kinds of database. When tasks are stored in a database, they are serialized on save and deserialized on load. The job store also handles adding, modifying, looking up and removing tasks.

Note: a job store must not be shared between schedulers, otherwise their state will become inconsistent.

1.3 Executors

An executor runs a task by submitting it to a thread pool or process pool. When the task finishes, the executor notifies the scheduler.
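The submit-then-notify pattern can be sketched with the standard library's concurrent.futures. This is an illustration under assumptions, not APScheduler's executor code: the names job, notify_scheduler and finished are hypothetical, and a plain list stands in for the scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

finished = []  # what the "scheduler" learns from the executor


def job(text):
    return text.upper()


def notify_scheduler(future):
    # Runs once the job has completed; a real scheduler would emit an event here.
    finished.append(future.result())


pool = ThreadPoolExecutor(max_workers=2)
for text in ("first", "second"):
    pool.submit(job, text).add_done_callback(notify_scheduler)
pool.shutdown(wait=True)  # block until both jobs have run

print(sorted(finished))
```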

1.4 Schedulers

A scheduler binds the other three components together. An application normally needs only one scheduler. Developers do not usually work with job stores, executors or triggers directly: the scheduler provides a unified interface for operating them, for example to add, remove, modify and query tasks.

[Figure: scheduler workflow]

2. Choosing a scheduler, job store and executor

Choose a scheduler to match how your application runs. The available schedulers are:

  • BlockingScheduler: for programs in which the scheduler is the only thing running.
  • BackgroundScheduler: for applications that must not block; the scheduler runs in the background.
  • AsyncIOScheduler: for applications that use asyncio.
  • GeventScheduler: for applications that use gevent.
  • TornadoScheduler: for Tornado applications.
  • TwistedScheduler: for Twisted applications.
  • QtScheduler: for Qt applications.

2.1 Choosing a job store

The choice depends on whether tasks need to persist. If tasks are recreated every time the application starts, the default MemoryJobStore is enough. If tasks must survive a shutdown or restart, choose a persistent job store. In that case the recommended option is SQLAlchemyJobStore with PostgreSQL as the backing database, because of its strong data integrity protection.

2.2 Choosing an executor

This also depends on your needs. The default ThreadPoolExecutor is good enough for most purposes. If your tasks are CPU-intensive, consider ProcessPoolExecutor so that they can use multiple CPU cores. You can also use both at once by adding ProcessPoolExecutor as a second executor.

Every task needs a trigger. The trigger determines how often, how many times and at what times the task runs.

3. APScheduler's built-in triggers

  • date: runs the task once at a specific point in time
  • interval: runs the task at fixed intervals
  • cron: runs the task periodically at certain times, in the style of crontab
  • Calendar interval: runs the task at calendar-based intervals, at a specific time of day

Triggers can also be combined, so that a task fires either when all of the given trigger conditions are met at the same time or when any one of them is met.
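The "all conditions" and "any condition" semantics can be illustrated without the library. The sketch below models a trigger as a plain function that returns its first fire time at or after a given moment. The helpers every, or_next and and_next are hypothetical names, and the max_rounds cut-off is an assumption added to guarantee termination.

```python
from datetime import datetime, timedelta


def every(interval, start):
    """Return a function giving the first fire time at or after `now`."""
    def next_fire(now):
        if now <= start:
            return start
        steps = -((start - now) // interval)  # ceiling division on timedeltas
        return start + steps * interval
    return next_fire


def or_next(triggers, now):
    # "Any condition met": the earliest of the individual fire times.
    return min(trigger(now) for trigger in triggers)


def and_next(triggers, now, max_rounds=1000):
    # "All conditions met": advance until every trigger agrees on one time.
    for _ in range(max_rounds):
        times = [trigger(now) for trigger in triggers]
        if min(times) == max(times):
            return times[0]
        now = max(times)
    return None  # no common fire time found within max_rounds


midnight = datetime(2024, 1, 1)
two_hourly = every(timedelta(hours=2), midnight)
three_hourly = every(timedelta(hours=3), midnight)
now = datetime(2024, 1, 1, 1, 0)
print(or_next([two_hourly, three_hourly], now))
print(and_next([two_hourly, three_hourly], now))
```

Starting from 01:00, the "or" combination fires at 02:00, the earlier of the two schedules. The "and" combination has to wait until 06:00, the first time both schedules coincide.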

3.1 date trigger

date is the most basic kind of scheduling: the task runs exactly once, at a specific point in time. Its parameters are:

  • run_date(datetime or str): the date and time at which the task runs
  • timezone(datetime.tzinfo or str): the time zone for run_date

from datetime import date, datetime
from apscheduler.schedulers.blocking import BlockingScheduler


scheduler = BlockingScheduler()

def my_job(text):
    print(text)

# Note: run_date can be a date, a datetime, or a string.
# Run once on April 15, 2019
scheduler.add_job(my_job, 'date', run_date=date(2019, 4, 15), args=['Test task'])
# datetime type (for an exact time)
# scheduler.add_job(my_job, 'date', run_date=datetime(2019, 4, 15, 17, 30, 5), args=['Test task'])
# string
# scheduler.add_job(my_job, 'date', run_date='2009-11-06 16:30:05', args=['Test task'])

scheduler.start()
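Since run_date can arrive in three forms, it helps to picture each of them being reduced to a single datetime. The helper below is a hypothetical standard-library sketch, not the library's own conversion routine. It assumes that a bare date means midnight and that strings are in ISO-like format.

```python
from datetime import date, datetime, time


def to_datetime(value):
    """Normalize a date, datetime or ISO-style string to a datetime."""
    if isinstance(value, datetime):   # check first: datetime is a subclass of date
        return value
    if isinstance(value, date):
        return datetime.combine(value, time.min)  # assumption: a bare date means midnight
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    raise TypeError(f"unsupported run_date type: {type(value).__name__}")


for example in (date(2019, 4, 15), datetime(2019, 4, 15, 17, 30, 5), '2009-11-06 16:30:05'):
    print(to_datetime(example))
```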

3.2 interval trigger

Fires at fixed time intervals. Its parameters are:

  • weeks(int): number of weeks to wait
  • days(int): number of days to wait
  • hours(int): number of hours to wait
  • minutes(int): number of minutes to wait
  • seconds(int): number of seconds to wait
  • start_date(datetime or str): start of the interval calculation
  • end_date(datetime or str): latest date and time to fire
  • timezone(datetime.tzinfo or str): time zone

from datetime import datetime, timezone
from apscheduler.schedulers.blocking import BlockingScheduler

def job_func():
    print("Current time:", datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f"))

scheduler = BlockingScheduler()

# Fire every 2 hours
scheduler.add_job(job_func, 'interval', hours=2)

# Run job_func every two minutes from 17:00:00 on April 15, 2019 until the end of December 31, 2019
scheduler.add_job(job_func, 'interval', minutes=2, start_date='2019-04-15 17:00:00', end_date='2019-12-31 23:59:59')

# jitter adds a random number of seconds to each fire time. It is useful when several servers run the same schedule and should not all fire at the same moment.
scheduler.add_job(job_func, 'interval', hours=1, jitter=120)

scheduler.start()
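The effect of jitter can be pictured with a few lines of standard-library code. This sketch assumes a symmetric random offset of at most jitter seconds around the nominal fire time. The exact distribution used by the library may differ, and apply_jitter is a hypothetical helper.

```python
import random
from datetime import datetime, timedelta


def apply_jitter(fire_time, jitter_seconds, rng=random):
    """Shift fire_time by a random offset of at most jitter_seconds."""
    offset = rng.uniform(-jitter_seconds, jitter_seconds)
    return fire_time + timedelta(seconds=offset)


rng = random.Random(42)  # seeded only to make the demo repeatable
base = datetime(2019, 4, 15, 17, 0, 0)
servers = [apply_jitter(base, 120, rng) for _ in range(3)]
for fire_time in servers:
    print(fire_time)
```

Three servers that share the 17:00:00 schedule end up firing at three different moments within two minutes of it, which spreads the load.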

3.3 cron trigger

Fires periodically at specific times, in the manner of Linux crontab. It is the most powerful of the built-in triggers. Its parameters are:

  • year(int or str): 4-digit year
  • month(int or str): month (1-12)
  • day(int or str): day of the month (1-31)
  • week(int or str): ISO week (1-53)
  • day_of_week(int or str): number or name of the weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)
  • hour(int or str): hour (0-23)
  • minute(int or str): minute (0-59)
  • second(int or str): second (0-59)
  • start_date(datetime or str): earliest date and time to fire (inclusive)
  • end_date(datetime or str): latest date and time to fire (inclusive)
  • timezone(datetime.tzinfo or str): time zone used for the date and time calculations

Expression types

Expression   Field   Description
*            any     Fires on every value. For example, minute='*' fires every minute
*/a          any     Fires on every a-th value, starting from the minimum
a-b          any     Fires on any value in the range a-b
a-b/c        any     Fires on every c-th value within the range a-b
xth y        day     Fires on the x-th occurrence of weekday y within the month
last x       day     Fires on the last occurrence of weekday x within the month
last         day     Fires on the last day of the month
x,y,z        any     Fires on any of the listed expressions; any of the expressions above can be combined this way

Note: the month and day_of_week parameters accept the English abbreviations jan-dec and mon-sun respectively.
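To see what the numeric expressions match, the following sketch expands a field expression into the list of values it covers. It is a simplified illustration, not APScheduler's parser. expand_field is a hypothetical helper that handles only *, */a, a-b, a-b/c, single numbers and comma lists, and leaves out names such as jan or mon and the day-only forms "xth y", "last x" and "last".

```python
def expand_field(expr, minimum, maximum):
    """Expand a cron-style field expression into the sorted values it matches."""
    values = set()
    for part in expr.split(","):          # x,y,z: union of sub-expressions
        part = part.strip()
        step = 1
        if "/" in part:                   # */a or a-b/c
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":                   # wildcard: the whole range
            first, last = minimum, maximum
        elif "-" in part:                 # a-b
            low, high = part.split("-")
            first, last = int(low), int(high)
        else:                             # single value
            first = last = int(part)
        values.update(range(first, last + 1, step))
    return sorted(values)


print(expand_field("*/15", 0, 59))      # minute field
print(expand_field("1-3,7-9", 1, 12))   # month field
print(expand_field("10-20/5", 0, 59))   # minute field
```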

import datetime
from apscheduler.schedulers.background import BackgroundScheduler

def job_func():
    print("Current time:", datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])

scheduler = BackgroundScheduler()
# Run job_func at 00:00, 01:00, 02:00 and 03:00 on Mondays and Tuesdays in January to March and July to September of every year
scheduler.add_job(job_func, 'cron', month='1-3,7-9', day_of_week='mon,tue', hour='0-3')

# BackgroundScheduler.start() returns immediately, so the program must keep running for the job to fire
scheduler.start()

Using the scheduled_job() decorator:

@scheduler.scheduled_job('cron', id='my_job_id', day='last sun')
def some_decorated_task():
    print("I am printed at 00:00:00 on the last Sunday of every month!")

Note: daylight saving time

Some time zones observe daylight saving time. When the clocks change, a task may be skipped or run twice. To avoid this, schedule in UTC, or work out in advance which run times fall inside a clock change.

# In the Europe/Helsinki time zone this does not fire at all on the last Sunday of March, and fires twice on the last Sunday of October
scheduler.add_job(job_function, 'cron', hour=3, minute=30)
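The reason is that 03:30 local time does not exist on the spring changeover day and exists twice on the autumn one. The sketch below checks this for 2022 with the standard library's zoneinfo module (Python 3.9+, and it assumes the system time zone database is available). The occurrences helper is hypothetical and independent of APScheduler.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

HELSINKI = ZoneInfo("Europe/Helsinki")


def occurrences(naive, tz=HELSINKI):
    """Return the UTC instants at which a wall-clock time occurs in tz."""
    instants = set()
    for fold in (0, 1):
        utc = naive.replace(tzinfo=tz, fold=fold).astimezone(timezone.utc)
        # Keep the instant only if it maps back to the same wall-clock time.
        if utc.astimezone(tz).replace(tzinfo=None) == naive:
            instants.add(utc)
    return sorted(instants)


print(len(occurrences(datetime(2022, 3, 27, 3, 30))))   # clocks go forward
print(len(occurrences(datetime(2022, 10, 30, 3, 30))))  # clocks go back
print(len(occurrences(datetime(2022, 6, 15, 3, 30))))   # ordinary day
```

The three counts are 0, 2 and 1: no 03:30 on 27 March, two on 30 October, and exactly one on an ordinary day.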

4. Configure scheduler

APScheduler provides several ways to configure the scheduler. You can pass a configuration dictionary, or pass the options as keyword arguments. You can also instantiate the scheduler first, add tasks, and configure the scheduler afterwards. This gives you maximum flexibility in any environment.

A complete list of scheduler-level configuration options can be found in the API reference of the BaseScheduler class. Scheduler subclasses may have additional options, documented in their own API references. The configuration options of each job store and executor are likewise listed on their API reference pages.

Suppose you want to run the BackgroundScheduler in your application using the default job store and default executor:

from apscheduler.schedulers.background import BackgroundScheduler


scheduler = BackgroundScheduler()

This gives you a BackgroundScheduler with a MemoryJobStore named "default" and a ThreadPoolExecutor named "default" with a default maximum of 10 threads.

Now suppose you want more: two job stores, two executors, different default settings for new tasks, and a different time zone. The three methods below are completely equivalent, and each gives you:

  • a MongoDBJobStore named "mongo"
  • an SQLAlchemyJobStore named "default" (using SQLite)
  • a ThreadPoolExecutor named "default", with a maximum of 20 threads
  • a ProcessPoolExecutor named "processpool", with a maximum of 5 processes
  • UTC as the scheduler's time zone
  • coalescing turned off for new tasks by default
  • a default maximum of 3 instances for new tasks

Method 1:

from pytz import utc

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.mongodb import MongoDBJobStore
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor


jobstores = {
    'mongo': MongoDBJobStore(),
    'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')
}
executors = {
    'default': ThreadPoolExecutor(20),
    'processpool': ProcessPoolExecutor(5)
}
job_defaults = {
    'coalesce': False,
    'max_instances': 3
}
scheduler = BackgroundScheduler(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc)

Method 2:

from apscheduler.schedulers.background import BackgroundScheduler


# The "apscheduler." prefix is hard coded
scheduler = BackgroundScheduler({
    'apscheduler.jobstores.mongo': {
         'type': 'mongodb'
    },
    'apscheduler.jobstores.default': {
        'type': 'sqlalchemy',
        'url': 'sqlite:///jobs.sqlite'
    },
    'apscheduler.executors.default': {
        'class': 'apscheduler.executors.pool:ThreadPoolExecutor',
        'max_workers': '20'
    },
    'apscheduler.executors.processpool': {
        'type': 'processpool',
        'max_workers': '5'
    },
    'apscheduler.job_defaults.coalesce': 'false',
    'apscheduler.job_defaults.max_instances': '3',
    'apscheduler.timezone': 'UTC',
})

Method 3:

from pytz import utc

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.executors.pool import ProcessPoolExecutor


jobstores = {
    'mongo': {'type': 'mongodb'},
    'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')
}
executors = {
    'default': {'type': 'threadpool', 'max_workers': 20},
    'processpool': ProcessPoolExecutor(max_workers=5)
}
job_defaults = {
    'coalesce': False,
    'max_instances': 3
}
scheduler = BackgroundScheduler()

# .. You can add tasks here

scheduler.configure(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc)

Starting the scheduler

To start the scheduler, call start(). For every scheduler except BlockingScheduler, start() returns immediately, so the code after it, such as adding tasks, keeps running.

With BlockingScheduler, the program blocks at start(), so anything that must run has to come before the call to start().

Note: once the scheduler has started, its configuration can no longer be changed.

5. Adding tasks

There are two ways to add tasks:

  1. By calling add_job()
  2. By decorating a function with scheduled_job()

5.1 Advantages and disadvantages

  • The first method is the most common. The second is the most convenient, but it is meant for tasks that do not change at run time.
  • add_job() returns an apscheduler.job.Job instance, which can be used to modify or remove the task at run time.

Tasks can be added at any time. If the scheduler is not yet running when a task is added, the task is only scheduled tentatively, and its first run time is calculated when the scheduler starts.

Note that if your executor or job store serializes tasks, the tasks must meet two requirements:

  • The target callable must be globally accessible
  • The arguments passed to the callable must be serializable

Important reminder!

If you add tasks to a persistent job store while your application is initializing, you must give each task an explicit ID and pass replace_existing=True. Otherwise you will get a new copy of the task every time the application restarts.
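Why duplicates pile up can be shown with a toy model. The class below is a hypothetical dict-backed stand-in for a persistent job store, not APScheduler code. Its auto-generated IDs are simple counters and the KeyError on conflict is an assumption made for the sketch.

```python
class JobStoreSketch:
    """Hypothetical dict-backed store; stands in for a persistent job store."""

    def __init__(self):
        self.jobs = {}
        self._auto = 0

    def add_job(self, func_name, id=None, replace_existing=False):
        if id is None:
            # No explicit ID: a fresh one is generated on every call.
            self._auto += 1
            id = f"auto-{self._auto}"
        elif id in self.jobs and not replace_existing:
            raise KeyError(f"job id {id!r} already exists")
        self.jobs[id] = func_name
        return id


store = JobStoreSketch()          # imagine this surviving across restarts
for restart in range(3):          # the same start-up code runs three times
    store.add_job("send_report")                                   # duplicates pile up
    store.add_job("cleanup", id="cleanup", replace_existing=True)  # stays single

print(sorted(store.jobs))
```

After three simulated restarts the store holds three copies of the task added without an ID, but only one copy of the task added with an explicit ID and replace_existing=True.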

Of the built-in job stores, only MemoryJobStore does not serialize tasks. Of the built-in executors, only ProcessPoolExecutor serializes them.
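A quick way to check whether a callable or its arguments would survive serialization is to try pickling them. The sketch assumes that the serializer is Python's pickle module, and is_picklable is a hypothetical helper.

```python
import math
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


print(is_picklable(math.sqrt))         # importable by name
print(is_picklable(lambda x: x))       # anonymous, cannot be looked up by name
print(is_picklable(("report", 3)))     # plain arguments
print(is_picklable(threading.Lock()))  # holds operating-system state
```

Functions are pickled by reference to their module and name. That is why a globally accessible function works, while a lambda or an object that wraps live resources does not.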

Tip: to run a task immediately, omit the trigger argument when adding it.

6. Removing tasks

To remove a task from the scheduler, it has to be removed from its job store. There are two ways to do this:

  • Call remove_job() with the task ID and the job store alias
  • Call remove() on the Job instance returned by add_job()

The second method is more convenient, but it requires that you kept the Job instance in a variable when you created the task. For tasks created with scheduled_job(), only the first method is available.

When a task's schedule ends (that is, its trigger produces no further run times), the task is removed automatically.

job = scheduler.add_job(myfunc, 'interval', minutes=2)
job.remove()

# The same, using an explicit task ID:
scheduler.add_job(myfunc, 'interval', minutes=2, id='my_job_id')
scheduler.remove_job('my_job_id')

7. Pausing and resuming tasks

Tasks can be paused and resumed through either the Job instance or the scheduler. When a task is paused, its next run time is cleared, and no further run times are calculated for it until it is resumed.

To pause a task, use either of:

  • apscheduler.job.Job.pause()
  • apscheduler.schedulers.base.BaseScheduler.pause_job()

To resume it, use either of:

  • apscheduler.job.Job.resume()
  • apscheduler.schedulers.base.BaseScheduler.resume_job()

8. Getting the task list

get_jobs() returns a list of Job instances that you can inspect and modify. Passing a job store alias to get_jobs() restricts the list to the tasks in that job store.

print_jobs() prints a formatted list of tasks, with their triggers and next run times.

Modifying tasks

apscheduler.job.Job.modify() or modify_job() can change any attribute of a task except its id.

For example:

job.modify(max_instances=6, name='Alternate name')

To reschedule a task (that is, change its trigger), use apscheduler.job.Job.reschedule() or reschedule_job(). These methods build a new trigger for the task and recalculate its next run time.

For example:

scheduler.reschedule_job('my_job_id', trigger='cron', minute='*/5')

9. Shutting down the scheduler

scheduler.shutdown()

By default, the scheduler shuts down its job stores and executors and waits until all currently executing tasks have finished. If you do not want to wait, pass wait=False:

scheduler.shutdown(wait=False)

This still shuts down the job stores and executors, but does not wait for running tasks to complete.

10. Pausing and resuming task processing

# Pause processing of all scheduled tasks
scheduler.pause()


# Resume processing:
scheduler.resume()


# The scheduler can also be started in a paused state, so that no task runs until resume() is called.
scheduler.start(paused=True)
