Python scheduled tasks with APScheduler

Posted by xymbo on Sat, 02 Oct 2021 01:51:43 +0200

Abstract

This article introduces the most basic use of APScheduler: running a job at a fixed interval of a few seconds. It explains the difference between the two schedulers BackgroundScheduler and BlockingScheduler, shows how to make a job run as soon as start() is called, and examines the special case where a job takes longer to execute than its scheduling interval, along with a solution. It also shows that each run of a job is executed in a thread.

Basic timing scheduling

APScheduler is a task scheduling framework for Python. It can run jobs much like crontab does on Linux and is convenient to use. It supports scheduling based on a fixed time interval, a specific date, or a crontab-style configuration, and it can persist jobs or run them in the background in a daemon-like mode.

The following is a basic example:

from apscheduler.schedulers.blocking import BlockingScheduler

def job():
    print('job 3s')

if __name__ == '__main__':
    # Run job() every 3 seconds; start() blocks the current thread.
    sched = BlockingScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()

This schedules job() to run every 3s, so the program prints 'job 3s' every 3s. Changing the seconds parameter of add_job() changes the scheduling interval.

What is the difference between BlockingScheduler and BackgroundScheduler

APScheduler provides several types of scheduler. BlockingScheduler and BackgroundScheduler are the two most commonly used. The main difference is that BlockingScheduler blocks the calling thread, while BackgroundScheduler does not. We therefore choose between them depending on the situation:

BlockingScheduler: calling start() blocks the current thread. Use it when the scheduler is the only thing running in your application (as in the example above).
BackgroundScheduler: calling start() does not block the current thread. Use it when you are not using any other framework and want the scheduler to run in the background of your application.

The following two examples illustrate the difference more concretely.

A real example of BlockingScheduler

from apscheduler.schedulers.blocking import BlockingScheduler
import time

def job():
    print('job 3s')

if __name__ == '__main__':
    sched = BlockingScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()  # blocks here, so the loop below is never reached

    while True:
        print('main 1s')
        time.sleep(1)

Running this program, we get the following output:

job 3s
job 3s
job 3s
job 3s

The output shows that BlockingScheduler blocks the current thread once start() is called, so the while loop in the main program is never executed.

A real example of BackgroundScheduler

from apscheduler.schedulers.background import BackgroundScheduler
import time

def job():
    print('job 3s')

if __name__ == '__main__':
    sched = BackgroundScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()  # returns immediately; jobs run in the background

    while True:
        print('main 1s')
        time.sleep(1)

Running this program produces the output below. BackgroundScheduler does not block the current thread when start() is called, so the while loop in the main program keeps running:

main 1s
main 1s
main 1s
job 3s
main 1s
main 1s
main 1s
job 3s

From this output, we can also see that job() does not run immediately after start() is called. Instead, it waits 3s before its first scheduled run.

How to make a job run immediately after start()

How can we make job() start executing as soon as the scheduler's start() is called?

By default the scheduler waits one full interval before the first run. The simplest workaround is to call job() directly before the scheduler starts, as follows:

from apscheduler.schedulers.background import BackgroundScheduler
import time

def job():
    print('job 3s')

if __name__ == '__main__':
    job()  # run once by hand before the first scheduled run
    sched = BackgroundScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()

    while True:
        print('main 1s')
        time.sleep(1)

This produces the following output:

job 3s
main 1s
main 1s
main 1s
job 3s
main 1s
main 1s
main 1s

Strictly speaking, this does not make the job start running after start(), but it does run the job once at the beginning without waiting for the first scheduled run.

What happens if the job takes too long to execute

What happens if job() takes 5s to execute, but the scheduler is configured to call job() every 3s? We wrote the following example:

from apscheduler.schedulers.background import BackgroundScheduler
import time

def job():
    print('job 3s')
    time.sleep(5)  # the job takes longer than the 3s interval

if __name__ == '__main__':
    sched = BackgroundScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()

    while True:
        print('main 1s')
        time.sleep(1)

Running this program, we get the following output:

main 1s
main 1s
main 1s
job 3s
main 1s
main 1s
main 1s
Execution of job "job (trigger: interval[0:00:03], next run at: 2018-05-07 02:44:29 MST)" skipped: maximum number of running instances reached (1)
main 1s
main 1s
main 1s
job 3s
main 1s

When the 3s interval elapses while the previous run is still in progress, the scheduler does not start a second job thread. It skips that run and tries again at the next scheduled time, 3s later.

To allow several runs of job() to overlap, we can set the max_instances job option. In the following example, job_defaults allows two instances of job() to run at the same time:

from apscheduler.schedulers.background import BackgroundScheduler
import time

def job():
    print('job 3s')
    time.sleep(5)

if __name__ == '__main__':
    # Allow up to two overlapping runs of each job.
    job_defaults = {'max_instances': 2}
    sched = BackgroundScheduler(timezone='MST', job_defaults=job_defaults)
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()

    while True:
        print('main 1s')
        time.sleep(1)

After running the program, we get the following output:

main 1s
main 1s
main 1s
job 3s
main 1s
main 1s
main 1s
job 3s
main 1s
main 1s
main 1s
job 3s

How is each job scheduled

The examples above show that the scheduler works by calling the job() function at regular intervals.

Is each run of job() executed as a process or as a thread?

To answer this question, we wrote the following program:

from apscheduler.schedulers.background import BackgroundScheduler
import os
import threading
import time

def job():
    # Print the ids of the thread and process that execute this run.
    print('job thread_id-{0}, process_id-{1}'.format(threading.get_ident(), os.getpid()))
    time.sleep(50)

if __name__ == '__main__':
    job_defaults = {'max_instances': 20}
    sched = BackgroundScheduler(timezone='MST', job_defaults=job_defaults)
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()

    while True:
        print('main 1s')
        time.sleep(1)

After running the program, we get the following output:

main 1s
main 1s
main 1s
job thread_id-10644, process_id-8872
main 1s
main 1s
main 1s
job thread_id-3024, process_id-8872
main 1s
main 1s
main 1s
job thread_id-6728, process_id-8872
main 1s
main 1s
main 1s
job thread_id-11716, process_id-8872

The process ID is the same for every run of job(), but the thread IDs differ. Therefore, each run of job() is executed in a thread.

Topics: Python
