This article focuses on pitfalls, solutions, and my findings with regard to scheduling Python scripts.

Imagine the following case: periodically, you have to run SQL updates through the ORM models in your codebase, with some intermediate processing. For example, once per day you want to detect inactive users by indirect signals, such as actions they have or have not taken within a certain time frame, and mark them as inactive. This scenario often occurs in data-driven applications where you need to maintain the integrity and accuracy of your data.
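To make the first case concrete, here is a minimal sketch of such a daily job. It swaps the ORM for the standard-library sqlite3 module so it stays self-contained. The users table, its columns, and the 30-day threshold are all assumptions made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def mark_inactive_users(conn: sqlite3.Connection, now: datetime,
                        max_idle_days: int = 30) -> int:
    """Mark users with no recorded action since the cutoff as inactive.

    Assumes a hypothetical table:
        users(id INTEGER, is_active INTEGER, last_action_at TEXT)
    where last_action_at holds ISO 8601 timestamps in the same
    timezone as `now`, so that string comparison matches time order.
    """
    cutoff = (now - timedelta(days=max_idle_days)).isoformat()
    cursor = conn.execute(
        "UPDATE users SET is_active = 0 "
        "WHERE is_active = 1 AND last_action_at < ?",
        (cutoff,),
    )
    conn.commit()
    # Number of rows that were switched to inactive in this run.
    return cursor.rowcount
```

The function itself is the easy part. The rest of this article is about making sure something calls it once per day, reliably.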

Or, a different case: every Monday at 10 AM your script must send internal reports on Slack to every manager about their employees' non-compliant Key Performance Indicators. This scenario is a common requirement in organizations striving to monitor and improve performance across various teams and departments.

What could go wrong with scheduling Python scripts?

So, a lot could go wrong:)

Solutions

Let's look at possible ways to overcome pitfalls in order of their effectiveness.

Schedule Python lib

Schedule is the most basic scheduler for Python and the first one you'll find if you start googling. It looks interesting, but its main disadvantage is that it isn't really designed for production use.

According to the official documentation:

You should probably look somewhere else if you need:

  • Job persistence (remember schedule between restarts)
  • Exact timing (sub-second precision execution)
  • Concurrent execution (multiple threads)
  • Localization (workdays or holidays)

Schedule does not account for the time it takes for the job function to execute.

Anyway, let's look at an example, pros and cons for further comparison.

import schedule
import time

def job():
    print("I'm working...")

schedule.every().day.at("10:30").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
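The caveat quoted above, that Schedule does not account for the time the job function takes, is easiest to see for interval jobs. Below is a minimal stdlib-only simulation, not Schedule's actual code. It assumes a simple loop in which the next run is planned either from the moment the previous job finished or from the moment it started. The function name and parameters are made up for illustration.

```python
def simulate_start_times(interval: float, job_duration: float,
                         runs: int, fixed_rate: bool) -> list[float]:
    """Return simulated job start times in seconds on a fake clock.

    fixed_rate=False: the next run is planned `interval` seconds after
                      the previous job finished, so delays accumulate.
    fixed_rate=True:  the next run is planned `interval` seconds after
                      the previous job started, so the cadence is stable.
    """
    starts = []
    clock = 0.0
    next_run = 0.0
    for _ in range(runs):
        clock = max(clock, next_run)  # idle until the job is due
        started = clock
        starts.append(started)
        clock += job_duration         # the job blocks the loop while it runs
        next_run = (started if fixed_rate else clock) + interval
    return starts
```

With a 10-second interval and a 2-second job, the first mode starts runs at 0, 12, 24, and 36 seconds, while the second stays on 0, 10, 20, and 30. The longer the job, the faster the first mode drifts away from the intended cadence.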

Pros

Cons

Cron Unix utility

Cron is the most popular general-purpose scheduler in the world. This scheduler is universal and starts jobs as a shell command. Take note that Cron starts a separate process for every job and this may take a lot of system resources.

Here you need a crontab file:

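# Per-user crontab (crontab -e); system files such as /etc/crontab also need a user field before the command
# m h dom mon dow  command
*/30 * * * * /usr/local/bin/python /path/to/the/script.py >> /var/log/cron.log 2>&1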

In turn, the script can contain whatever your heart desires:

if __name__ == "__main__":
    print("Whatever your heart desires")
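Since the crontab line above redirects both stdout and stderr into a log file, it helps if the script writes timestamped lines and reports failure through its exit code. Here is one possible shape for such a script, as a sketch only: run_job is a hypothetical placeholder for the real work, and the log format is an arbitrary choice.

```python
import logging
import sys


def run_job() -> None:
    """Hypothetical placeholder for the real work."""
    logging.info("Whatever your heart desires")


def main() -> int:
    # Cron appends stdout and stderr to the log file, so timestamped
    # lines on stdout are enough to reconstruct what happened and when.
    logging.basicConfig(
        stream=sys.stdout,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    try:
        run_job()
    except Exception:
        logging.exception("Job failed")  # also logs the traceback
        return 1
    return 0
```

In the real script you would finish with sys.exit(main()) under the usual if __name__ == "__main__" guard, so that a failed run ends with a non-zero exit code.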

Pros

Cons

Regta Python utility

Regta is a scheduling tool designed specifically for Python with these pitfalls in mind. Its key advantage is support for async, multithreading, and multiprocessing jobs, as well as restart tolerance.

from regta import async_job, Period

@async_job(Period().on.sunday.at("18:35").by("Asia/Almaty"))
async def my_async_job():
    pass  # Do some stuff here

To run it, use the regta run command.
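To see what a period like Sunday at 18:35 in Asia/Almaty boils down to, here is a stdlib-only sketch that computes the next matching moment. It is not Regta's implementation. The function name and signature are assumptions for illustration, and it ignores DST edge cases such as skipped or repeated local times.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int,
                    tz_name: str) -> datetime:
    """Return the next weekday/hour/minute in tz_name strictly after `now`.

    `now` must be timezone-aware; weekday follows datetime.weekday(),
    where Monday is 0 and Sunday is 6.
    """
    local_now = now.astimezone(ZoneInfo(tz_name))
    days_ahead = (weekday - local_now.weekday()) % 7
    candidate = (local_now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if candidate <= local_now:
        # Today is the right weekday but the time has already passed.
        candidate += timedelta(days=7)
    return candidate
```

For example, next_weekly_run(datetime.now(ZoneInfo("UTC")), 6, 18, 35, "Asia/Almaty") gives the upcoming Sunday 18:35 in Almaty local time, whatever timezone the server itself runs in.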

Pros

Cons

Summary

When it comes to scheduling Python programs, the first option, Schedule, may not be the most suitable choice for solving real-world problems. Instead, consider the following points:

For internal automation tasks that involve various programming languages and CLI tools, Cron stands as a robust choice.

If your automation needs are focused on Python exclusively, Regta emerges as a compelling option, offering a wealth of Python-specific optimizations. Give it a try, and feel free to share your thoughts in the comments 🙌
