educative.io

Catchup and depends_on_past parameters should be included in this lesson

These two parameters have a big influence on how scheduling works, so the lesson should at least briefly mention them…


Course: An Introduction to Apache Airflow - Learn Interactively
Lesson: Scheduling

Hi @Dmitry_Polovinkin !!
Catchup and depends_on_past are important parameters in Airflow that can influence how scheduling works.

1. Catchup Parameter:

The catchup parameter specifies whether Airflow should create DAG runs for schedule intervals that have already passed. In Airflow 2.x, catchup defaults to True (the default comes from the `catchup_by_default` option in the `[scheduler]` section of the configuration). This means that when a DAG whose start_date lies in the past is turned on, the scheduler creates and executes a DAG run for every data interval that has not been run yet, starting from the start_date (or from the DAG's most recent run) and continuing up to the current date.

For instance, if you have a DAG scheduled to run daily and you set the start_date to a date two weeks ago, with catchup set to True, Airflow will schedule and execute the DAG for each of the missed days during those two weeks.

If you set catchup to False, Airflow skips the missed intervals and creates a DAG run only for the most recent completed interval. After that, it keeps scheduling one run per interval as usual. This is useful when you do not want historical data to be recalculated or reprocessed. Past intervals can still be run on demand with the `airflow dags backfill` command.
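To make the difference concrete, here is a minimal sketch in plain Python. It is a simplified model, not Airflow's scheduler code, and it assumes a daily schedule, a DAG with no previous runs being turned on at `now`, and that a run for a data interval becomes due once that interval has ended. The function name `daily_runs_to_create` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def daily_runs_to_create(start_date: datetime, now: datetime, catchup: bool) -> List[Interval]:
    """Simplified model (not Airflow's scheduler): which daily data intervals
    get a DAG run when a DAG with no previous runs is turned on at `now`."""
    one_day = timedelta(days=1)
    completed: List[Interval] = []
    start = start_date
    # A run for the interval [start, start + 1 day) is due once that interval has ended.
    while start + one_day <= now:
        completed.append((start, start + one_day))
        start += one_day
    if catchup:
        return completed  # one run per missed interval
    return completed[-1:]  # only the most recent completed interval (or none)


start_date = datetime(2024, 1, 1)
now = datetime(2024, 1, 15, 12, 0)  # two weeks later, around midday

print(len(daily_runs_to_create(start_date, now, catchup=True)))  # 14
for begin, end in daily_runs_to_create(start_date, now, catchup=False):
    print(begin.date(), "->", end.date())  # 2024-01-14 -> 2024-01-15
```

In the two-weeks-ago example, the model returns all 14 missed daily intervals with catchup=True, but only the latest one with catchup=False.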

2. depends_on_past Parameter:

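The depends_on_past parameter controls whether a task instance depends on the instance of the same task in the previous DAG run. When depends_on_past is set to True for a task, that task runs only if its instance in the previous DAG run succeeded (or was skipped). If the previous instance failed, the current instance will not run until the earlier one is fixed, for example by clearing it so that it reruns successfully. The very first run, which has no previous instance, is allowed to run.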

Conversely, when depends_on_past is set to False (the default), each task instance runs regardless of how the same task did in the previous DAG run. This suits tasks whose runs are independent of one another, i.e., today's run does not rely on yesterday's run having succeeded.

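For example, if each daily run processes only its own day of data, you can leave depends_on_past at False so that a failure on one day does not block the following days. If instead each run builds on the output of the previous run (for example, a running total), setting depends_on_past to True keeps later runs from working on incomplete data.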
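The rule can be modeled in a few lines of plain Python. This is a simplified sketch, not Airflow's implementation: it follows a single task across consecutive DAG runs, treats success and skipped as the only previous states that let the next instance proceed, and uses the hypothetical label "not run" for an instance blocked by depends_on_past (in Airflow such an instance simply stays unscheduled). The helper names `can_run` and `simulate` are also hypothetical.

```python
from typing import List, Optional

# Previous states that let the next instance of the same task run
# when depends_on_past=True (simplified model).
OK_PREVIOUS_STATES = {"success", "skipped"}


def can_run(depends_on_past: bool, previous_state: Optional[str]) -> bool:
    """Decide whether a task instance may start, given the state of the same
    task's instance in the previous DAG run (None if there is no previous run)."""
    if not depends_on_past:
        return True  # the previous run does not matter
    if previous_state is None:
        return True  # the first run has nothing to wait for
    return previous_state in OK_PREVIOUS_STATES


def simulate(depends_on_past: bool, outcomes: List[str]) -> List[str]:
    """Walk through consecutive DAG runs of one task.

    `outcomes` is what the task would end up as on each run if it were
    allowed to start ("success" or "failed"). Returns the recorded state
    per run, using the hypothetical label "not run" for blocked instances.
    """
    states: List[str] = []
    previous: Optional[str] = None
    for outcome in outcomes:
        state = outcome if can_run(depends_on_past, previous) else "not run"
        states.append(state)
        previous = state
    return states


daily = ["success", "failed", "success", "success"]
print(simulate(depends_on_past=True, outcomes=daily))
# ['success', 'failed', 'not run', 'not run']
print(simulate(depends_on_past=False, outcomes=daily))
# ['success', 'failed', 'success', 'success']
```

With depends_on_past=True, a single failure stalls every later run of that task until the failed instance is cleared and succeeds; with False, later runs carry on independently.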

By understanding and properly configuring these parameters, you can control how Airflow handles DAG runs for past intervals and whether each run of a task depends on the outcome of its previous run.
Happy learning :blush: