Those two parameters have a big influence on how scheduling works; there should also be a short note about them…
Course: An Introduction to Apache Airflow - Learn Interactively
Lesson: Scheduling
Hi @Dmitry_Polovinkin!
`catchup` and `depends_on_past` are two important parameters that influence how Airflow schedules and runs your work. `catchup` is set on the DAG, while `depends_on_past` is set on individual tasks.
1. Catchup Parameter:
The `catchup` parameter is set on the DAG and controls whether Airflow creates DAG runs for past schedule intervals that were never run. In Airflow 2.x it defaults to `True` (the default comes from the `[scheduler] catchup_by_default` configuration option). With catchup enabled, when you turn a DAG on (or unpause it after a pause), the scheduler creates and executes a DAG run for every completed interval between the DAG's `start_date` (or its latest run) and the current date that has not been run yet.
For instance, if you have a DAG scheduled to run daily and you set the `start_date` to a date two weeks ago, with `catchup` set to `True`, Airflow will schedule and execute the DAG once for each of the missed days in those two weeks.
If you set `catchup` to `False`, the scheduler skips the missed intervals and creates a DAG run only for the most recent interval; from then on, runs are scheduled normally. This is useful when you do not want historical data to be recalculated or reprocessed.
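To make the difference concrete, here is a minimal pure-Python sketch of which daily DAG runs get created when a DAG is turned on. It is a simplified model, not Airflow's scheduler code: it assumes a daily schedule, no existing DAG runs and naive datetimes, and the helper names `completed_daily_intervals` and `runs_to_create` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical model of catchup for a daily schedule; not Airflow's scheduler code.
Interval = Tuple[datetime, datetime]


def completed_daily_intervals(start_date: datetime, now: datetime) -> List[Interval]:
    """Every daily interval [start, start + 1 day) that has fully ended by `now`.

    A daily run for an interval is triggered once that interval is over,
    so only completed intervals are counted.
    """
    intervals: List[Interval] = []
    interval_start = start_date
    while interval_start + timedelta(days=1) <= now:
        intervals.append((interval_start, interval_start + timedelta(days=1)))
        interval_start += timedelta(days=1)
    return intervals


def runs_to_create(start_date: datetime, now: datetime, catchup: bool) -> List[Interval]:
    """Intervals that get a DAG run when the DAG is turned on (assumes no earlier runs)."""
    intervals = completed_daily_intervals(start_date, now)
    if catchup:
        return intervals  # one run per missed interval
    return intervals[-1:]  # only the most recent completed interval


# Usage: a daily DAG whose start_date is two weeks before "now".
start = datetime(2024, 1, 1)
now = datetime(2024, 1, 15, 9, 30)
print(len(runs_to_create(start, now, catchup=True)))  # 14
print(runs_to_create(start, now, catchup=False))  # only the 2024-01-14 -> 2024-01-15 interval
```

With `catchup=True` the two-week example yields one run per completed day; with `catchup=False` only the latest completed day is run.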
2. depends_on_past Parameter:
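The `depends_on_past` parameter is set on a task (often for all tasks at once through `default_args`) and makes each task instance depend on the same task's instance in the previous DAG run. When `depends_on_past` is `True`, a task instance is scheduled only if that previous instance succeeded or was skipped. If the previous instance failed, the task will not run until the earlier failure is resolved (for example, by clearing and rerunning it successfully). The very first run, which has no previous instance, is allowed to run.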
Conversely, when `depends_on_past` is `False` (the default), the task instance runs regardless of how that task ended in the previous DAG run. This suits tasks whose runs are independent of each other, where one run's result does not rely on the success or failure of the previous run.
For example, if each run of a task processes only its own day's data, runs for different dates do not depend on each other and can execute in parallel (for instance during a catchup), so you would leave `depends_on_past` set to `False` for those tasks.
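The rule can be modeled in a few lines of plain Python. This is a hypothetical sketch, not Airflow's implementation: it follows a single task across consecutive DAG runs, treats `success` and `skipped` as the states that let the next run proceed, and uses a made-up `blocked` label for an instance that is never scheduled.

```python
from typing import List, Optional

# Hypothetical model of depends_on_past for one task across consecutive DAG runs.
# Not Airflow's scheduler code; "blocked" is a made-up label for "never scheduled".


def can_run(previous_state: Optional[str], depends_on_past: bool) -> bool:
    """Return True if the task instance may be scheduled.

    previous_state is the state of the same task in the previous DAG run,
    or None when there is no previous run (the first run after start_date).
    """
    if not depends_on_past or previous_state is None:
        return True
    return previous_state in ("success", "skipped")


def simulate(outcomes: List[str], depends_on_past: bool) -> List[str]:
    """outcomes[i] is the state the task would end in if it ran in DAG run i.

    Returns the actual state per run: the outcome if the task ran, else "blocked".
    """
    states: List[str] = []
    previous: Optional[str] = None
    for outcome in outcomes:
        state = outcome if can_run(previous, depends_on_past) else "blocked"
        states.append(state)
        previous = state
    return states


outcomes = ["success", "failed", "success", "success"]
print(simulate(outcomes, depends_on_past=True))
# ['success', 'failed', 'blocked', 'blocked']
print(simulate(outcomes, depends_on_past=False))
# ['success', 'failed', 'success', 'success']
```

With `depends_on_past=True`, one failure holds back every later run of that task; with `False`, each run proceeds on its own.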
By understanding and properly configuring these parameters, you control how Airflow creates DAG runs for past intervals (`catchup`) and whether each task waits for its own previous run to succeed (`depends_on_past`).
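In a DAG file, both settings look roughly like the sketch below. It assumes Airflow 2.4 or later (where the DAG argument is `schedule`; older 2.x releases use `schedule_interval`), and the DAG id, task ids and commands are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Sketch assuming Airflow 2.4+; "daily_report", "extract" and "load" are placeholder names.
with DAG(
    dag_id="daily_report",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # older 2.x releases: schedule_interval="@daily"
    catchup=False,  # when enabled, create a run only for the latest interval
    default_args={
        # Every task waits for its own previous run to succeed or be skipped.
        "depends_on_past": True,
    },
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(
        task_id="load",
        bash_command="echo load",
        depends_on_past=False,  # per-task override of the default_args value
    )
    extract >> load
```

Setting `depends_on_past` in `default_args` applies it to every task, and a single task can still override it, as `load` does here.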
Happy learning!