One - One Code All

Blog Content

airflow backfill用法

Python 并行计算   2014-07-02 13:20:18

在特定情况下,修改DAG后,为了避免当前日期之前任务的运行,可以使用backfill填补特定时间段的任务

airflow backfill -s START -e END --mark_success DAG_ID


官网链接: https://airflow.readthedocs.io/en/latest/cli.html


backfill

Run subsections of a DAG for a specified date range. If reset_dag_run option is used, backfill will first prompt users whether airflow should clear all the previous dag_run and task_instances within the backfill date range. If rerun_failed_tasks is used, backfill will auto re-run the previous failed task instances within the backfill date range.

airflow backfill [-h] [-t TASK_REGEX] [-s START_DATE] [-e END_DATE] [-m] [-l]
                 [-x] [-i] [-I] [-sd SUBDIR] [--pool POOL]
                 [--delay_on_limit DELAY_ON_LIMIT] [-dr] [-v] [-c CONF]
                 [--reset_dagruns] [--rerun_failed_tasks]
                 dag_id

Positional Arguments

dag_idThe id of the dag

Named Arguments

-t, --task_regex

The regex to filter specific task_ids to backfill (optional)
-s, --start_date

Override start_date YYYY-MM-DD
-e, --end_dateOverride end_date YYYY-MM-DD
-m, --mark_success

Mark jobs as succeeded without running them

Default: False

-l, --local

Run the task using the LocalExecutor

Default: False

-x, --donot_pickle

Do not attempt to pickle the DAG object to send over to the workers, just tell the workers to run their version of the code.

Default: False

-i, --ignore_dependencies

Skip upstream tasks, run only the tasks matching the regexp. Only works in conjunction with task_regex

Default: False

-I, --ignore_first_depends_on_past

Ignores depends_on_past dependencies for the first set of tasks only (subsequent executions in the backfill DO respect depends_on_past).

Default: False

-sd, --subdir

File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’ where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ config you set in ‘airflow.cfg’

Default: “[AIRFLOW_HOME]/dags”

--poolResource pool to use
--delay_on_limit

Amount of time in seconds to wait when the limit on maximum active dag runs (max_active_runs) has been reached before trying to execute a dag run again.

Default: 1.0

-dr, --dry_run

Perform a dry run

Default: False

-v, --verbose

Make logging output more verbose

Default: False

-c, --confJSON string that gets pickled into the DagRun’s conf attribute
--reset_dagruns

if set, the backfill will delete existing backfill-related DAG runs and start anew with fresh, running DAG runs

Default: False

--rerun_failed_tasks

if set, the backfill will auto-rerun all the failed tasks for the backfill date range instead of throwing exceptions

Default: False




上一篇:docker保存对容器的修改
下一篇:通过Spring Resource接口获取资源,读取本地资源文件进行文件下载

The minute you think of giving up, think of the reason why you held on so long.