Important parameters:
| Parameter | Default | Description |
|---|---|---|
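| airflow_home | /home/airflow/airflow01 | Airflow home directory, set by the `$AIRFLOW_HOME` environment variable |
| dags_folder | /home/airflow/airflow01/dags | Directory containing the DAG Python files |
| base_log_folder | /home/airflow/airflow01/logs | Base log directory |
| executor | SequentialExecutor | Executor type: SequentialExecutor (runs tasks one at a time), LocalExecutor (parallel local processes) or CeleryExecutor (distributed workers fed through a Celery queue) |
| sql_alchemy_conn | sqlite:////home/airflow/airflow01/airflow.db | SQLAlchemy connection string for the metadata database |
| sql_alchemy_pool_size | 5 | Size of the SQLAlchemy database connection pool |
| sql_alchemy_pool_recycle | 3600 | Seconds a connection may sit idle in the pool before it is recycled |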
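| parallelism | 32 | Executor parallelism: the maximum number of task instances running at the same time across the whole Airflow installation |
| dag_concurrency | 16 | Maximum number of task instances allowed to run at the same time within a single DAG (see the notes after the table for how it relates to `parallelism`) |
| max_active_runs_per_dag | 16 | Maximum number of DAG runs that may be active at the same time for each DAG |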
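| load_examples | True | Whether to load the example DAGs |
| default_impersonation | | User to run tasks as when a task does not specify its own user |
| security | | Security/authentication mechanism to use, e.g. kerberos |
| default_owner | airflow | Default owner assigned to operators |
| default_cpus | 1 | Default number of CPUs allocated to each operator |
| default_ram | 512 | Default RAM (MB) allocated to each operator |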
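| base_url | http://localhost:8080 | Base URL of the webserver |
| web_server_host | 0.0.0.0 | IP address the webserver binds to |
| web_server_port | 8080 | Port the webserver listens on |
| web_server_ssl_cert | | Path to the webserver's SSL certificate |
| web_server_ssl_key | | Path to the webserver's SSL private key |
| web_server_worker_timeout | 120 | Seconds the gunicorn webserver waits before timing out a worker |
| worker_refresh_batch_size | 1 | Number of gunicorn workers refreshed at a time: the webserver starts new workers and then kills the same number of old ones |
| worker_refresh_interval | 30 | Seconds to wait before refreshing the next batch of workers |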
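| access_logfile | - | Webserver access log file; `-` means stdout |
| error_logfile | - | Webserver error log file; `-` means stderr |
| expose_config | False | Whether to show the configuration in the web UI |
| authenticate | False | Whether the webserver requires authentication |
| filter_by_owner | False | Filter DAGs by owner name; requires authentication to be enabled |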
| smtp_host | localhost | SMTP host |
| smtp_user | | SMTP user |
| smtp_password | | SMTP password |
| smtp_starttls | True | Use STARTTLS |
| smtp_ssl | False | Use SSL |
| smtp_port | | SMTP port |
| smtp_mail_from | | Sender address for outgoing mail |
| celeryd_concurrency | 16 | Number of task instances a Celery worker runs concurrently |
| broker_url | sqla+mysql://airflow:airflow@localhost:3306/airflow | Celery broker URL |
| celery_result_backend | db+mysql://airflow:airflow@localhost:3306/airflow | Celery result backend |
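| job_heartbeat_sec | 5 | Interval (seconds) at which running task instances listen for external kill signals (e.g. when tasks are cleared from the CLI or the webserver) |
| scheduler_heartbeat_sec | 5 | Scheduler heartbeat interval (seconds), i.e. how often the scheduler loop runs to trigger new tasks |
| scheduler_zombie_task_threshold | 300 | Seconds without a heartbeat after which the scheduler treats a running task as a zombie, marks it failed and reschedules it |
| catchup_by_default | True | Default for each DAG's `catchup` flag: whether the scheduler creates runs for every interval missed since `start_date` |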
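
To make `catchup_by_default` concrete, here is a simplified model of which DAG runs get created for a fixed schedule interval. It is a sketch, not Airflow's scheduler code, and the helper `execution_dates` is hypothetical. It assumes the usual Airflow convention that a run's execution date is the start of its interval and that the run is only triggered once that interval has ended. Individual DAGs can override the default with `DAG(..., catchup=False)`.

```python
from datetime import datetime, timedelta
from typing import List


def execution_dates(start_date: datetime, now: datetime,
                    interval: timedelta, catchup: bool) -> List[datetime]:
    """Simplified model (hypothetical helper) of the DAG runs to create.

    A run covers [d, d + interval); it is created only once that interval
    has ended (d + interval <= now), and its execution date is d.
    """
    dates = []
    d = start_date
    while d + interval <= now:
        dates.append(d)
        d += interval
    if not catchup:
        # catchup=False: skip the missed intervals, keep only the latest one.
        return dates[-1:]
    return dates


# Daily DAG with start_date 2019-01-01, scheduler checks it on 2019-01-04 12:00.
start = datetime(2019, 1, 1)
now = datetime(2019, 1, 4, 12)
# catchup=True: runs for 2019-01-01, 2019-01-02 and 2019-01-03
print(execution_dates(start, now, timedelta(days=1), catchup=True))
# catchup=False: only the run for 2019-01-03
print(execution_dates(start, now, timedelta(days=1), catchup=False))
```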

How `parallelism`, `dag_concurrency` and `max_active_runs_per_dag` relate (TI = task instance):
* parallelism = number of physical python processes the scheduler can run
* dag_concurrency = the number of TIs to be allowed to run PER-dag at once
* max_active_runs_per_dag = number of dag runs (per-DAG) to allow running at once

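
The three limits can be pictured with a simplified admission model. This is a sketch with hypothetical names (`TaskRef`, `admit`), not Airflow's scheduler implementation: a queued task instance starts only if the installation-wide `parallelism` cap is not reached, its DAG is running fewer than `dag_concurrency` task instances, and starting it would not push its DAG past `max_active_runs_per_dag` active DAG runs. In the real scheduler `max_active_runs_per_dag` mainly stops new DAG runs from being created; the model folds that into a single check.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class TaskRef:
    """Hypothetical stand-in for a task instance: (DAG, DAG run, task)."""
    dag_id: str
    run_id: str
    task_id: str


def admit(queued: List[TaskRef], running: List[TaskRef], parallelism: int,
          dag_concurrency: int, max_active_runs_per_dag: int) -> List[TaskRef]:
    """Return the queued task instances allowed to start, in queue order."""
    total = len(running)
    per_dag = Counter(t.dag_id for t in running)
    active_runs: Dict[str, Set[str]] = {}
    for t in running:
        active_runs.setdefault(t.dag_id, set()).add(t.run_id)

    started = []
    for t in queued:
        if total >= parallelism:
            break  # installation-wide cap reached: nothing else can start
        if per_dag[t.dag_id] >= dag_concurrency:
            continue  # this DAG already runs dag_concurrency task instances
        runs = active_runs.setdefault(t.dag_id, set())
        if t.run_id not in runs and len(runs) >= max_active_runs_per_dag:
            continue  # would open one DAG run too many for this DAG
        runs.add(t.run_id)
        per_dag[t.dag_id] += 1
        total += 1
        started.append(t)
    return started


queue = [TaskRef("a", "r1", "t1"), TaskRef("a", "r2", "t1"),
         TaskRef("a", "r1", "t2"), TaskRef("a", "r1", "t3"),
         TaskRef("b", "r1", "t1"), TaskRef("b", "r1", "t2")]
# a/r2/t1 is blocked by max_active_runs_per_dag, a/r1/t3 by dag_concurrency,
# b/r1/t2 by parallelism. Prints: "a r1 t1", "a r1 t2", "b r1 t1".
for t in admit(queue, [], parallelism=3, dag_concurrency=2,
               max_active_runs_per_dag=1):
    print(t.dag_id, t.run_id, t.task_id)
```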

Further reading:
https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls