A Brief Introduction to airflow.cfg Configuration Parameters

Python, parallel computing   2014-06-06 22:36:32

The more important parameters:

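| Parameter | Default | Description |
|---|---|---|
| airflow_home | /home/airflow/airflow01 | Airflow home directory, determined by the $AIRFLOW_HOME environment variable |
| dags_folder | /home/airflow/airflow01/dags | Directory containing the DAG Python files |
| base_log_folder | /home/airflow/airflow01/logs | Main log directory |
| executor | SequentialExecutor, LocalExecutor, CeleryExecutor | Executor type: sequential, local, or queue-based (Celery), respectively |
| sql_alchemy_conn | sqlite:////home/airflow/airflow01/airflow.db | Database connection string |
| sql_alchemy_pool_size | 5 | Size of the database connection pool |
| sql_alchemy_pool_recycle | 3600 | Time (seconds) after which idle database connections are recycled |
| parallelism | 32 | Executor parallelism, i.e. the number of task instances that can run simultaneously on one executor |
| dag_concurrency | 16 | Number of task instances the scheduler may run at once, per DAG (the notes after the table contrast it with parallelism) |
| max_active_runs_per_dag | 16 | Number of DAG runs allowed to run at once, per DAG |
| load_examples | True | Whether to load the example DAGs |
| default_impersonation | | User that runs a task when the task does not specify one |
| security | | Security/authentication mechanism, e.g. kerberos |
| default_owner | airflow | Default owner name bound to operators |
| default_cpus | 1 | CPUs used by operators |
| default_ram | 512 | Memory used by operators |
| base_url | http://localhost:8080 | URL of the webserver |
| web_server_host | 0.0.0.0 | IP address of the webserver |
| web_server_port | 8080 | Port of the webserver |
| web_server_ssl_cert | | Path to the webserver's certificate |
| web_server_ssl_key | | Path to the webserver's key |
| web_server_worker_timeout | 120 | Timeout (seconds) between the gunicorn webserver and its workers |
| worker_refresh_batch_size | 1 | Number of workers the webserver refreshes at a time; it brings up new workers and kills old ones |
| worker_refresh_interval | 30 | Interval (seconds) between webserver worker refreshes |
| access_logfile | - | Webserver access log location; - means standard output |
| error_logfile | - | Webserver error log location; - means standard error |
| expose_config | False | Whether to show the configuration in the web UI |
| authenticate | False | Whether the webserver enables authentication |
| filter_by_owner | False | Filter DAGs by owner name; requires authentication to be enabled |
| smtp_host | localhost | SMTP host |
| smtp_user | | SMTP user |
| smtp_password | | SMTP password |
| smtp_starttls | True | Use TLS (STARTTLS) |
| smtp_ssl | False | Use SSL |
| smtp_port | | SMTP port |
| smtp_mail_from | | Account that emails are sent from |
| celeryd_concurrency | 16 | |
| broker_url | sqla+mysql://airflow:airflow@localhost:3306/airflow | Celery broker URL |
| celery_result_backend | db+mysql://airflow:airflow@localhost:3306/airflow | Celery result backend |
| job_heartbeat_sec | 5 | Interval (seconds) at which task instances check for external kill signals (from the CLI or the webserver) |
| scheduler_heartbeat_sec | 5 | Heartbeat interval (seconds) between the scheduler and tasks (more likely between the scheduler and the executor) |
| scheduler_zombie_task_threshold | 300 | Time threshold (seconds) used to detect zombie tasks |
| catchup_by_default | | Whether the scheduler, by default, creates DAG runs for past schedule intervals that were missed (catchup) |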
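
To make the table more concrete, here is a minimal sketch of reading a few of these parameters with Python's standard configparser module. The fragment is hypothetical: the values are copied from the table, but the section names ([core], [webserver], [scheduler]) and the choice of LocalExecutor are assumptions for illustration, and load_settings is a helper invented for this example, not part of Airflow.

```python
import configparser

# Hypothetical airflow.cfg fragment. Values come from the table above;
# the section names are an assumption about how the file is organized.
SAMPLE_CFG = """
[core]
airflow_home = /home/airflow/airflow01
dags_folder = /home/airflow/airflow01/dags
executor = LocalExecutor
sql_alchemy_conn = sqlite:////home/airflow/airflow01/airflow.db
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
load_examples = True

[webserver]
base_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080
expose_config = False

[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
"""


def load_settings(text):
    """Parse an INI-style configuration string (illustrative helper)."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


cfg = load_settings(SAMPLE_CFG)

# Typed accessors convert the raw strings to int / bool.
parallelism = cfg.getint("core", "parallelism")
load_examples = cfg.getboolean("core", "load_examples")
port = cfg.getint("webserver", "web_server_port")
db_conn = cfg.get("core", "sql_alchemy_conn")

print(parallelism, load_examples, port)
print(db_conn)
```

Note the four slashes in the sqlite URL: three belong to the scheme separator and the fourth starts the absolute file path.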

  • parallelism = number of physical Python processes the scheduler can run

  • dag_concurrency = number of task instances (TIs) allowed to run per DAG at once

  • max_active_runs_per_dag = number of DAG runs (per DAG) allowed to run at once
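
The way the three limits interact can be sketched with a toy model. This is not Airflow's scheduler code: the function names, the DAG ids, and the first-come ordering are all assumptions made for illustration. It treats parallelism as a global cap on running task instances, dag_concurrency as a cap per DAG, and max_active_runs_per_dag as a cap on the DAG runs of a single DAG.

```python
def schedulable_task_instances(queued_per_dag, parallelism=32, dag_concurrency=16):
    """Toy model: how many ready task instances each DAG may start now.

    queued_per_dag maps a (hypothetical) DAG id to its number of ready
    task instances. Each DAG is capped by dag_concurrency, and the sum
    over all DAGs is capped by parallelism. DAGs are served in insertion
    order, which is an assumption of this sketch.
    """
    remaining = parallelism
    started = {}
    for dag_id, queued in queued_per_dag.items():
        allowed = min(queued, dag_concurrency, remaining)
        started[dag_id] = allowed
        remaining -= allowed
    return started


def startable_dag_runs(pending_runs, active_runs, max_active_runs_per_dag=16):
    """Toy model: how many more DAG runs a single DAG may start."""
    free_slots = max(0, max_active_runs_per_dag - active_runs)
    return min(pending_runs, free_slots)


queued = {"etl": 20, "report": 20, "cleanup": 5}
result = schedulable_task_instances(queued)
print(result)  # {'etl': 16, 'report': 16, 'cleanup': 0}

print(startable_dag_runs(pending_runs=5, active_runs=14))  # 2
```

With the default values from the table, two busy DAGs already exhaust the global limit of 32, so the third DAG has to wait even though it is well below its own per-DAG limit.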

See also:
https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls


