1. Operator和task
简单来说,Operator就是task的抽象类
2. BaseOperator
所有的功能性Operator的来源
2.1 参数:
1 | task_id (string) :唯一标识task的id |
2.2 方法
1 | set_downstream(task_or_task_list) |
2.3 可调用值
1 | dag:如果设置了则返回dag,否则报错 |
3. BaseSensorOperator
基于 airflow.models.BaseOperator, airflow.models.SkipMixin
3.1 参数:
1 | soft_fail (bool):设置为true以将任务标记为失败时的skipped |
3.2 方法:
1 | execute(context) |
4. Core Operators
4.1 airflow.sensors.base_sensor_operator.BaseSensorOperator(poke_interval=60, timeout=604800, soft_fail=False, *args, **kwargs)
基于 airflow.models.BaseOperator
4.1.1 参数:
1 | bash_command (string):bash命令 |
4.1.2 方法:
1 | execute(context) |
4.2 airflow.operators.bash_operator.BashOperator(bash_command, xcom_push=False, env=None, output_encoding=’utf-8’, *args, **kwargs)
基于 airflow.operators.python_operator.PythonOperator, airflow.models.SkipMixin
4.3 airflow.operators.python_operator.BranchPythonOperator(python_callable, op_args=None, op_kwargs=None, provide_context=False, templates_dict=None, templates_exts=None, *args, **kwargs)
基于 airflow.operators.python_operator.PythonOperator, airflow.models.SkipMixin
4.4 airflow.operators.check_operator.CheckOperator(sql, conn_id=None, *args, **kwargs)
基于 airflow.models.BaseOperator
CheckOperator需要返回一个单行的sql查询,第一行的每个值都需要使用python bool cast进行计算,如果任何返回值false,则检查失败并输出错误
4.4.1 注意:
1)python bool cast 会将一下视作false:
False
0
Empty string (“”)
Empty list ([])
Empty dictionary or set ({})
2)这是一个抽象类,需要定义get_db_hook,而get_db_hook是钩子,从外部获取单个记录
4.4.2 参数:
1 | sql (string):执行的sql语句 |
4.5 airflow.operators.email_operator.EmailOperator(to, subject, html_content, files=None, cc=None, bcc=None, mime_subtype=’mixed’, mime_charset=’us_ascii’, *args, **kwargs)
基于 airflow.models.BaseOperator
功能:发送一封邮件
4.5.1 参数:
1 | to (list or string (comma or semicolon delimited)):发送电子邮件的邮件列表 |
4.6 airflow.operators.mysql_operator.MySqlOperator(sql, mysql_conn_id=’mysql_default’, parameters=None, autocommit=False, database=None, *args, **kwargs)
基于 airflow.models.BaseOperator
功能:在指定的mysql数据库中执行sql代码
4.6.1 参数:
1 | mysql_conn_id (string):数据库名 |
4.7 airflow.operators.presto_check_operator.PrestoValueCheckOperator(sql, pass_value, tolerance=None, presto_conn_id=’presto_default’, *args, **kwargs)
基于:airflow.models.BaseOperator
4.7.1 参数:
1 | python_callable (python callable):可调用对象的引用 |
4.8 airflow.operators.python_operator.PythonVirtualenvOperator(python_callable, requirements=None, python_version=None, use_dill=False, system_site_packages=True, op_args=None, op_kwargs=None, string_args=None, templates_dict=None, templates_exts=None, *args, **kwargs)
###4.9 airflow.operators.python_operator.ShortCircuitOperator(python_callable, op_args=None, op_kwargs=None, provide_context=False, templates_dict=None, templates_exts=None, *args, **kwargs)
基于:airflow.operators.python_operator.PythonOperator, airflow.models.SkipMixin
功能:仅在满足条件时才允许工作流继续。 否则,将跳过工作流程“短路”和下游任务。
派生自PythonOperator,条件结果由python_callable决定
4.10 airflow.sensors.http_sensor.HttpSensor(endpoint, http_conn_id=’http_default’, method=’GET’, request_params=None, headers=None, response_check=None, extra_options=None, *args, **kwargs)
基于:airflow.sensors.base_sensor_operator.BaseSensorOperator
4.10.1 参数:
1 | http_conn_id (string):连接 |
4.11 airflow.sensors.sql_sensor.SqlSensor(conn_id, sql, *args, **kwargs)
基于:airflow.sensors.base_sensor_operator.BaseSensorOperator
功能:运行sql语句,直到满足条件。 它将继续尝试,而sql不返回任何行,或者如果第一个单元格返回(0,’0’,’’)
4.11.1 参数:
1 | conn_id (string) |
4.12 airflow.sensors.time_sensor.TimeSensor(target_time, *args, **kwargs)
基于:airflow.sensors.base_sensor_operator.BaseSensorOperator
功能:等到当天的指定时间
4.12.1 参数:
1 | target_time (datetime.time) |
其他
1 | 4.13 airflow.sensors.time_delta_sensor.TimeDeltaSensor(delta, *args, **kwargs) |