How to run async function in Airflow?


Question

I am writing an Airflow task to read a large CSV file and save it to a PostgreSQL database. I found the asyncpg package, which has a copy function that runs much faster than other packages. However, it is async, and I don't know how to incorporate it into Airflow. Here is a sample code:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta
import pandas as pd
import asyncpg

async def to_sql(dataframe, table_name, schema_name='public', timeout=None, truncate=False):
    connection = await asyncpg.connect(user='postgres', host='host.docker.internal', database='quantaxis', password='123456')
    result = await connection.copy_records_to_table(
        table_name,
        records=dataframe.values.tolist(),
        columns=shared_columns,  # note: shared_columns is not defined in this snippet
        schema_name=schema_name,
        timeout=timeout)
    await connection.close()
    return result


default_args = {
    'owner': 'Airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('pythonexp2123', default_args=default_args, schedule_interval=timedelta(days=1))

async def save_file_to_database(ds):
    df = pd.read_csv("data{0}.csv".format(ds))
    r = await to_sql(df, 'test')
    return r

t1 = PythonOperator(
    task_id='pushing_task',
    provide_context=True,
    python_callable=save_file_to_database,
    dag=dag
    )

t1

When I run it, it returns this error:

Can't Pickle Object <Coroutine>

How could I change the function to make this DAG work? I still want to use the asyncpg package because of its speed.

Answer

You can try running the async function in an event loop using asyncio. If you are using Python 3.7 or newer, you can simply call asyncio.run(async_function()).

https://docs.python.org/3/library/asyncio-task.html
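
For context: the error occurs because calling an async def function returns a coroutine object rather than executing it, and Airflow then tries to serialize that coroutine (for example, when pushing the task's return value to XCom). A minimal sketch of the failure, independent of Airflow:

import pickle

async def f():
    return 1

coro = f()          # calling an async def returns a coroutine; it does not run
pickle.dumps(coro)  # raises TypeError: coroutine objects cannot be pickled

With that in mind, the fix is to wrap the coroutine in a plain synchronous callable: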

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta
import pandas as pd
import asyncpg
import asyncio

async def to_sql(dataframe, table_name, schema_name='public', timeout=None, truncate=False):
    connection = await asyncpg.connect(user='postgres', host='host.docker.internal', database='quantaxis', password='123456')
    result = await connection.copy_records_to_table(
        table_name,
        records=dataframe.values.tolist(),
        columns=shared_columns,  # note: shared_columns is not defined in this snippet
        schema_name=schema_name,
        timeout=timeout)
    await connection.close()
    return result



default_args = {
    'owner': 'Airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('pythonexp2123', default_args=default_args, schedule_interval=timedelta(days=1))

async def save_file_to_database(ds):
    df = pd.read_csv("data{0}.csv".format(ds))
    r = await to_sql(df, 'test')
    return r

def run_async(ds, **kwargs):
    # A synchronous wrapper: provide_context=True passes extra context
    # kwargs, and the event loop runs the coroutine to completion here.
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(save_file_to_database(ds))
    return result

t1 = PythonOperator(
    task_id='pushing_task',
    provide_context=True,
    python_callable=run_async,
    dag=dag
    )

t1
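
On Python 3.7 or newer, the run_async wrapper can be reduced to asyncio.run, which creates an event loop, runs the coroutine to completion, and closes the loop for you. A minimal variant under that assumption:

def run_async(ds, **kwargs):
    # asyncio.run (Python 3.7+) manages the event loop lifecycle itself,
    # so there is no need to call get_event_loop/run_until_complete.
    return asyncio.run(save_file_to_database(ds))

Either way, the key point is that python_callable must be a plain synchronous function that drives the coroutine to completion and returns a picklable result.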

