Azure function binding for Azure Data Lake (Python)


Problem description


I have a requirement where I want to connect to my Azure Data Lake Storage Gen2 (ADLS) from Azure Functions, read a file, process it using Python (PySpark), and write it back to Azure Data Lake. So my input and output bindings would be to ADLS. Is there any ADLS binding for Azure Functions available in Python? Could somebody give any suggestions on this?

Thanks, Anten D

Solution

Update:

1. When we read the data, we can use the blob input binding.

2. But when we write the data, we cannot use the blob output binding (this is because the underlying object is different), and Azure Functions does not support an ADLS output binding, so we need to put the write logic in the body of the function.

This is the doc describing the kinds of bindings Azure Functions can support:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings?tabs=csharp#supported-bindings
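For reference, the blob input binding that feeds `inputblob` in the example below is declared in the function's function.json. A minimal sketch is shown here; the container/path (`test/FileName.txt`) matches the example, but the auth level, HTTP methods, and connection-setting name (`AzureWebJobsStorage`) are assumptions you would adapt to your own app:

```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get", "post"]
    },
    {
      "type": "blob",
      "direction": "in",
      "name": "inputblob",
      "path": "test/FileName.txt",
      "connection": "AzureWebJobsStorage"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}
```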

Below is a simple code example:

import logging

import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest, inputblob: func.InputStream) -> func.HttpResponse:
    # Connection string for the ADLS Gen2 storage account (key redacted).
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
    myfilesystem = "test"
    myfile       = "FileName.txt"
    file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
    # create_file creates (or overwrites) the target file in the file system.
    file_client = file_system_client.create_file(myfile)
    # Read the content delivered by the blob input binding.
    inputstr = inputblob.read().decode("utf-8")
    logging.info("length of data is " + str(len(inputstr)))
    # The file was just created, so appending starts at offset 0.
    filesize_previous = 0
    file_client.append_data(inputstr, offset=filesize_previous, length=len(inputstr))
    # flush_data commits the appended data; its argument is the final file length.
    file_client.flush_data(filesize_previous + len(inputstr))
    return func.HttpResponse(
            "This is a test. " + inputstr,
            status_code=200
    )

Original Answer:

I think the docs below will help you:

How to read:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-input?tabs=csharp

How to write:

https://docs.microsoft.com/en-us/python/api/azure-storage-file-datalake/azure.storage.filedatalake.datalakeserviceclient?view=azure-python

By the way, don't use blob's output binding. Reading can be achieved with a binding, but writing cannot. (The Blob Storage service and the Data Lake service are based on different objects. Although using the blob input binding to read files is completely fine, please do not use the blob output binding to write files, because it does not create an object based on the Data Lake service.)
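The example above always writes a brand-new file, which is why it can hardcode `filesize_previous = 0` after `create_file`. To append to an existing Data Lake file instead, the current size can be read first with `get_file_properties()` so the append lands at the right offset. A minimal sketch, assuming the same file system and path names as the example (the helper `next_offset` and the function names here are illustrative, not part of the SDK):

```python
def next_offset(current_size: int, data: bytes) -> tuple:
    """Return (append_offset, final_length) for appending `data`
    to a file that currently holds `current_size` bytes."""
    return current_size, current_size + len(data)

def append_to_datalake_file(connect_str: str, filesystem: str, path: str, data: bytes) -> None:
    """Append `data` to an existing ADLS Gen2 file, preserving its contents.

    Unlike create_file() in the example above, get_file_client() on an
    existing path does not truncate the file, so we must query its
    current size to know where the append starts.
    """
    # Imported here so the pure offset helper is usable without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient.from_connection_string(connect_str)
    file_client = service.get_file_system_client(filesystem).get_file_client(path)
    current_size = file_client.get_file_properties().size
    offset, final_length = next_offset(current_size, data)
    file_client.append_data(data, offset=offset, length=len(data))
    # flush_data takes the total length of the file after the append.
    file_client.flush_data(final_length)
```

Usage would be e.g. `append_to_datalake_file(connect_str, "test", "FileName.txt", b"more data")`, with a real connection string; this is untested scaffolding to adapt, not a drop-in implementation.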

Let me know whether the docs above help you; if not, I will update with a simple Python example.
