Azure Data Factory Get Metadata to get blob filenames and transfer them to Azure SQL database table


Problem description

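I am trying to use the Get Metadata activity in Azure Data Factory to get blob filenames and copy them to an Azure SQL database table. I am following this tutorial: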

Here is my pipeline. Copy Data > Source points to the location of the blob files in my Blob storage. I need to specify my source dataset as binary because the files are *.jpeg files.

For Copy Data > Sink, which is the Azure SQL database, I enabled the "Auto create table" option.

In my sink dataset config, I had to choose a table because validation won't pass unless I select a table in my SQL database, even though that table is not related at all to the blob filenames that I want to get.

Question 1: Am I supposed to create a new table in the SQL DB beforehand, with columns matching the blob filenames that I want to extract?

Then I tried to validate the pipeline and got this error:

Copy_Data_1
Sink must be binary when source is binary dataset.

Question 2: How can I resolve this error? I had to select binary as the file type of the source because that is one of the steps when creating the source dataset. When I chose the sink dataset, an Azure SQL table, I didn't have to select a dataset type, so the two don't seem to match.

Thank you very much in advance.

Here is a new screenshot of the new pipeline. I can now get the itemName of each file in the JSON output.

Now I have added a Copy Data activity right after the Get_File_Name2 activity and connected them, to try to use the JSON output as the source dataset.

However, I need to choose the source dataset location first, before specifying the type as JSON. As far as I understand, this JSON is the output of the Get_File_Name2 activity and is not yet stored in Blob storage. How do I make the Copy Data activity read this JSON output as its source dataset?

Update 10/14/2020: Here is my new Stored Procedure activity. I added the parameter as suggested, but I changed its name to JsonData because that is the parameter my stored procedure expects.

This is my stored procedure.

I get this error at the stored procedure:

{
    "errorCode": "2402",
    "message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0.",
    "failureType": "UserError",
    "target": "Stored procedure1",
    "details": []
}

But when I check the activity input, it seems to have read the JSON string with itemName successfully.

When I check the output, however, it's not there.
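The error text itself narrows the problem down: SQL Server reports that the value it received starts with the character 'S', whereas a JSON array would start with `[`. A minimal Python sketch of the same check follows. The helper name `describe_json_problem` and both sample strings are hypothetical, since the value the activity actually passed is not shown here.

```python
import json


def describe_json_problem(text):
    """Return None if text parses as JSON, else a note on where parsing failed."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # err.pos is the index of the first character the parser rejected.
        bad_char = text[err.pos:err.pos + 1]
        return f"invalid JSON at position {err.pos}: found {bad_char!r}"


if __name__ == "__main__":
    # A serialized array parses cleanly.
    print(describe_json_problem('[{"name": "a.jpeg", "type": "File"}]'))
    # Hypothetical non-JSON text beginning with 'S' fails at position 0.
    print(describe_json_problem("Some non-JSON text"))
```

Logging the raw parameter value on the SQL side before parsing it is one way to see what was actually sent.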

Solution

Actually, you can pass the Get Metadata output JSON as a parameter and then call the stored procedure: Get Metadata --> Stored Procedure.

You just need to focus on the coding of the stored procedure.

Get Metadata output childitems:

{
   "childItems": [
        {
            "name": "DeploymentFiles.zip",
            "type": "File"
        },
        {
            "name": "geodatalake.pdf",
            "type": "File"
        },
        {
            "name": "test2.xlsx",
            "type": "File"
        },
        {
            "name": "word.csv",
            "type": "File"
        }
   ]
}
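To make the shape of this payload concrete, here is a minimal Python sketch that pulls the file names out of a `childItems` array. It only illustrates the parsing step that the stored procedure has to perform. The function name `extract_file_names` is hypothetical and the sample payload mirrors the output above.

```python
import json
from typing import List

# Sample payload shaped like the Get Metadata output shown above.
GET_METADATA_OUTPUT = """
{
    "childItems": [
        {"name": "DeploymentFiles.zip", "type": "File"},
        {"name": "geodatalake.pdf", "type": "File"},
        {"name": "test2.xlsx", "type": "File"},
        {"name": "word.csv", "type": "File"}
    ]
}
"""


def extract_file_names(output_json: str) -> List[str]:
    """Return the names of all childItems entries whose type is "File"."""
    output = json.loads(output_json)
    return [
        item["name"]
        for item in output.get("childItems", [])
        if item.get("type") == "File"
    ]


if __name__ == "__main__":
    for name in extract_file_names(GET_METADATA_OUTPUT):
        print(name)
```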

Stored Procedure activity parameter value:

@activity('Get Metadata1').output.childItems

For how to create the stored procedure (getting data from a JSON object), you can refer to this blog: Retrieve JSON Data from SQL Server using a Stored Procedure (https://www.mssqltips.com/sqlservertip/6246/azure-data-factory-get-metadata-example/).
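As a rough sketch of what such a procedure does, the Python example below takes the `childItems` array as a JSON string and writes one row per file into a table. It uses an in-memory SQLite database purely as a stand-in for Azure SQL. The table name `blob_files`, its columns, and the function names are all assumptions for illustration, not part of the original pipeline.

```python
import json
import sqlite3


def create_table(conn):
    """Create the hypothetical target table (stand-in for an Azure SQL table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blob_files ("
        "file_name TEXT NOT NULL, item_type TEXT NOT NULL)"
    )


def save_file_names(conn, json_data):
    """Insert one row per childItems entry and return the number of rows inserted.

    json_data plays the role of the JsonData parameter: a JSON array of
    objects with "name" and "type" keys.
    """
    items = json.loads(json_data)
    rows = [(item["name"], item["type"]) for item in items]
    conn.executemany(
        "INSERT INTO blob_files (file_name, item_type) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_table(connection)
    child_items = (
        '[{"name": "DeploymentFiles.zip", "type": "File"},'
        ' {"name": "word.csv", "type": "File"}]'
    )
    print(save_file_names(connection, child_items))
    for row in connection.execute("SELECT file_name, item_type FROM blob_files"):
        print(row)
```

With this design the target table has fixed columns (file name and item type), so the table is created once up front and each filename becomes a row rather than a column.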
