Azure数据工厂获取元数据以获取Blob文件名并将其传输到Azure SQL数据库表第2部分 [英] Azure Data Factory Get Metadata to get blob filenames and transfer them to Azure SQL database table part 2

查看:73
本文介绍了Azure数据工厂获取元数据以获取Blob文件名并将其传输到Azure SQL数据库表第2部分的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图在Azure数据工厂中使用获取元数据"活动,以获取blob文件名并将其复制到Azure SQL数据库表中.我在获取元数据"活动之后添加了存储过程活动.这是我的新活动存储过程,我按照建议添加了参数,但是我将名称更改为JsonData,因为我的存储过程需要此参数.

I am trying to use Get Metadata activity in Azure Data Factory in order to get blob filenames and copy them to Azure SQL database table. I added the stored procedure activity after Get Metadata activity. Here is my new activity stored procedure, I added the parameter as suggested however, I changed the name to JsonData as my stored procedure requires this parameter.

这是我的存储过程.

/****** Object:  StoredProcedure [dbo].[InsertDataJSON]    Script Date: 10/14/2020 11:01:30 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*USE datafactorypoc1*/

ALTER PROCEDURE [dbo].[InsertDataJSON] (
    @JsonData NVARCHAR (MAX)
)
AS
BEGIN
    DECLARE @err int

    INSERT INTO extractFileNameTest1 (ItemName, ItemType, EffIntegrationRunTieme, ExecutionDuration, DurationInQueue)
    SELECT ItemName, ItemType, EffIntegrationRunTieme, ExecutionDuration, DurationInQueue
    FROM OPENJSON (@JsonData, N'$') WITH (
        ItemName VARCHAR(255) N'$.ItemName',
        ItemType VARCHAR(255) N'$.ItemType',
        EffIntegrationRunTieme VARCHAR(255) N'$.EffIntegrationRunTieme',
        ExecutionDuration INT N'$.ExecutionDuration',
        DurationInQueue INT N'$.DurationInQueue'
    )    

    SELECT @err = @@ERROR
    RETURN (@err)
END

在存储过程中出现此错误:

I get this error at the stored procedure:

{
    "errorCode": "2402",
    "message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0.",
    "failureType": "UserError",
    "target": "Stored procedure1",
    "details": []
}

但是当我检查输入时,似乎已经成功读取了json字符串itemName.

But when I check the input, it seems like it already successfully reading the json string itemName.

但是,当我检查输出时,它不存在.

But, when I check output, it's not there.

您能帮我检查一下我在这里做错了什么吗?是我的存储过程吗?提前非常感谢您.

Could you please help me check what I did wrong here? Is it my stored procedure? Thank you very much in advance.

更新15/10/2020 我创建了一个新管道,并在ForEach活动中移动了存储过程以连接到Get_Filename_2,而不是紧随其后的第一个广告:

Update 15/10/2020 I created a new pipeline and move the Stored Procedure inside ForEach activity to connect to Get_Filename_2 instead of the first one ad followed:

在这里,我将参数的值更改为Get_Filename_2并作为itemName而不是childitems输出(因为使用childitems时出现错误,因为childitems来自Get_Filename_1而不是2).

Here I changed the value of parameter to Get_Filename_2 and output as itemName instead of childitems (because I got an error with using childitems because childitems is from Get_Filename_1 and not 2).

执行管道(失败)后,最后一个存储过程的输入为:

After executing the pipeline (which was failed), the input of the last stored procedure is:

{
    "storedProcedureName": "[dbo].[InsertDataJSON]",
    "storedProcedureParameters": {
        "JsonData": {
            "value": "FRRNSC84FIN1_A2276801_20200103-152534.json",
            "type": "String"
        }
    }
}

输出为:

{
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (West Europe)",
    "executionDuration": 0,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "ExternalActivity",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.016666666666666666,
                "unit": "Hours"
            }
        ]
    }
}

对于此管道,它失败并显示以下错误消息.

For this pipeline, it's failed with following error message.

{
    "errorCode": "2402",
    "message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'F' is found at position 0.\r\nFRRNSC84FIN1_A2276801_20200103-152534.json",
    "failureType": "UserError",
    "target": "Stored procedure1",
    "details": []
}

在我的旧管道中,我在ForEach循环之外存储了过程.管道没有失败:

On my old pipeline where I have stored procedure outside of ForEach loop. The pipeline did not fail:

这是上一个存储过程活动的输入:

Here is the input of the last stored procedure activity:

{
    "storedProcedureName": "[dbo].[InsertDataJSON]",
    "storedProcedureParameters": {
        "JsonData": {
            "value": "[{\"name\":\"FRRNSC84FIN1_A2274001_20200103-143748_back_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2274001_20200103-143748_right_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2274801_20200103-144811_right_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2275201_20200103-145229_right_camera_0.jpeg\",\"type\":\"File\"},{\"name\":\"FRRNSC84FIN1_A2276801_20200103-152534.json\",\"type\":\"File\"}]"
        }
    }
}

这是输出:

{
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (West Europe)",
    "executionDuration": 0,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "ExternalActivity",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.016666666666666666,
                "unit": "Hours"
            }
        ]
    }
}

此管道成功运行,但是SQL中的结果不是我真正期望的,在第​​一个json字符串之前也包含许多NULL值,但是它将所有itemName插入相同的位置.这是因为我认为它不在ForEach循环之外.但是为什么要插入这么多NULL?

This pipeline ran successfully however the result in the SQL is not what I really expected, also many NULL values before the first json string but it insert all itemName in the same location. This is because it's outside of ForEach loop I think. But why it insert so many NULL?

这是我修改的存储过程:

Here is my stored procedure that I modified:

/****** Object:  StoredProcedure [dbo].[InsertDataJSON]    Script Date: 15/10/2020 10:31:51 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*USE datafactorypoc1*/

ALTER PROCEDURE [dbo].[InsertDataJSON] (
    @JsonData NVARCHAR (MAX)
)
AS
BEGIN
    PRINT @JsonData

    /*INSERT INTO Logs values (DEFAULT, @JsonData)*/
    INSERT INTO extractFileNameTest1 values (@JsonData, DEFAULT)
    SELECT * FROM
        OPENJSON (@JsonData)
    WITH (
        ItemName VARCHAR(255) N'$.ItemName',
        ItemType VARCHAR(255) N'$.ItemType'
    )
END

我试图删除测试SQL表中的所有数据,以查看第二条故障管道的输出.我注意到它确实正确解析了表中的json字符串文件名以及所有文件(我在blob存储中只有5个文件).但是其他数据为NULL.

I tried to delete all data in my test SQL table to see the output of the 2nd fail pipeline. I notice that it did parse correctly the json string filename inside my table and all of them (I only have 5 files in blob storage). But other data are NULL.

总而言之,新管道的SQL表中的结果要好得多,每个文件名都在表的不同单元格上,但是管道运行失败.您能帮我检查一下我做错了什么吗?是存储过程还是ForEach循环中我上一次存储过程活动中的表达式?

To conclude, the result in SQL table of the new pipeline is much better, each file name on the different cell of the table but I got pipeline run failed. Could you please help me check what I did wrong here? Is it the stored procedure or the expression in my last Stored procedure activity inside ForEach loop?

非常感谢您.

推荐答案

在这种情况下,我仅使用1 Get Metadata活动:获取元数据活动的数据集= Blob存储中的二进制文件获取元数据的字段列表=子项

For this scenario, I use only 1 Get Metadata activity with: Dataset of Get Metadata activity = Binary files from Blob storage Field List of Get Metadata = Child items

此获取元数据"活动的输出已连接到ForEach活动:ForEach活动设置=>项目=>@activity('Get_FileName_1').output.childItems

This Get Metadata activity has output connected to ForEach activity: ForEach activity settings => Items => @activity('Get_FileName_1').output.childItems

在ForEach活动中,有1个活动是存储过程:存储过程设置已将链接的服务= Azure SQL数据库"与选定的存储过程链接在一起.在我编写的存储过程中,我定义了将从blob文件名读取的列.然后,对于存储过程"活动的参数,我定义了完全相同的参数编号=存储过程中@JsonData的所有列.然后,对于参数的每个值,我使用Azure数据工厂函数@substring提取blob文件名的不同部分.然后将它们插入到Azure SQL表的正确列中.

Inside ForEach activity, there is 1 activity which is Stored Procedure: Stored Procedure Settings has Linked Service = Azure SQL Database with selected Stored Procedure. In the stored procedure that I wrote, I defined the columns that I will read from blob file names. Then, for the parameters of Stored Procedure activity, I define exact same parameters numbers = all columns of @JsonData in stored procedure. Then, for each value of the parameter, I use Azure Data Factory function @substring to extract different parts of blob file name. Then insert these to the correct columns in Azure SQL table.

子字符串功能的来源:

Source for Substring function: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#substring

这篇关于Azure数据工厂获取元数据以获取Blob文件名并将其传输到Azure SQL数据库表第2部分的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆