How do I store run-time data in Azure Data Factory between pipeline executions?


Question

I have been following Microsoft's tutorial to incrementally/delta load data from an SQL Server database.

It uses a watermark (timestamp) to keep track of changed rows since last time. The tutorial stores the watermark to an Azure SQL database using the "Stored Procedure" activity in the pipeline so it can be reused in the next execution.

It seems overkill to have an Azure SQL database just to store that tiny bit of meta information (my source database is read-only btw). I'd rather just store that somewhere else in Azure. Maybe in the blob storage or whatever.

In short: Is there an easy way of keeping track of this type of data or are we limited to using stored procs (or Azure Functions et al) for this?

Solution

One way to achieve this is with a Copy activity, although reading the latest watermark back in 'LookupOldWaterMarkActivity' then becomes more complicated. Treat the following as a reference only.

[Screenshot: dataset settings]

[Screenshot: Copy activity settings]

The source and sink use the same dataset. In the additional columns setting, change the expression to @{activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue}
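To make the effect of this setup concrete, here is a minimal Python sketch that only simulates it outside ADF. It assumes the watermark file is a single comma-delimited row and that each run copies the file onto itself with one extra column holding the new watermark. The function name `append_watermark` and the sample values are hypothetical and are not part of any ADF API.

```python
import csv
import io


def append_watermark(file_text: str, new_watermark: str) -> str:
    """Return the file content with new_watermark added as one more column.

    Simulation only: mimics a copy whose source and sink are the same
    single-row delimited file, with one additional column per run.
    """
    rows = list(csv.reader(io.StringIO(file_text)))
    # Assumption: an empty file (first run) is treated as an empty row.
    row = rows[0] if rows else []
    row.append(new_watermark)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(row)
    return out.getvalue()


# One stored watermark, then a run that records a newer one.
content = "11/24/2020 02:39:14\n"
content = append_watermark(content, "11/24/2020 08:31:42")
print(content)
```

Under these assumptions the file gains one column per run, which is why the newest value always sits in the last column.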

This saves the watermark as a column in a .txt file. Reading the latest watermark back with a Lookup activity is harder, because the output of 'LookupOldWaterMarkActivity' looks like this:

{
    "count": 1,
    "value": [
        {
            "Prop_0": "11/24/2020 02:39:14",
            "Prop_1": "11/24/2020 08:31:42"
        }
    ]
}

The key names are generated by ADF. To get "11/24/2020 08:31:42", you need the column count and then an expression of this form (pseudocode): @activity('LookupOldWaterMarkActivity').output.value[0][Prop_(column count - 1)]

How to get the latest watermark:

  1. Use a Get Metadata activity to get columnCount.

  2. Use this expression: @activity('LookupOldWaterMarkActivity').output.value[0][concat('Prop_',string(sub(activity('Get Metadata1').output.columnCount,1)))]
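The key selection performed by that expression can be sketched in plain Python. This is only an illustration of the logic, not ADF code: `latest_watermark` is a hypothetical helper, `lookup_output` stands in for the Lookup activity output shown above, and `column_count` stands in for the columnCount value returned by the Get Metadata activity.

```python
def latest_watermark(lookup_output: dict, column_count: int) -> str:
    """Pick the value stored in the last generated column, Prop_(n - 1)."""
    # Mirrors concat('Prop_', string(sub(columnCount, 1))) from the expression.
    key = "Prop_" + str(column_count - 1)
    # Mirrors output.value[0][<key>]: first (and only) row, last column.
    return lookup_output["value"][0][key]


# Sample data copied from the Lookup output shown earlier.
lookup_output = {
    "count": 1,
    "value": [
        {
            "Prop_0": "11/24/2020 02:39:14",
            "Prop_1": "11/24/2020 08:31:42",
        }
    ],
}

print(latest_watermark(lookup_output, column_count=2))  # 11/24/2020 08:31:42
```

Because the key is derived from the column count rather than hard-coded, the same lookup keeps working as later runs add more columns.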
