Keep only the most recent row of data in Data Factory

Views: 151 Posted: 2017/7/21 0:55:48 Tags: sql-server duplicates etl azure-data-factory


Problem description

I am using Data Factory to build our staging area. The problem is that whenever the source data changes, we add a new row to the staging tables.

For instance, assume we have the following data:

ID          Fields             created              edited
100        ----------        '2017-07-01'         '2017-07-05'

This will be stored in our staging tables like this:

  ID          Fields             created              edited
  100        ----------        '2017-07-01'            null 
  100        ----------        '2017-07-01'         '2017-07-05'
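
To make "most recent row" concrete for the two staging rows above, here is a minimal sketch of a read-time lookup. It is not Data Factory code: an in-memory SQLite database stands in for the staging store, the table and column names are taken from the example, and it assumes that a null `edited` means the row was never edited, so `created` is used as the fallback timestamp.

```python
import sqlite3

# In-memory SQLite stands in for the staging database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, fields TEXT, created TEXT, edited TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?, ?)",
    [
        (100, "----------", "2017-07-01", None),          # row as first loaded
        (100, "----------", "2017-07-01", "2017-07-05"),  # row loaded again after the edit
    ],
)

# For each id, keep the row whose last-change timestamp is the largest.
# COALESCE falls back to `created` when `edited` is null (assumed meaning: never edited).
LATEST_PER_ID = """
SELECT s.id, s.fields, s.created, s.edited
FROM staging AS s
WHERE COALESCE(s.edited, s.created) = (
    SELECT MAX(COALESCE(t.edited, t.created))
    FROM staging AS t
    WHERE t.id = s.id
)
"""

latest_rows = conn.execute(LATEST_PER_ID).fetchall()
print(latest_rows)
```

The correlated subquery runs once per candidate row, which hints at why this kind of lookup gets expensive as the staging table grows.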

Selecting the most recent row is expensive, and we would like to avoid it. How can we avoid storing duplicate IDs in staging?
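
One possible direction is an update-or-insert (upsert) keyed on the ID, so that staging never holds two rows for the same ID. The sketch below only illustrates the logic. It uses in-memory SQLite rather than SQL Server, and the `upsert` helper is a hypothetical name. On SQL Server the same logic could live in a stored procedure or a `MERGE` statement. How to wire that into a Data Factory pipeline is not shown here.

```python
import sqlite3

# In-memory SQLite stands in for the staging database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
# The primary key on id guarantees that staging never holds two rows for one ID.
conn.execute(
    "CREATE TABLE staging (id INTEGER PRIMARY KEY, fields TEXT, created TEXT, edited TEXT)"
)


def upsert(conn, row):
    """Update the staging row with this id if it exists, otherwise insert it."""
    row_id, fields, created, edited = row
    cur = conn.execute(
        "UPDATE staging SET fields = ?, created = ?, edited = ? WHERE id = ?",
        (fields, created, edited, row_id),
    )
    if cur.rowcount == 0:  # nothing was updated, so this ID is new to staging
        conn.execute(
            "INSERT INTO staging (id, fields, created, edited) VALUES (?, ?, ?, ?)",
            (row_id, fields, created, edited),
        )


upsert(conn, (100, "----------", "2017-07-01", None))          # first load
upsert(conn, (100, "----------", "2017-07-01", "2017-07-05"))  # source row edited later

staged = conn.execute("SELECT id, fields, created, edited FROM staging").fetchall()
print(staged)
```

With this shape, readers of the staging table no longer need a "latest row" query, because the cost moves to load time, where each incoming row does one keyed lookup.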

I assume that when creating the pipelines there should be a way to update the data if the ID already exists in staging.

The query format in Data Factory is like this:

$$Text.Format('select * from <<table>> where <<column>> >= \'{0:yyyy-MM-dd HH:mm}\' AND <<column>> < \'{1:yyyy-MM-dd HH:mm}\'', WindowStart, WindowEnd)
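
To show what the template above expands to for one slice, here is a small Python sketch that mimics it. The function name `render_window_query`, the table name `source_table`, and the use of `edited` as the filter column are placeholders for illustration. The original only gives `<<table>>` and `<<column>>`. String interpolation is used only to mirror the template and is not a recommendation for building SQL in application code.

```python
from datetime import datetime, timedelta


def render_window_query(table, column, window_start, window_end):
    """Build the slice query for the half-open window [window_start, window_end)."""
    fmt = "%Y-%m-%d %H:%M"  # Python equivalent of the yyyy-MM-dd HH:mm format above
    return (
        f"select * from {table} "
        f"where {column} >= '{window_start.strftime(fmt)}' "
        f"AND {column} < '{window_end.strftime(fmt)}'"
    )


# Hypothetical one-day slice; the real WindowStart/WindowEnd come from the pipeline schedule.
window_start = datetime(2017, 7, 5, 0, 0)
window_end = window_start + timedelta(days=1)

query = render_window_query("source_table", "edited", window_start, window_end)
print(query)
```

Because the lower bound is inclusive and the upper bound is exclusive, consecutive slices do not overlap, so a given timestamp value falls into exactly one window.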
