ADF copy data activity - check for duplicate records before inserting into SQL db
Question
I have a very simple ADF pipeline that copies data from a local MongoDB (self-hosted integration runtime) to an Azure SQL database.
My pipeline is able to copy the data from MongoDB and insert it into the SQL db. Currently, if the pipeline is run multiple times, it inserts duplicate data.
I have made the _id column unique in the SQL database, and now running the pipeline throws an error because the SQL constraint won't let it insert the duplicate records.
How can I check for duplicate _id values before inserting into the SQL db?
Should I use a pre-copy script / stored procedure? Some guidance / directions on where to add the extra steps would be helpful. Thanks.
Answer
Azure Data Factory Data Flow can help you achieve that:
You can follow these steps:
- Add two sources: the Cosmos DB table (source1) and the SQL database table (source2).
- Use a Join transformation to combine the two tables (left join / full join / right join) on Cosmos table.id = SQL table.id.
- Use an Alter Row transformation with an expression that filters out duplicate _id values; if an _id is not already present, insert the row.
- Map the non-duplicate rows to the sink SQL database table.
Hope this helps.