How to find duplicate documents in Cosmos DB


Question

On one particular day, a Stream Analytics job wrote a huge amount of data to Cosmos DB. It was not supposed to write that many documents in a single day, so I need to check whether documents were duplicated on that day.

Is there a query, or any other way, to find duplicate records in Cosmos DB?

Recommended answer

It is possible if you know which properties to check for duplicates. We had a nasty production issue that also caused many duplicate records. When we contacted MS Support to help us identify the duplicate documents, they gave us the following query.

Bear in mind: in our case, properties A and B together define uniqueness. So if two documents have the same values for A and B, they are duplicates. You can then use the output of this query to, for example, delete the oldest documents in each group and keep the most recent one (based on _ts).

SELECT d.A, d.B FROM
   (SELECT c.A, c.B, COUNT(1) AS counts FROM c
    GROUP BY c.A, c.B) AS d
WHERE d.counts > 1
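
The cleanup step described above (keep the most recent document per A/B pair, delete the rest) could be sketched in Python roughly as follows. This is only an illustration: it assumes the affected documents have already been fetched into a list of dicts that each carry `id`, `_ts`, and the uniqueness properties. The function name `select_duplicates_to_delete` and the `key_props` parameter are hypothetical, and the actual delete calls against Cosmos DB are left out.

```python
from collections import defaultdict


def select_duplicates_to_delete(docs, key_props=("A", "B")):
    """Return the ids of documents to delete, keeping the newest per key.

    `docs` is assumed to be a list of dicts already read from the container.
    `key_props` names the properties that together define uniqueness
    (A and B in the answer above; replace with your own property names).
    """
    # Group documents by the tuple of their uniqueness properties.
    groups = defaultdict(list)
    for doc in docs:
        key = tuple(doc.get(prop) for prop in key_props)
        groups[key].append(doc)

    to_delete = []
    for group in groups.values():
        if len(group) < 2:
            continue  # no duplicates for this key
        # _ts is the last-modified timestamp in epoch seconds; newest first.
        newest_first = sorted(group, key=lambda d: d["_ts"], reverse=True)
        # Keep the first (most recent) document, mark the others for deletion.
        to_delete.extend(d["id"] for d in newest_first[1:])
    return to_delete


if __name__ == "__main__":
    sample = [
        {"id": "1", "A": "x", "B": 1, "_ts": 100},
        {"id": "2", "A": "x", "B": 1, "_ts": 200},
        {"id": "3", "A": "y", "B": 2, "_ts": 150},
    ]
    print(select_duplicates_to_delete(sample))
```

The returned ids can then be passed to whatever delete mechanism you use. Reviewing the list before deleting anything is a sensible precaution, since documents with equal `_ts` values are kept or dropped in arbitrary order.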
