How to determine if $addToSet actually added a new item into a MongoDB document or if the item already existed?

Question

I'm using the C# driver (v1.8.3 from NuGet), and having a hard time determining whether an $addToSet/upsert operation actually added a NEW item to the given array, or whether the item already existed.

Adding a new item could fall into two cases: either the document didn't exist at all and was just created by the upsert, or the document existed but the array didn't exist or didn't contain the given item.

The reason I need to do this is that I have large sets of data to load into MongoDB, and the load may (shouldn't, but may) break during processing. If this happens, I need to be able to start again from the beginning without doing duplicate downstream processing (keeping processing idempotent). In my flow, if an item is determined to be newly added, I queue up downstream processing for it; if it is determined to already be in the document, no more downstream work is required. My issue is that the result always says the call modified one document, even if the item already existed in the array and nothing was actually modified.

Based on my understanding of the C# driver API, I should be able to make the call with WriteConcern.Acknowledged, and then check WriteConcernResult.DocumentsAffected to see whether it actually updated a document.

My issue is that in all cases, the write concern result reports that 1 document was updated. :/

Here is an example document that my code is calling $addToSet on, which may or may not have this specific item in the "items" list to start with:

{
    "_id" : "some-id-that-we-know-wont-change",
    "items" : [ 
        {                
            "s" : 4,
            "i" : "some-value-we-know-is-static",
        }
    ]
}

My query always uses an _id value which is known based on the processing metadata:

var query = new QueryDocument
{
     {"_id", "some-id-that-we-know-wont-change"}                       
};

My update looks like this:

var result = mongoCollection.Update(query, new UpdateDocument()
{
     {                                                
          "$addToSet", new BsonDocument()
               {
                    { "items", new BsonDocument()
                         {
                              { "s", 4 },
                              { "i", "some-value-we-know-is-static" }                                                                            
                          } 
                    }
               }
     }
}, new MongoUpdateOptions() { Flags = UpdateFlags.Upsert, WriteConcern = WriteConcern.Acknowledged }); 

if(result.DocumentsAffected > 0 || result.UpdatedExisting)
{
     //DO SOME POST PROCESSING WORK THAT SHOULD ONLY HAPPEN ONCE PER ITEM                                                
}

If I run this code once on an empty collection, the document is added and the response is as expected (DocumentsAffected = 1, UpdatedExisting = false). If I run it again (any number of times), the document doesn't appear to be updated, as it remains unchanged, but the result is now unexpected (DocumentsAffected = 1, UpdatedExisting = true).

Shouldn't this be returning DocumentsAffected = 0 if the document is unchanged?

As we need to make many millions of these calls a day, I'm hesitant to turn this logic into multiple calls per item (first checking whether the item exists in the given document's array, and then adding/queuing or just skipping) if I can avoid it.

Is there some way to get this working in a single call?

Recommended answer

What you are doing here is checking the response, which does indicate whether a document was updated, inserted, or neither. That is your best indicator: for an $addToSet to have performed an update, the document itself must have been updated.
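The behaviour reported in the question is consistent with DocumentsAffected counting matched documents rather than changed ones. MongoDB 2.6 and later report a separate modified count (`nModified`) in write results, which is what distinguishes "item added" from "item already present". As a minimal sketch, assuming a result object shaped like the Node.js driver's UpdateResult (`matchedCount`, `modifiedCount`, `upsertedCount`), the three outcomes could be told apart as below. The helper names `classifyAddToSet` and `needsDownstreamWork` are hypothetical and not part of any driver.

```javascript
// Hypothetical helper: classify the result of an acknowledged $addToSet upsert.
// Assumption: field names follow the Node.js driver's UpdateResult; adapt them
// to whatever your driver and server version actually return.
function classifyAddToSet(result) {
  if (result.upsertedCount > 0) return "inserted";  // document was created by the upsert
  if (result.modifiedCount > 0) return "added";     // document existed, item was new
  if (result.matchedCount > 0) return "unchanged";  // document existed, item already in the set
  return "no-match";                                // nothing matched and nothing was upserted
}

// Downstream work should be queued only when something was actually written.
function needsDownstreamWork(result) {
  const outcome = classifyAddToSet(result);
  return outcome === "inserted" || outcome === "added";
}

console.log(classifyAddToSet({ matchedCount: 0, modifiedCount: 0, upsertedCount: 1 })); // inserted
console.log(classifyAddToSet({ matchedCount: 1, modifiedCount: 1, upsertedCount: 0 })); // added
console.log(classifyAddToSet({ matchedCount: 1, modifiedCount: 0, upsertedCount: 0 })); // unchanged
```

With a check like this, the decision stays a single round trip per item, provided the server and driver in use expose a modified count.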

The $addToSet operator itself cannot produce duplicates; that is the nature of the operator. But you may indeed have some problems with your logic:

{                                                
      "$addToSet", new BsonDocument()
           {
                { "items", new BsonDocument()
                     {
                          { "id", item.Id },
                          { "v", item.Value } 
                     }
                }
           }
 }

Your example shows that an item in your "set" is composed of two fields, so if that content varies in any way (i.e. the same id but a different value), the item is a "unique" member of the set and will be added. There is no way, for instance, for the $addToSet operator to skip new values purely based on "id" as a unique identifier. You would have to implement that in your own code.
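To make the whole-value comparison concrete, here is a simplified in-memory model of how $addToSet treats embedded documents. It is an illustration only, not MongoDB's implementation: it assumes two sub-documents count as equal only when they have the same fields, in the same order, with the same values, and uses `JSON.stringify` as a stand-in for that exact comparison. The function name `addToSet` is hypothetical.

```javascript
// Simplified model of $addToSet for embedded documents (illustration only).
// Assumption: equality means same fields, same order, same values, which
// JSON.stringify approximates for plain objects like these.
function addToSet(array, item) {
  const key = JSON.stringify(item);
  const exists = array.some((member) => JSON.stringify(member) === key);
  if (!exists) array.push(item);
  return !exists; // true only when the item was actually added
}

const items = [{ id: 123, v: 10 }];

console.log(addToSet(items, { id: 123, v: 10 })); // false: exact duplicate, nothing added
console.log(addToSet(items, { id: 123, v: 11 })); // true: same id, different value, so it is a new member
console.log(items.length);                        // 2
```

Deduplicating on "id" alone would need a different comparison than the one the operator applies, which is why it has to live in application code.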

A second possible source of duplicates is that the query portion does not correctly find the document that has to be updated. The result would be a new document that contains only the newly specified member in the "set". A common usage mistake looks something like this:

db.collection.update(
    {
        "id": "ABC",
        "items": { "$elemMatch": {
            "id": 123, "v": 10
        }}
    },
    {
        "$addToSet": {
            "items": {
                "id": 123, "v": 10
            }
        }
    },
    { "upsert": true }
)

Whenever the existing document does not already contain the specified element in the "set", that sort of operation creates a new document, because the query matches nothing and the upsert inserts instead. The correct implementation is to not check for the presence of the "set" member in the query and to let $addToSet do the work.
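The difference between the two query shapes can be illustrated with a toy in-memory upsert. This is a sketch of the matching logic only, not MongoDB itself; `upsertAddToSet` and `sameItem` are hypothetical helpers, and the query is represented by a plain predicate function.

```javascript
// Toy in-memory upsert, only to illustrate the matching logic; not MongoDB itself.
const sameItem = (a, b) => JSON.stringify(a) === JSON.stringify(b);

function upsertAddToSet(collection, matches, docId, newItem) {
  const doc = collection.find(matches);
  if (!doc) {
    // No document satisfied the query, so the upsert creates a new one.
    collection.push({ id: docId, items: [newItem] });
    return "inserted";
  }
  if (doc.items.some((member) => sameItem(member, newItem))) return "unchanged";
  doc.items.push(newItem);
  return "added";
}

const item = { id: 123, v: 10 };

// Mistake: the query also requires the item to be present already.
const bad = [{ id: "ABC", items: [] }];
const requiresItem = (d) => d.id === "ABC" && d.items.some((m) => sameItem(m, item));
console.log(upsertAddToSet(bad, requiresItem, "ABC", item), bad.length); // inserted 2

// Matching on the document key only lets the set logic decide.
const good = [{ id: "ABC", items: [] }];
const keyOnly = (d) => d.id === "ABC";
console.log(upsertAddToSet(good, keyOnly, "ABC", item), good.length); // added 1
```

In real MongoDB, if the query key is the unique `_id` field, the same mistake would surface as a duplicate key error rather than as a second document.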

If you do have true duplicate entries occurring in the "set", where all elements of the sub-document are exactly the same, then they were caused by some other code, either present now or run in the past.

Where you are sure there are new entries being created, look through the code for instances of $push, or indeed any array manipulation in code that seems to be acting on the same field.

But if you are using the operator correctly, then $addToSet does exactly what it is intended to do.
