DocumentDB updating multiple documents fails

Problem description

I have written a stored procedure that adds a Type property to all documents in a DocumentDB collection. Unfortunately, the stored procedure fails after updating just one document. The collection contains around 5000 documents.

Here is the stored procedure:

function updateSproc() {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
var response = getContext().getResponse();
var responseBody = {
    updated: 0,
    continuation: true,
    error: "",
    log: ""
};

// Kick off the query/update loop.
tryQueryAndUpdate();

// Recursively queries for documents that have no Type property, w/ support for continuation tokens.
// Calls tryUpdate(documents) as soon as the query returns documents.
function tryQueryAndUpdate(continuation) {
    var query = { query: "SELECT * FROM root c WHERE NOT is_defined(c.Type)", parameters: []};
    var requestOptions = { continuation: continuation};

    var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function(err, documents, responseOptions) {
        if (err) {
            responseBody.error = err;
            throw err;
        }

        if (documents.length > 0) {
            // If documents are found, update them.
            responseBody.log += "Found documents: " + documents.length;
            tryUpdate(documents);
        } else if (responseOptions.continuation) {
            responseBody.log += "Continue query";
            tryQueryAndUpdate(responseOptions.continuation);
        } else {
            responseBody.log += "No more documents";
            responseBody.continuation = false;
            response.setBody(responseBody);
        }

    });

    // If we hit execution bounds, report it in the response body and return.
    if (!isAccepted) {
        responseBody.log += "Query not accepted";
        response.setBody(responseBody);
    }
}

// Adds the Type property to the first document in the array, then recurses on the remaining documents.
function tryUpdate(documents)
{
    if (documents.length > 0) {
        responseBody.log += "Updating documents " + documents.length;

        var document = documents[0];

        // DocumentDB supports optimistic concurrency control via HTTP ETag.
        var requestOptions = { etag: document._etag};

        document.Type="Type value";

        // Update the document.
        var isAccepted = collection.replaceDocument(document._self, document, requestOptions, function(err, updatedDocument, responseOptions) {
           if (err) {
              responseBody.error = err;
              throw err;
           }

           responseBody.updated++;
           documents.shift();
           tryUpdate(documents);
        });

        // If we hit execution bounds, report it in the response body and return.
        if (!isAccepted) {
            responseBody.log += "Update not accepted";
            response.setBody(responseBody);
        }
    } else {
        tryQueryAndUpdate();
    }
}
}

Based on the response returned, I can see that the query returns 100 documents. tryUpdate is called twice, but the second call to replaceDocument is not accepted. Why is it not accepted when there are still many documents to update?

Recommended answer

As per my answer to the same question on MSDN:

Yes, 700 RUs for the query plus an estimated 20 RUs per replace, on a collection that only allows 250 RUs per second, is going to be a problem. The query costs 700 RUs because you're doing a NOT operation, which is effectively a scan because it can't use the index.
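
As a rough illustration of why the budget runs out so quickly, the arithmetic can be written down using only the figures quoted here (700 RUs for the query, an estimated 20 RUs per write, 250 RUs per second) and the page size of 100 documents reported in the question. How the service enforces the budget inside a stored procedure is not described in the answer, so this is only a back-of-the-envelope comparison, not a model of the throttling behaviour.

```javascript
// Figures quoted in the answer; the per-write cost is the answer's own estimate.
var queryCostRUs = 700;
var replaceCostRUs = 20;
var budgetRUsPerSecond = 250;

// Page size reported in the question.
var documentsPerPage = 100;

// Cost of one query plus one replace for every document on the page.
var pageCostRUs = queryCostRUs + replaceCostRUs * documentsPerPage;

// How many seconds' worth of budget the query alone, and the whole page, would consume.
var queryVsBudget = queryCostRUs / budgetRUsPerSecond;
var pageVsBudget = pageCostRUs / budgetRUsPerSecond;

console.log(pageCostRUs);   // 2700
console.log(queryVsBudget); // 2.8
console.log(pageVsBudget);  // 10.8
```

The query alone costs almost three seconds of the per-second allowance, which is consistent with the writes being rejected almost immediately.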

There are a few things you could try:

1) Change the logic to drop the NOT is_defined check and perhaps ORDER BY _ts DESC to get the most recently updated docs first. That might be cheaper than doing the NOT check. You could then check whether each doc you get back already has a Type property and, if not, add one and call ReplaceDocument.

2) You could also try scaling the collection up to an S3 while you are doing this operation, and then scale it back down to an S1 afterwards. That will give you 2500 RUs to play with.

3) Even using an S3, you might still run into this; it might just happen after more docs than the second one.
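
For option 1, the per-document check could look something like the sketch below. The helper names `needsType` and `selectDocumentsToUpdate` are hypothetical and not part of any DocumentDB API; the idea is only to filter each page of query results in JavaScript instead of in the query's WHERE clause.

```javascript
// Hypothetical helper: true when the document has no Type property at all.
// A property explicitly set to null is present in the JSON, so it is treated
// here as already having the property.
function needsType(doc) {
    return !Object.prototype.hasOwnProperty.call(doc, "Type");
}

// Hypothetical helper: keeps only the documents from one query page that still need the update.
function selectDocumentsToUpdate(documents) {
    return documents.filter(needsType);
}
```

Inside the sproc, the array returned by `selectDocumentsToUpdate(documents)` would be passed on to the update step in place of the raw query page.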

So, to fix this, I would run a query in an app that returns just the ids of the records that don't have the property defined:

SELECT VALUE c.id FROM c WHERE NOT is_defined(c.Type)

Put those ids into a list or array of some sort, then .Take() a batch of items from the list and pass it to the sproc as an array. Have the sproc loop through the passed array, doing a ReadDocument by id, updating the document, replacing it, and incrementing a counter.
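
A minimal sketch of such a sproc follows. The name `addTypeToDocuments` is hypothetical, and building the document link as `getAltLink() + "/docs/" + id` is an assumption that fits a single-partition collection like the one in the question. Whenever a read or replace is not accepted, the sproc reports how many documents it has finished, as the sproc in the question does with its response body.

```javascript
// Hypothetical sproc: adds a Type property to every document whose id is in `ids`.
// The response body is the number of documents fully updated in this execution.
function addTypeToDocuments(ids) {
    var context = getContext();
    var collection = context.getCollection();
    var response = context.getResponse();
    var updated = 0;

    if (!ids || ids.length === 0) {
        response.setBody(updated);
        return;
    }
    processNext();

    function processNext() {
        if (updated >= ids.length) {
            // Whole batch done.
            response.setBody(updated);
            return;
        }

        // Assumption: documents can be addressed by id under the collection's alt link.
        var documentLink = collection.getAltLink() + "/docs/" + ids[updated];

        var readAccepted = collection.readDocument(documentLink, {}, function (err, document) {
            if (err) throw err;

            document.Type = "Type value";
            var replaceAccepted = collection.replaceDocument(
                document._self, document, { etag: document._etag },
                function (replaceErr) {
                    if (replaceErr) throw replaceErr;
                    updated++;
                    processNext();
                });

            // Out of budget: tell the caller how far we got.
            if (!replaceAccepted) response.setBody(updated);
        });

        // Out of budget: tell the caller how far we got.
        if (!readAccepted) response.setBody(updated);
    }
}
```

Because the counter only advances after a replace completes, the value in the response body is always a safe position for the caller to resume from.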

When isAccepted returns false, set the response body to the value of the counter and return to the calling code. The calling code can then Skip(counter).Take(x) and call the sproc again.
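
The calling side of that loop might be sketched as follows, with `slice` playing the role of Skip/Take. The function `updateInBatches` and the injected `executeSproc` callback are hypothetical: `executeSproc` stands in for whatever client SDK call runs the stored procedure and is assumed to resolve to the counter from the response body.

```javascript
// Hypothetical driver: feeds ids to the sproc in batches and resumes from
// wherever the previous execution stopped.
// executeSproc(batch) is assumed to return a Promise of the number of
// documents the sproc updated from the front of `batch`.
async function updateInBatches(ids, batchSize, executeSproc) {
    let position = 0;
    while (position < ids.length) {
        // Equivalent of Skip(position).Take(batchSize).
        const batch = ids.slice(position, position + batchSize);
        const updated = await executeSproc(batch);

        // Guard against looping forever if an execution makes no progress;
        // real code would more likely wait and retry here.
        if (updated === 0) {
            throw new Error("No progress at position " + position);
        }
        position += updated;
    }
    return position;
}
```

Advancing by the reported counter rather than by the batch size means documents the sproc did not reach are simply included in the next batch.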

Take a look at this sample for an example of how to do bulk insert via a stored procedure. It shows how to batch records, execute a sproc, and read from the response body the position the sproc reached in that batch before isAccepted became false.
