How to increase performance of the update operation in Mongo?

Question

// One query round trip, then one blocking update round trip per document
foreach (var doc in await records.Find(filter).ToListAsync())
{
    var query = Builders<JobInfoRecord>.Filter.Eq("JobTypeValue", doc.JobTypeValue);
    var updatedJobInfo = Regex.Replace(doc.SerializedBackgroundJobInfo, pattern, "<$1></$1>");
    var update = Builders<JobInfoRecord>.Update.Set("SerializedBackgroundJobInfo", updatedJobInfo);

    records.UpdateOneAsync(query, update).Wait();
}

Is this the best way to update documents? (I'm changing the values of tags whose names contain "password" to an empty tag in the XML string: <adminPassword></adminPassword> or demo.) I'm using Mongo driver 2.0.2.

I have a collection with 500,000 documents, and I am performing updates each minute (hopefully) on approximately 3,000 documents.

How can I improve the performance of the update operation?

Answer

When updating in the way that you are, you need to retrieve the document content in order to inspect it and make such modifications. MongoDB has no atomic operations that act on existing values in the way that you want, so iteration is of course required.
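
To see the contrast, here is a minimal sketch (the "RetryCount" field and the job-type value are hypothetical, used only for illustration): an operator like $inc can act atomically on an existing numeric value server-side, but nothing comparable exists for rewriting a string field with a regex, which is why the fetch-transform-write iteration is needed:

// Operators like $inc can act atomically on an existing numeric value...
// ("RetryCount" is a hypothetical field used only for illustration)
await records.UpdateOneAsync(
    Builders<BsonDocument>.Filter.Eq("JobTypeValue", "SomeJobType"),   // hypothetical value
    Builders<BsonDocument>.Update.Inc("RetryCount", 1));

// ...but there is no server-side equivalent of Regex.Replace on a string
// field, so each matching document has to be fetched, transformed on the
// client, and written back.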

There is no real difference in the "query" portion of how you are matching on the regular expression between your two versions of the statement. No matter what, the content is converted to BSON before being sent to the server, so whether you use a standard expression builder or a direct BSON document is of little consequence.
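
To make that concrete, a small sketch of the two equivalent forms (using the field name from the question); both serialize to the same BSON before being sent:

// Typed builder form
var builderFilter = Builders<BsonDocument>.Filter.Regex(
    "SerializedBackgroundJobInfo", new BsonRegularExpression("password", "i"));

// Raw BSON document form - identical once on the wire
FilterDefinition<BsonDocument> rawFilter = new BsonDocument(
    "SerializedBackgroundJobInfo", new BsonRegularExpression("password", "i"));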

But on to the performance improvements that can be made.

As stated, Bulk Operations are the way you should be updating on such list iteration, and you also "should" be using a cursor rather than converting all results to a list, since it will save on memory.
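
As a small contrast sketch: the list form materializes every matching document in memory at once, while the cursor form streams them in server-sized batches:

// Materializes the entire result set in memory at once
var allDocs = await records.Find(filter).ToListAsync();

// Streams the results batch by batch instead
using (var cursor = await records.FindAsync(filter))
{
    while (await cursor.MoveNextAsync())
    {
        foreach (var doc in cursor.Current)
        {
            // process each document here
        }
    }
}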

Eschewing all the specific type declarations and just representing the documents as BsonDocument (which will probably save you on marshalling, but is not needed), the basic example process would be:

var pattern = @"(?si)<([^\s<]*workUnit[^\s<]*)>.*?</\1>";
var filter = Builders<JobInfoRecord>.Filter.Regex(x => x.SerializedBackgroundJobInfo,
                                              new BsonRegularExpression(pattern, "i"));


var ops = new List<WriteModel<BsonDocument>>();
var writeOptions = new BulkWriteOptions() { IsOrdered = false };

using ( var cursor = await records.FindAsync<BsonDocument>(filter))
{
    while ( await cursor.MoveNextAsync())
    {
        foreach( var doc in cursor.Current )
        {
            // Replace inspected value
            var updatedJobInfo = Regex.Replace(doc.SerializedBackgroundJobInfo, pattern, "<$1></$1>");

            // Add WriteModel to list
            ops.Add(
                new UpdateOneModel<BsonDocument>(
                    Builders<BsonDocument>.Filter.Eq("JobTypeValue", doc.JobTypeValue),
                    Builders<BsonDocument>.Update.Set("SerializedBackgroundJobInfo", updatedJobInfo)
                )
            );

            // Execute once in every 1000 and clear list
            if (ops.Count == 1000)
            {
                BulkWriteResult<BsonDocument> result = await records.BulkWriteAsync(ops,writeOptions);
                ops = new List<WriteModel<BsonDocument>>();
            }
        }
    }

    // Clear any remaining
    if (ops.Count > 0 )
    {
        BulkWriteResult<BsonDocument> result = await records.BulkWriteAsync(ops,writeOptions);
    }

}

So rather than making a request to the database for every single document retrieved from the query, you create a List of WriteModel operations instead.

Once this list has grown to a reasonable size (1000 in this example), you commit the write operations to the server in a single request and response for all batched operations. Here we use BulkWriteAsync.

You can create the batches in a size greater than 1000 if you like, but 1000 is generally a reasonable number to deal with. The only real hard limit is the 16MB BSON limit: since all requests are still actually BSON documents, this still applies. It takes a lot of requests to approach 16MB anyway, but there is also an impedance match to consider in how the request will be processed when it actually reaches the server, as documented:

Each group of operations can have at most 1000 operations. If a group exceeds this limit, MongoDB will divide the group into smaller groups of 1000 or less. For example, if the bulk operations list consists of 2000 insert operations, MongoDB creates 2 groups, each with 1000 operations.

Therefore, by keeping the request size at the same level as how the server will process it, you also get the benefit from the yield where "multiple batches" can in fact act over parallel connections to the server, rather than letting the server do the splitting and queueing.

The returned result is a BulkWriteResult, which will contain information on the number of "matches" and "modifications" etc. from the batch of operations sent.
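
A minimal sketch of inspecting that result, using properties from the driver's BulkWriteResult type:

BulkWriteResult<BsonDocument> result = await records.BulkWriteAsync(ops, writeOptions);

Console.WriteLine("Requests sent: {0}", result.RequestCount);
Console.WriteLine("Matched:       {0}", result.MatchedCount);

// ModifiedCount throws unless the server actually reported it
if (result.IsModifiedCountAvailable)
    Console.WriteLine("Modified:      {0}", result.ModifiedCount);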

Naturally, since the operations are in "batches", it makes sense to check at the end of the loop iteration whether any more "batched" operations remain in the list, and then of course submit them in the same way.

Also note the IsOrdered = false in BulkWriteOptions: this means that the batch of operations is not actually executed in serial order, so the server can in fact run the tasks in "parallel". This can make for "huge" speed improvements where the order of commitment is not required. The default is to submit "ordered" and serially.

It is not required to set this option, but if your order is not important (which it should not be in this case, since no other operation requests here depend on the previous modification of a document), then the improvement you get is worthwhile.
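
For comparison, a short sketch of the two settings; the ordered form is what you get by default:

// Default: operations are applied serially, and the first error stops the batch
var orderedOptions = new BulkWriteOptions { IsOrdered = true };

// Unordered: the server may apply operations in parallel and continues
// past individual failures, reporting them all at the end
var unorderedOptions = new BulkWriteOptions { IsOrdered = false };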

What this is all about is "reducing" the number of actual requests made to the server. Sending updates and awaiting a response takes time, and in large operations it is a very costly exercise. That is what Bulk Operations are meant to deal with, by applying several operations within the one request.

Reducing that overhead is a "huge" performance gain. That's why you use this.
