Find and Replace Strings in Documents Efficiently


Question

I have the following queries to find &nbsp; entities in a name field and replace them with an empty string, to get rid of them.
A name string can contain one or many &nbsp; entities, e.g.

AA&nbsp;aa
AA&nbsp;&nbsp;aa
AA&nbsp;&nbsp;&nbsp;aa
AA&nbsp;&nbsp;&nbsp;&nbsp;aa
AA&nbsp;AA&nbsp;aaaaaaaa

...and so on.

db.tests.find({'name':/.*&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;","");
    db.tests.save(test);
});

db.tests.find({'name':/.*&nbsp;&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;&nbsp;","");
    db.tests.save(test);
});

db.tests.find({'name':/.*&nbsp;&nbsp;&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;&nbsp;&nbsp;","");
    db.tests.save(test);
});

Other than repeating the same query pattern, is there a better solution for this scenario, with less duplication and higher performance?

Recommended answer

Surely if all you want to do is strip the &nbsp; entities from your text, then you just do a global match and replace:

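// The query only needs to know whether the entity occurs at all,
// so the g flag is used on the replace and not on the filter.
db.tests.find({ "name": /&nbsp;/ }).forEach(function(doc) {
    doc.name = doc.name.replace(/&nbsp;/g,"");
    // Write back only the changed field.
    db.tests.update({ "_id": doc._id },{ "$set": { "name": doc.name } });
});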

So there is no need to write out every combination: with the /g option, the regex replaces every match. You could also add /m for multi-line matching if your "name" string contains newline characters, although /m only changes how ^ and $ anchor, so this particular pattern works without it. See a basic regexer example.
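The difference between a string pattern and a global regex can be checked in plain JavaScript without a database. This is a minimal sketch that assumes the text to strip is the literal entity &nbsp;; the sample value and the variable names are illustrative only.

```javascript
// Sample value modelled on the question; "&nbsp;" is the literal text to remove.
const name = "AA&nbsp;&nbsp;aa";

// A string pattern replaces only the first occurrence.
const firstOnly = name.replace("&nbsp;", "");

// A regex with the g flag replaces every occurrence in one pass.
const allGone = name.replace(/&nbsp;/g, "");

console.log(firstOnly); // AA&nbsp;aa
console.log(allGone);   // AAaa
```

This is why the original approach needed one query per run length, while a single global replace covers all of them.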

It is also recommended to use $set so that you modify only the field(s) you really want to, rather than .save() the whole document back. There is less traffic, and less chance of overwriting changes that another process might have made since the document was read.
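The overwrite risk can be illustrated with an in-memory stand-in for a stored document. This is only a sketch: `simulate`, the `views` field, and the plain objects are hypothetical and are not MongoDB APIs. They model a whole-document write versus a single-field write when another process changes the document between the read and the write.

```javascript
// Hypothetical in-memory stand-in for one stored document (not a MongoDB API).
function simulate(writeBack) {
  const stored = { _id: 1, name: "AA&nbsp;aa", views: 10 };

  // Our process reads a copy of the document...
  const doc = Object.assign({}, stored);

  // ...and meanwhile another process updates a different field.
  stored.views = 11;

  doc.name = doc.name.replace(/&nbsp;/g, "");
  writeBack(stored, doc);
  return stored;
}

// Like .save(): every field is replaced by the stale copy.
const saved = simulate((stored, doc) => Object.assign(stored, doc));

// Like $set: only the named field is written.
const patched = simulate((stored, doc) => { stored.name = doc.name; });

console.log(saved.views);   // 10, the concurrent change is lost
console.log(patched.views); // 11, the concurrent change survives
```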

Ideally you would use the Bulk Operations API with MongoDB 2.6 and greater. This allows the updates to be sent in batches, so there is again less traffic between the client and the server:

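var bulk = db.tests.initializeOrderedBulkOp();
var count = 0;

// The g flag is only needed for the replace, not for the query filter.
db.tests.find({ "name": /&nbsp;/ }).forEach(function(doc) {
    doc.name = doc.name.replace(/&nbsp;/g,"");
    // Queue the update instead of sending it immediately.
    bulk.find({ "_id": doc._id })
        .updateOne({ "$set": { "name": doc.name } });
    count++;

    // Send every 1000 queued updates, then start a new batch.
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.tests.initializeOrderedBulkOp();
    }
});

// Send whatever is left over from the last partial batch.
if ( count % 1000 != 0 )
    bulk.execute();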
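The batching logic above can be checked without a server. The sketch below replaces the bulk builder with a hypothetical stub (`makeFakeBulk`, with `add` and `execute` methods that are not part of MongoDB) and records how many queued operations each `execute` call would send.

```javascript
// Hypothetical stub that mimics the shape of a bulk builder; it only counts calls.
function makeFakeBulk(log) {
  let queued = 0;
  return {
    add() { queued++; },
    execute() { log.push(queued); },
  };
}

function runInBatches(docCount, batchSize) {
  const log = []; // size of each batch that was sent
  let bulk = makeFakeBulk(log);
  let count = 0;

  for (let i = 0; i < docCount; i++) {
    bulk.add();
    count++;
    if (count % batchSize === 0) {
      bulk.execute();
      bulk = makeFakeBulk(log);
    }
  }
  // Flush the partial batch; skipped when nothing is queued.
  if (count % batchSize !== 0) bulk.execute();
  return log;
}

console.log(runInBatches(2500, 1000)); // [ 1000, 1000, 500 ]
```

With 2500 matching documents this makes three round trips instead of 2500, and the final guard avoids executing an empty batch when the count is an exact multiple of the batch size.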

Those are your primary ways to improve this. Unfortunately, in the MongoDB versions discussed here, an update statement cannot use an existing field value as part of its update expression in this way, so the only way is looping, but you can do a lot to reduce the operations, as shown. (MongoDB 4.2 and later accept an aggregation pipeline as the update, and 4.4 adds the $replaceAll operator, which removes the need for the loop.)
