Removing white spaces (leading and trailing) from string value

Question

I have imported a CSV file into MongoDB using mongoimport, and I want to remove leading and trailing white space from my string values.

Is it possible to apply a trim function to the whole collection directly in MongoDB, or do I need to write a script for that?

My collection contains elements such as:

{
  "_id" : ObjectId("53857680f7b2eb611e843a32"),
  "category" : "Financial & Legal Services "
}

I want to apply a trim function to the whole collection so that "category" does not contain any leading or trailing spaces.

Recommended answer

Before MongoDB 4.2, an update cannot refer to the existing value of a field when applying the update, so on those versions you have to loop:

db.collection.find({},{ "category": 1 }).forEach(function(doc) {
   // Skip documents where "category" is missing or not a string
   if ( typeof doc.category !== "string" ) return;
   db.collection.update(
       { "_id": doc._id },
       { "$set": { "category": doc.category.trim() } }
   );
})

Note the use of the $set operator, so that only the "category" field is modified, and the projection of only the "category" field in the query, which reduces network traffic.
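
As a small illustration of what the loop does to each value, the following plain JavaScript sketch applies trim() to the sample "category" value from the question. The helper name trimIfString is hypothetical and not part of the original answer; it simply leaves non-string values untouched.

```javascript
// Hypothetical helper (an assumption for illustration): trim only string values.
function trimIfString(value) {
  return typeof value === "string" ? value.trim() : value;
}

// Sample value taken from the document shown in the question.
const sample = { category: "Financial & Legal Services " };
const cleaned = trimIfString(sample.category);

console.log(JSON.stringify(cleaned)); // "Financial & Legal Services"
```

Guarding on the type matters because a document without a "category" field would otherwise make the call to trim() throw.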

You can limit the documents that get processed by matching with a $regex:

db.collection.find({ 
    "$and": [
        { "category": /^\s+/ },
        { "category": /\s+$/ }
    ]
})

Or even as a single $regex without $and, which in MongoDB you only need when multiple conditions apply to the same field; otherwise $and is implicit across all arguments. Note that the $and form above matches only values with both leading and trailing white space, whereas the alternation below matches values with either:

db.collection.find({ "category": /^\s+|\s+$/ })

This restricts the documents to process to only those with leading or trailing white space.
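
To make the difference between the two queries concrete, here is a minimal sketch in plain JavaScript that applies the same patterns to a few made-up strings. The sample values are assumptions for illustration only, and MongoDB evaluates $regex with its own regular expression engine, although patterns this simple should behave the same way.

```javascript
const leading = /^\s+/;
const trailing = /\s+$/;
const either = /^\s+|\s+$/;

// Made-up sample values (assumptions, not data from the question).
const values = ["  both  ", "trailing only ", " leading only", "clean"];

// Mirrors the $and query: both patterns must match the same value.
const matchesBoth = values.filter(v => leading.test(v) && trailing.test(v));

// Mirrors the single-regex query: white space at either end is enough.
const matchesEither = values.filter(v => either.test(v));

console.log(matchesBoth.length, matchesEither.length); // 1 3
```

For trimming purposes the single-regex form is the one to use, since a value with white space at only one end still needs to be updated.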

If you are worried about the number of documents to process, bulk updating should help if you have MongoDB 2.6 or greater available:

var batch = [];
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1 }).forEach(
    function(doc) {
        batch.push({
            "q": { "_id": doc._id },
            "u": { "$set": { "category": doc.category.trim() } }
        });

        // Send the accumulated updates every 1000 documents
        if ( batch.length % 1000 == 0 ) {
            db.runCommand({ "update": "collection", "updates": batch });
            batch = [];
        }
    }
);

// Send any remaining updates
if ( batch.length > 0 )
    db.runCommand({ "update": "collection", "updates": batch });

Or even with the bulk operations API for MongoDB 2.6 and above:

var counter = 0;
var bulk = db.collection.initializeOrderedBulkOp();
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1}).forEach(
    function(doc) {
        bulk.find({ "_id": doc._id }).updateOne({
            "$set": { "category": doc.category.trim() }
        });
        counter = counter + 1;

        // Execute every 1000 operations and start a new bulk
        if ( counter % 1000 == 0 ) {
            bulk.execute();
            bulk = db.collection.initializeOrderedBulkOp();
        }
    }
);

// Execute only if operations remain queued; an empty bulk throws an error
if ( counter % 1000 != 0 )
    bulk.execute();

With modern APIs this is best done with bulkWrite(), which uses the Bulk Operations API under the hood (technically everything does now) but in a way that degrades safely on older versions of MongoDB. In all honesty, that means versions prior to MongoDB 2.6, and anyone running such a version is well outside official support. The code is somewhat cleaner:

var batch = [];
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1}).forEach(
  function(doc) {
    batch.push({
      "updateOne": {
        "filter": { "_id": doc._id },
        "update": { "$set": { "category": doc.category.trim() } }
      }
    });

    // Send the accumulated operations every 1000 documents
    if ( batch.length % 1000 == 0 ) {
      db.collection.bulkWrite(batch);
      batch = [];
    }
  }
);

// Send any remaining operations
if ( batch.length > 0 ) {
  db.collection.bulkWrite(batch);
  batch = [];
}

All of these send operations to the server only once per 1000 documents, or as many modifications as fit under the server's request size limit (a single BSON document is capped at 16MB).
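
The batching pattern shared by these listings can be sketched in isolation. The following plain JavaScript is a minimal illustration, not the answer's own code: the makeBatcher helper and the recording callback are hypothetical stand-ins, with the callback taking the place of a call such as db.collection.bulkWrite(batch).

```javascript
// Hypothetical helper (an assumption for illustration): buffers operations
// and hands them to `flush` every `size` items.
function makeBatcher(size, flush) {
  let batch = [];
  return {
    push(op) {
      batch.push(op);
      if (batch.length === size) {
        flush(batch);
        batch = [];
      }
    },
    // Call once after the loop to send any remainder.
    finish() {
      if (batch.length > 0) {
        flush(batch);
        batch = [];
      }
    }
  };
}

// Stand-in for db.collection.bulkWrite(batch): records each batch size.
const sentSizes = [];
const batcher = makeBatcher(1000, ops => sentSizes.push(ops.length));

for (let i = 0; i < 2500; i++) {
  batcher.push({
    updateOne: { filter: { _id: i }, update: { $set: { category: "x" } } }
  });
}
batcher.finish();

console.log(sentSizes); // [ 1000, 1000, 500 ]
```

Keeping the final flush conditional on a non-empty buffer avoids sending an empty request when the document count is an exact multiple of the batch size.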

These are just a few ways to approach the problem. Alternatively, clean up the CSV file before importing it.
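
One further option applies only to newer servers: MongoDB 4.2 and later accept an aggregation pipeline as the update document, and the $trim operator (available since MongoDB 4.0) can then read the field's current value on the server. The sketch below assumes such a version and the placeholder collection name used throughout this answer; it builds the filter and pipeline as plain objects and only issues the update when a shell db object is present.

```javascript
// Match only values with leading or trailing white space.
const filter = { category: /^\s+|\s+$/ };

// Pipeline-style update: "$category" refers to the document's current value.
const pipeline = [
  { $set: { category: { $trim: { input: "$category" } } } }
];

// Runs only inside the mongo shell, where `db` is defined.
// Assumes MongoDB 4.2+ and a collection named "collection".
if (typeof db !== "undefined") {
  db.collection.updateMany(filter, pipeline);
}
```

This avoids the client-side loop entirely, at the cost of requiring a server version that the original answer predates.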
