导出在 mongodb 中存储为 Bindata 的文本 [英] Export text stored as Bindata in mongodb

查看:128
本文介绍了导出在 mongodb 中存储为 Bindata 的文本的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个相当大的数据集合,由第三方制作,它包含一个 Bindata包"列,其中实际存储了纯 ASCII.我的意思是,他们可以将文本存储为字符串.

I have a quite big collection of data, made by third parties, it contains a Bindata "package" column where plain ASCII is actually stored. I mean, they could have stored the text as string.

我必须以 csv 格式导出此集合,这与 mongoexport 配合得很好,但输出包含包"列的 base64 编码值.我需要该列中的实际文本,而不是 BinData(0,\"NDYuN .....==").

I have to export this collection in csv format, which works quite well with mongoexport, but the output contains the base64 encoded values for the "package" column. I need the actual text from that column, not BinData(0,\"NDYuN.....==").

到目前为止,我尝试使用新列rawData"更新集合,如下所示:

What I've tried so far is update the collection with a new column "rawData", like this:

db.segments.find({"_id" : ObjectId("4fc79525f65181293930070b")}).forEach(function(data) {
  db.segments.update(
     {_id:data._id},
     {$set:{ "rawData" : data.package.toString() }}
  );
});

在我做对之前,我已将查找范围限制为一个文档.不幸的是 toString 并没有达到我期望的魔力.

I've limited the find to just one document until I get it right. Unfortunately toString does not do the magic I expect.

另外,我试过这个:

db.segments.find({"_id" : ObjectId("4fc79525f65181293930070b")}).forEach(function(data){
  data.package = new String(data.package);
  db.segments.save(data); 
});

结果更糟.

如果我用 php 读取文档,$response = $db->e​​xecute('return db.segments.findOne()'); 然后 print_r($response) ,我可以验证数据是否正确存储为 base64.

If I read the document with php, $response = $db->execute('return db.segments.findOne()'); then print_r($response) , I can validate that the data is properly stored, as base64.

我在任何地方都找不到解决方案,也许是因为没有人需要做这样愚蠢的事情.

I couldn't find a solution anywhere, perhaps because nobody ever needed to do something as stupid as this.

推荐答案

JavaScript BinData 对象具有 base64()hex() 方法,您可以使用它们接收相应格式的字符串.至于在 JS 中将这些值解码为文本字符串,您有一些选择:

The JavaScript BinData object has base64() and hex() methods, which you can use to receive a string in the respective format. As for decoding those values into text strings within JS, you have some options:

根据 BSON 规范,二进制数据字段由 32 位整数长度组成,子类型和字节数组.JS 中返回的 base64 和十六进制表示包含以整数长度为前缀的字节数组.这意味着在解码这些值后,我们需要去除前四个字节.下面是使用这两个选项的示例:

According to the BSON spec, binary data fields are composed of a 32-bit integer length, subtype, and byte array. The base64 and hexadecimal representations returned in JS contain the byte array prepended with the integer length. That means after decoding the values, we'll want to strip off the first four bytes. Here's an example using both options:

// https://stackoverflow.com/a/3058974/162228
function decode_base64(s) {
    var e={},i,k,v=[],r='',w=String.fromCharCode,u=0;
    var n=[[65,91],[97,123],[48,58],[43,44],[47,48]];

    for(z in n){for(i=n[z][0];i<n[z][1];i++){v.push(w(i));}}
    for(i=0;i<64;i++){e[v[i]]=i;}
    function a(c){
      if(c<128)r+=w(c);else if(c>=192)u=c;else r+=w(((u&31)<<6)+(c&63));
    }

    for(i=0;i<s.length;i+=72){
        var b=0,c,x,l=0,o=s.substring(i,i+72);
        for(x=0;x<o.length;x++){
            c=e[o.charAt(x)];b=(b<<6)+c;l+=6;
            while(l>=8)a((b>>>(l-=8))%256);
        }
    }
    return r;
}

// https://stackoverflow.com/a/3745677/162228
function hex2a(hex) {
    var str = '';
    for (var i = 0; i < hex.length; i += 2)
        str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
    return str;
}

db.segments.find().forEach(function(doc){
    print(decode_base64(doc.package.base64()));
});

db.segments.find().forEach(function(doc){
    print(hex2a(doc.package.hex()));
});

这是我用来插入一些夹具数据的一个小的 PHP 脚本:

And here's a small PHP script I used to insert some fixture data:

<?php

$mongo = new Mongo();
$c = $mongo->selectCollection('test', 'segments');
$c->drop();

$c->save(['package' => new MongoBinData('foo')]);
$c->save(['package' => new MongoBinData('bar')]);

foreach ($c->find() as $doc) {
    printf("%s\n", $doc['bindata']->bin);
}

根据数据集的大小,在 PHP 中进行二进制字段转换可能更合理.如果你确实想使用 JavaScript,我绝对建议通过 shell 客户端而不是 db.eval() 执行脚本,这样你就不会用长时间运行的 JS 锁定数据库功能.

Depending on the size of your data set, it's likely more reasonable to do the binary field conversion in PHP. If you do want to utilize JavaScript, I'd definitely suggest executing the script via the shell client instead of db.eval(), so you don't lock up the database with a long-running JS function.

这篇关于导出在 mongodb 中存储为 Bindata 的文本的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆