MongoDB workaround for document above 16mb size?


Problem description

The MongoDB collection I am working on takes sensor data from a cellphone, which is pinged to the server roughly every 2-6 seconds.

The data is huge and the 16mb limit is crossed after 4-5 hours; there doesn't seem to be any workaround for this?

I have tried searching for it on Stack Overflow and went through various questions, but no one actually shared their hack.

Is there any way... on the DB side, maybe, which will distribute the chunks like it is done for big files via GridFS?

Recommended answer

To fix this problem you will need to make some small amendments to your data structure. By the sounds of it, for your documents to exceed the 16mb limit, you must be embedding your sensor data into an array in a single document.

I would not suggest using GridFS here; I do not believe it to be the best solution, and here is why.

There is a technique known as bucketing that you could employ, which will essentially split your sensor readings into separate documents, solving this problem for you.

The way it works is this:

Let's say I have a document with some embedded readings for a particular sensor that looks like this:

{
    _id : ObjectId("xxx"),
    sensor : "SensorName1",
    readings : [
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" }
    ]
}

With the structure above there is already a major flaw: the readings array can grow without bound and exceed the 16mb document limit.

So what we can do is change the structure slightly to look like this, to include a count property:

{
    _id : ObjectId("xxx"),
    sensor : "SensorName1",
    readings : [
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" },
        { date : ISODate("..."), reading : "xxx" }
    ],
    count : 3
}

The idea behind this is, when you $push your reading into your embedded array, you increment ($inc) the count variable for every push that is performed. And when you perform this update (push) operation, you would include a filter on this "count" property, which might look something like this:

{ count : { $lt : 500} }

Then, set your update options so that "upsert" is set to true:

db.sensorReadings.update(
    { sensor: "SensorName1", count: { $lt: 500 } },
    {
        // Your update: $push your reading and $inc your count
        $push: { readings: ReadingDocumentToPush },
        $inc: { count: 1 }
    },
    { upsert: true }
)
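
For example, a concrete call in the mongo shell (the reading value here is hypothetical) that pushes a single reading into the current bucket, or starts a new bucket once the current one holds 500 readings, might look like this:

db.sensorReadings.update(
    { sensor: "SensorName1", count: { $lt: 500 } },
    {
        // push one concrete reading and bump the bucket's counter
        $push: { readings: { date: new Date(), reading: "21.5" } },
        $inc: { count: 1 }
    },
    { upsert: true }
)

Because "sensor" is an equality condition in the filter, it is copied into the newly inserted document whenever the upsert creates a fresh bucket, so every bucket keeps its sensor name.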

See here for more info on MongoDB update and the upsert option:

MongoDB update documentation

What will happen is, when the filter condition is not met (i.e. when there is either no existing document for this sensor, or the count is greater than or equal to 500, because you are incrementing it every time an item is pushed), a new document will be created, and the readings will now be embedded in this new document. So you will never hit the 16mb limit if you do this properly.
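
As an illustration (the counts and ids are hypothetical), once the first bucket for a sensor fills up, the collection holds several bucket documents for the same sensor, for example:

{ _id: ObjectId("..."), sensor: "SensorName1", count: 500, readings: [ /* 500 readings */ ] }
{ _id: ObjectId("..."), sensor: "SensorName1", count: 142, readings: [ /* 142 readings */ ] }

The first bucket is full, so the { count: { $lt: 500 } } filter no longer matches it and new readings accumulate in the second one.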

Now, when querying the database for readings of a particular sensor, you may get back multiple documents for that sensor (instead of just one with all the readings in it); for example, if you have 10,000 readings, you will get 20 documents back, each with 500 readings.

You can then use the aggregation pipeline and $unwind to filter your readings as if they were their own individual documents.
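
As a minimal sketch (the date range in the second $match is hypothetical), such a pipeline could look like this:

db.sensorReadings.aggregate([
    // select the buckets for the sensor you are interested in
    { $match: { sensor: "SensorName1" } },
    // turn each element of the readings array into its own document
    { $unwind: "$readings" },
    // now filter on individual readings, e.g. by date
    { $match: { "readings.date": { $gte: ISODate("2016-01-01T00:00:00Z") } } },
    // optionally reshape the output to just the reading fields
    { $project: { _id: 0, date: "$readings.date", reading: "$readings.reading" } }
])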

For more information on $unwind see here; it's very useful:

MongoDB $unwind documentation

I hope this helps.
