在嵌套数组上使用$ unwind的后果? [英] Consequences of using $unwind on nested arrays?

查看:299
本文介绍了在嵌套数组上使用$ unwind的后果?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说我有17,000个文档,其结构与以下文档类似:​​

Say I have 17,000 documents that have a structure similar to the document below:

{
   someInfo: "blah blah blah",
   // and another dozen or so attributes here, followed by:
   answers:[
      {
          email: "test@test.com,
          values:[
             {value: 1, label: "test1"},
             {value: 2, label: "test2"}    
          ]
      },
      {
          email: "someone@somewhere.com,
          values:[
             {value: 6, label: "test1"},
             {value: 1, label: "test2"}    
          ]
      }
   ]
}

说我像这样使用聚合来展开answersanswers.values:

Say I use aggregate to unwind both answers and answers.values like so:

db.participants.aggregate(
   {$unwind: "$answers"},
   {$unwind: "$answers.values"}
);

我认为它会在内存中创建相当大的结果集,因为它实际上将复制父对象17,000 * # of answers * # of values次.

I assume it would create a fairly large result set in memory since it would essentially be replicating the parent object 17,000 * # of answers * # of values times.

我一直在测试一个在开发环境上执行类似操作的查询,并且查询本身的性能很好,但是我想知道我是否应该担心在可以解开结果集的生产环境中运行此查询可能会消耗大量内存. Mongo在 $ unwind 上的文档中介绍了它的工作原理,但没有介绍讨论潜在的性能问题.

I have been testing a query that does something similar on a development environment and the performance of the query itself is fine, but I'm wondering if I should be concerned about running this on a production environment where the unwound result set could potentially eat up a lot of memory. Mongo's documentation on $unwind goes into how it works, but does not discuss potential performance problems.

我应该担心在生产系统上执行此操作吗?会减慢对数据库的其他查询吗?

Should I be worried about doing this on a production system? Will it slow down other queries against the db?

推荐答案

It is always a good idea to be cognizant of memory resources when $unwinding because of the replication of data that occurs.

使用 $match 将结果缩小到您要查找的特定文档当然是减少保存返回的数据所需的内存量的一种方法.

Using $match to narrow down the results to the specific documents you are looking for is of course one way to reduce the amount of memory necessary to hold the returned data.

另一种减少内存占用的方法是使用 $project . $project允许您重新组织管道中的文档,以便仅返回您感兴趣的元素.

Another way to reduce the memory footprint is with $project. $project allows you to re-organize the documents in the pipeline so that you only return the elements in which you are interested.

要使用您的示例,

{
  someInfo: "blah blah blah",
  answers: [
    {
      email: "test@test.com",
      values: [
        {value: 1, label: "test1"},
        {value: 2, label: "test2"}    
      ]
    },
    {
      email: "someone@somewhere.com",
      values: [
        {value: 6, label: "test1"},
        {value: 1, label: "test2"}    
      ]
    }
  ]
}

使用

db.collection.aggregate([{ $match: { <element>: <value> }}, { $project: { _id: 0, answers: 1}}])

将删除您可能不感兴趣的someInfo和其他属性.然后您可以在展开后再次$project ...

will remove the someInfo and other attributes you may not be interested in. Then you could $project again after unwinding...

db.collection.aggregate([
   { $match: { <element>: <value> }},
   { $project: { _id: 0, answers: 1}},
   { $unwind: "$answers"},
   { $unwind: "$answers.tags"},
   { $project: { e: "$answers.email", v: "$answers.values"}}
])

将返回相当紧凑的结果,例如:

will return fairly compact results like:

{ e: "test@test.com", v: { value: 1, label: "test1" } }
{ e: "test@test.com", v: { value: 2, label: "test2" } }
{ e: "someone@somewhere.com", v: { value: 6, label: "test1" } }
{ e: "someone@somewhere.com", v: { value: 1, label: "test2" } }

尽管单字母属性名称降低了人们的可读性,但确实减少了因冗长的重复属性名称而膨胀的数据量.

Although the single letter attribute names reduce human-readability, it does cut down on the size of the data that is inflated by lengthy repeated attribute names.

这篇关于在嵌套数组上使用$ unwind的后果?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆