How can I select a number of records per a specific field using mongodb?

Question

I have a collection of documents in mongodb, each of which has a "group" field that refers to the group that owns the document. The documents look like this:

{
  group: <objectID>,
  name: <string>,
  contents: <string>,
  date: <Date>
}

I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which has 20 documents. I want to write a query which returns the top 3 for each group, that is, 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.

In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?

Recommended answer

You cannot do this using the aggregation framework yet: you can get the $max or top date value for each group, but the aggregation framework does not yet have a way to accumulate the top N, and there is no way to push the entire document into the result set (only individual fields).

So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute; I borrowed my solution from one of the answers in this question).

Map function: outputs the group name as the key (the sample data used here stores it in the name field) and the entire document as the value. The value is wrapped in a document containing an array, because we will accumulate an array of results per group:

map = function () { 
    emit(this.name, {a:[this]}); 
}

The reduce function accumulates all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking date, you won't need the finalize function, and mapReduce will use less memory while running (it will also be faster).

reduce = function (key, values) {
    // Declare the accumulator locally so it does not leak into the global scope.
    var result = {a: []};
    values.forEach(function (v) {
        result.a = v.a.concat(result.a);
    });
    return result;
}
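The optimization mentioned above (trimming inside reduce instead of in finalize) could look roughly like the following. This is a minimal sketch, not part of the original answer: the name reduceTopN, the helper doc() and the sample dates are made up for illustration, and the limit is hardcoded inside the function because a variable defined in the shell is not visible to functions running on the server.

```javascript
// Hypothetical reduce variant: merge the partial arrays, sort newest first by
// date, and keep only the first five, so no array grows past five elements.
var reduceTopN = function (key, values) {
    var limit = 5; // hardcoded; shell variables are not visible server-side
    var merged = [];
    values.forEach(function (v) {
        merged = merged.concat(v.a);
    });
    merged.sort(function (x, y) {
        return y.date - x.date; // Date objects subtract as millisecond timestamps
    });
    return { a: merged.slice(0, limit) };
};

// Local check with plain objects (made-up data, one document per day).
function doc(day) {
    return { name: "group1", date: new Date(Date.UTC(2013, 3, day)) };
}
function days(value) {
    return value.a.map(function (d) { return d.date.getUTCDate(); });
}

var out = reduceTopN("group1", [
    { a: [doc(1), doc(5), doc(3), doc(7)] },
    { a: [doc(2), doc(6), doc(4)] }
]);
console.log(days(out)); // [ 7, 6, 5, 4, 3 ]

// Feeding a previous result back in still works, which matters because the
// server may call reduce more than once for the same key.
var again = reduceTopN("group1", [out, { a: [doc(8)] }]);
console.log(days(again)); // [ 8, 7, 6, 5, 4 ]
```

With this variant and no finalize step, each result's value keeps the {a: [...]} shape rather than being a bare array, and a group with only one document never passes through reduce at all, so its array is left exactly as map emitted it.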

Since I'm keeping all values for each key, I need a finalize function to pull out only the latest five elements per key.

final = function (key, value) {
    // Sort newest first by date, then keep the first five documents.
    // (Sorting in place here avoids patching Array.prototype.)
    value.a.sort(function (a, b) {
        return (a.date < b.date) ? 1 : (a.date > b.date) ? -1 : 0;
    });
    return value.a.slice(0, 5);
}

Using a template document similar to the one you provided, you run this by calling the mapReduce command:

> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
    "results" : [
        {
            "_id" : "group1",
            "value" : [
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe13"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.498Z"),
                    "contents" : 0.23778377776034176
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.467Z"),
                    "contents" : 0.4434165076818317
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe09"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.436Z"),
                    "contents" : 0.5935856597498059
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe04"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.405Z"),
                    "contents" : 0.3912118375301361
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfdff"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.372Z"),
                    "contents" : 0.221651989268139
                }
            ]
        },
        {
            "_id" : "group2",
            "value" : [
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe14"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.504Z"),
                    "contents" : 0.019611883210018277
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.473Z"),
                    "contents" : 0.5670706110540777
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.442Z"),
                    "contents" : 0.893193120136857
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe05"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.411Z"),
                    "contents" : 0.9496864483226091
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe00"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.378Z"),
                    "contents" : 0.013748752186074853
                }
            ]
        },
        {
            "_id" : "group3",
                        ...
                }
            ]
        }
    ],
    "timeMillis" : 15,
    "counts" : {
        "input" : 80,
        "emit" : 80,
        "reduce" : 5,
        "output" : 5
    },
    "ok" : 1,
}

Each result has the group name as _id and, as value, an array of the five most recent documents in the collection for that group name.
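To see how the three functions fit together without a server, here is a rough in-memory emulation in plain JavaScript. It is only a sketch under simplifying assumptions: the emit() stand-in, the grouping loop and the sample documents are made up for illustration, and reduce is called once per key, whereas the real server may call it several times.

```javascript
// The three functions from the answer, lightly condensed.
var map = function () { emit(this.name, { a: [this] }); };
var reduce = function (key, values) {
    var result = { a: [] };
    values.forEach(function (v) { result.a = v.a.concat(result.a); });
    return result;
};
var finalize = function (key, value) {
    value.a.sort(function (x, y) { return y.date - x.date; }); // newest first
    return value.a.slice(0, 5);
};

// Stand-in for the server's emit(): collect emitted values per key.
var groups = {};
function emit(key, value) {
    (groups[key] = groups[key] || []).push(value);
}

// Made-up data: two groups with seven documents each, one per day.
var docs = [];
["group1", "group2"].forEach(function (name) {
    for (var day = 1; day <= 7; day++) {
        docs.push({ name: name, date: new Date(Date.UTC(2013, 3, day)) });
    }
});

// MongoDB binds `this` to the current document; call() imitates that.
docs.forEach(function (d) { map.call(d); });

// Reduce each key once (skipped when a key has a single value), then finalize.
var results = Object.keys(groups).map(function (key) {
    var values = groups[key];
    var reduced = values.length > 1 ? reduce(key, values) : values[0];
    return { _id: key, value: finalize(key, reduced) };
});

results.forEach(function (r) {
    // Each group lists days 7, 6, 5, 4, 3.
    console.log(r._id, r.value.map(function (d) { return d.date.getUTCDate(); }));
});
```

The results array has the same shape as the "results" field shown earlier: one entry per group, with _id set to the group name and value holding at most five documents, newest first.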
