Perform Aggregation/Set intersection on MongoDB


Question

I have a query. Consider the following example as the intermediate data produced by some aggregation on a sample dataset:

The fileid field contains the ID of a file, and the _user array lists the users who made changes to that file.

{
   "_id" : {  "fileid" : 12  },
   "_user" : [ "a","b","c","d" ]
}
{
   "_id" : {  "fileid" : 13  },
   "_user" : [ "f","e","a","b" ]
}
{
   "_id" : {  "fileid" : 14  },
   "_user" : [ "g","h","m","n" ]
}
{
   "_id" : {  "fileid" : 15  },
   "_user" : [ "o","r","s","v" ]
}
{
   "_id" : {  "fileid" : 16  },
   "_user" : [ "x","y","z","a" ]
}
{
   "_id" : {  "fileid" : 17  },
   "_user" : [ "g","r","s","n" ]
}

I need to find a solution for this: any two users who both made changes to at least two of the same files. So the output should be

{
   "_id" : {  "fileid" : [12,13]  },
   "_user" : [ "a","b"]
}
{
   "_id" : {  "fileid" : [14,17]  },
   "_user" : [ "g","n" ]
}
{
   "_id" : {  "fileid" : [15,17]  },
   "_user" : [ "r","s" ]
}

Your input is highly appreciated.

Recommended answer

This is a somewhat involved solution. The idea is to first use the database to build the full population of possible user pairs, then turn around and ask the database which documents contain each pair in the _user field. Beware that thousands of users will produce a very large pair list, since n users yield n(n-1)/2 pairs. We use $addFields in case the input documents carry more fields than the example shows; if they don't, replace it with $project for efficiency, to cut down the amount of data flowing through the pipeline.
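To put a rough number on the warning about the pair list, here is a minimal sketch in plain JavaScript. The helper name pairCount is made up for illustration and is not part of the answer. Keep in mind that Stage 1 returns all pairs inside a single document, and a single MongoDB document is subject to the 16 MB BSON size limit, so a very large user population may not fit.

```javascript
// Hypothetical helper (not part of the original answer):
// number of unordered pairs that can be formed from n distinct users.
function pairCount(n) {
  return (n * (n - 1)) / 2;
}

// 17 is the number of distinct users in the question's sample data.
for (const n of [17, 1000, 5000]) {
  console.log(n + " users -> " + pairCount(n) + " pairs");
}
```

For the sample data this gives 136 pairs; at 1000 users it is already 499500.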

//
// Stage 1:  Get unique set of username pairs.
//
c=db.foo.aggregate([
{$unwind: "$_user"}

// Create single deduped list of users:
,{$group: {_id:null, u: {$addToSet: "$_user"} }}

// Nice little double map here creates the pairs, effectively doing this:
//    for index in range(0, len(list)):
//      first = list[index]
//      for p2 in range(index+1, len(list)):
//        pairs.append([first,list[p2]])
// 
,{$addFields: {u: 
  {$map: {
    input: {$range:[0,{$size:"$u"}]},
    as: "z",
    in: {
        $map: {
            input: {$range:[{$add:[1,"$$z"]},{$size:"$u"}]},
            as: "z2",
            in: [
            {$arrayElemAt:["$u","$$z"]},
            {$arrayElemAt:["$u","$$z2"]}
            ]
        }
    }
    }}
}}

// Flatten the array of arrays of pairs into a single array of pairs:
,{$addFields: {u: {$reduce:{
        input: "$u",
        initialValue:[],
        in:{$concatArrays: [ "$$value", "$$this"]}
        }}
    }}
          ]);


// Stage 2:  Find pairs and tally up the fileids

doc = c.next(); // Get single output from Stage 1 above.                       

u = doc['u'];

c2=db.foo.aggregate([
{$addFields: {_x: {$map: {
                input: u,
                as: "z",
                in: {
                    n: "$$z",
                    q: {$setIsSubset: [ "$$z", "$_user" ]}
                }
            }
        }
    }}
,{$unwind: "$_x"}
,{$match: {"_x.q": true}}
//  Nice use of grouping by an ARRAY here:
,{$group: {_id: "$_x.n", v: {$push: "$_id.fileid"}, n: {$sum:1} }}
,{$match: {"n": {"$gt":1}}}
                     ]);

// Print each result document. (show() is not a shell built-in; printjson is.)
c2.forEach(printjson);
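To sanity-check the two-stage logic without a database, the same steps can be sketched in plain JavaScript over the sample documents from the question. This is only an in-memory approximation, not the aggregation itself: the function names allPairs and sharedFiles and the minFiles parameter are assumptions introduced here. The result shape (_id as the pair, v as the file IDs, n as the count) mirrors the final $group stage.

```javascript
// Sample intermediate documents from the question.
const docs = [
  { _id: { fileid: 12 }, _user: ["a", "b", "c", "d"] },
  { _id: { fileid: 13 }, _user: ["f", "e", "a", "b"] },
  { _id: { fileid: 14 }, _user: ["g", "h", "m", "n"] },
  { _id: { fileid: 15 }, _user: ["o", "r", "s", "v"] },
  { _id: { fileid: 16 }, _user: ["x", "y", "z", "a"] },
  { _id: { fileid: 17 }, _user: ["g", "r", "s", "n"] },
];

// Stage 1 equivalent: deduplicate the users, then build every unordered pair.
function allPairs(documents) {
  const users = [...new Set(documents.flatMap((d) => d._user))];
  const pairs = [];
  for (let i = 0; i < users.length; i++) {
    for (let j = i + 1; j < users.length; j++) {
      pairs.push([users[i], users[j]]);
    }
  }
  return pairs;
}

// Stage 2 equivalent: for each pair, collect the files whose _user array
// contains both names, and keep pairs seen in at least minFiles files.
function sharedFiles(documents, minFiles = 2) {
  const results = [];
  for (const pair of allPairs(documents)) {
    const files = documents
      .filter((d) => pair.every((name) => d._user.includes(name)))
      .map((d) => d._id.fileid);
    if (files.length >= minFiles) {
      results.push({ _id: pair, v: files, n: files.length });
    }
  }
  return results;
}

const results = sharedFiles(docs);
console.log(JSON.stringify(results));
```

On the sample data this yields the pairs a/b (files 12 and 13), g/n (files 14 and 17) and r/s (files 15 and 17), matching the output the question asks for. In MongoDB itself, $addToSet does not guarantee element order, so a pair may come back as ["b","a"] rather than ["a","b"].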
