Distinct values of a key in a sub-document MongoDB (100 million records)

Question

I have 100 million records in my "sample" collection. I want to build another collection containing all of the distinct user names ("user.screen_name").

Documents in the "sample" collection of my MongoDB database have the following structure:

{
  "_id" : ObjectId("515af34297c2f607b822a54b"),
  "text" : "random text goes here",
  "user" : {
    "id" : 972863366,
    "screen_name" : "xname",
    "verified" : false,
    "time_zone" : "Amsterdam"
  }
}

When I try things like "distinct('user.id').length" I get the following error:

    "errmsg" : "exception: distinct too big, 16mb cap",

I need an efficient way to build another collection that holds only {"user_name": "name"} documents, one per distinct user in my "sample" collection. I can then query the size of this new collection to get the number of distinct users, and use it for further analysis later.

Recommended answer

I tried the solution I found here and it worked fine :). I'll keep the thread open and add my code in case someone needs it.

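// Source collection (100 million documents) and destination for the distinct names.
var SOURCE = db.sample;
var DEST = db.getCollection('distinct');
DEST.drop();

// Map: emit one {count: 1} per document, keyed by the user's screen name.
var map = function() {
  emit(this.user.screen_name, {count: 1});
};

// Reduce: sum the counts emitted for each screen name.
var reduce = function(key, values) {
  var count = 0;
  values.forEach(function(v) {
    count += v.count;
  });
  return {count: count};
};

// Write the results to the "distinct" collection rather than returning them inline,
// so the 16 MB cap on a single result document does not apply.
var res = SOURCE.mapReduce(map, reduce, {
  out: 'distinct',
  verbose: true
});

print("distinct count= " + res.counts.output);
print("distinct count=", DEST.count());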
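
For readers on newer MongoDB versions, where map-reduce is deprecated in favor of the aggregation pipeline, the same result can probably be reached with $group and $out. The following is only a sketch under assumptions, not the method used in the answer above: the helper buildDistinctPipeline and the output collection name distinct_users are hypothetical names chosen for illustration, and allowDiskUse is passed on the assumption that grouping 100 million documents will not fit in the in-memory limit of a pipeline stage.

```javascript
// Sketch: build an aggregation pipeline that writes one document per distinct
// value of `field` into `outCollection`. Both names are illustrative assumptions.
function buildDistinctPipeline(field, outCollection) {
  return [
    // Skip documents where the field is missing or null.
    { $match: { [field]: { $ne: null } } },
    // One group per distinct value, with an occurrence count.
    { $group: { _id: '$' + field, count: { $sum: 1 } } },
    // Reshape to the {user_name: ...} form asked for in the question.
    { $project: { _id: 0, user_name: '$_id', count: 1 } },
    // Write to a collection instead of returning one big result document,
    // so the 16 MB document cap is not hit.
    { $out: outCollection }
  ];
}

var pipeline = buildDistinctPipeline('user.screen_name', 'distinct_users');

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== 'undefined') {
  db.sample.aggregate(pipeline, { allowDiskUse: true });
  print('distinct count= ' + db.distinct_users.count());
}
```

Dropping the $project stage would keep each screen name as the _id of its document, which gives a unique index on the name for free; the reshaping is kept here only to match the {"user_name": "name"} form requested in the question.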

Thanks.