Strategies for Real-Time Aggregations in MongoDB

Question

In exploring ways to do real-time analytics with MongoDB, there seems to be a fairly standard way to do sums, but nothing in terms of more complex aggregation. Some things that have helped...

  • Twitter's Rainbird: Realtime sums, incrementing counters on keys hierarchically. Cassandra.
  • Yahoo's S4 and source: Not sure exactly how this works yet, but it looks like it's real-time map-reduce. So basically, for every record that's added, you pass it to a mapper, it converts it to a hash, and that sends it to be integrated into the report document.
  • http://www.slideshare.net/dacort/mongodb-realtime-data-collection-and-stats-generation
  • Hummingbird

The basic approach for doing sums is to atomically increment document keys for each new record that comes in, to cache common queries:

Stats.collection.update({"keys" => ["a", "b", "c"]}, {"$inc" => {"counter_1" => 1, "counter_2" => 1}}, {:upsert => true})

This doesn't work for aggregates other than sums though. My question is, can something like this be done for averages, min, and max in mongodb?

Say you have a document like this:

{
  :date => "04/27/2011",
  :page_views => 1000,
  :user_birthdays => ["12/10/1980", "6/22/1971", ...] # 1000 total
}

Could you do some atomic or optimized/real-time operation that grouped the birthdays into something like this?

{
  :date => "04/27/2011",
  :page_views => 1000,
  :user_birthdays => ["12/10/1980", "6/22/1971", ...], # 1000 total
  :average_age => 27.8,
  :age_rank => {
    "0 to 20" => 180,
    "20 to 30" => 720,
    "30 to 40" => 100,
    "40 to 50" => 0
  }
}

...just like you can do Doc.collection.update({x => 1}, {"$push" => {"user_birthdays" => "12/10/1980"}}) to add something to an array, and not have to load the document in, can you do something like that to average/aggregate the array? Is there something along these lines that you use for real-time aggregation?

MapReduce is used to do this in batch-processing jobs; I'm looking for patterns for something like real-time map-reduce for:

  1. Averages: every time you push a new item to an array in MongoDB, what's the best way to average those values in real time?
  2. Grouping: if you group ages into 10-year brackets, and you have an ages array, how could you optimally update the count for each group as you're updating the document with the new age? Say the ages array will be constantly pushed/pulled.
  3. Min/Max: what are some ways to compute and store the min/max of that ages array in that document?
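To make the first two items concrete, here is a minimal sketch of how the `$inc` counter approach from earlier in the question might extend to averages and brackets: keep a running sum and count next to the array, and derive the average when reading. The field names `age_sum` and `age_count` and the helper names are hypothetical, and the bracket boundaries (lower bound inclusive) are an assumption. The code only builds the update document; it does not talk to the database.

```ruby
# Hypothetical helper: map an integer age to the bracket labels used in the
# example document ("0 to 20", "20 to 30", ...). Lower bound is inclusive.
def age_bracket(age)
  return "0 to 20" if age < 20
  low = (age / 10) * 10
  "#{low} to #{low + 10}"
end

# Build an "$inc" update document for one age.
# sign = 1 when an age is pushed, sign = -1 when an age is pulled.
# "age_sum" and "age_count" are assumed field names, not part of the original document.
def age_stats_inc(age, sign = 1)
  {
    "$inc" => {
      "age_sum" => sign * age,
      "age_count" => sign,
      "age_rank.#{age_bracket(age)}" => sign
    }
  }
end

# Derive the average at read time from the stored sum and count.
def average_age(doc)
  count = doc["age_count"].to_i
  count.zero? ? nil : doc["age_sum"].to_f / count
end
```

The resulting hash could be passed as the update document to the same kind of `update` call shown above. Min/max are left out on purpose: a pull can remove the current extreme, and a counter cannot undo that without rereading the array.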

Recommended answer

Could you do some atomic or optimized/real-time operation that grouped the birthdays into something like this?

It looks like you've added two fields age_rank, average_age. These are effectively calculated fields based on the data you already have. If I gave you the document with page views and user birthdays, it should be really trivial for the client code to find min/max, average, etc.

It seems to me that you're asking for MongoDB to perform the aggregation for you server-side. But you're adding the limitation that you don't want to use Map/Reduce?

If I'm understanding your question correctly, you're looking for something where you can say "add this item to an array and have all dependent items update themselves"? You don't want readers to perform any logic, you want everything to happen "magically" on the server side.

So there are three different ways to tackle this, but only one of them is currently available:

  1. Write this logic client-side. It doesn't sound like the solution you want, but it will work. If you have the underlying data, doing a max/min/med/avg should be pretty trivial in most languages.
  2. Leverage the upcoming features for Aggregation. These are not scheduled until 1.9.x. Improved aggregation will allow you to extract the data you're looking for; however, you'll still have to write the appropriate queries. The underlying DB still does not contain the data you're looking for.
  3. You need triggers. If you really want the DB to always be consistent and contain summarized data, then this is what you need. However, the triggers feature does not yet exist.

Unfortunately, your only option right now is #1. Fortunately, I know of several people that are using option #1 successfully.
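As a rough illustration of option #1, the sketch below computes the derived fields in client code from the `user_birthdays` array of the example document. It assumes the birthdays are `MM/DD/YYYY` strings as in the question; the helper names and the `min_age` / `max_age` keys are hypothetical, and `today` is passed in explicitly so the result does not depend on the clock.

```ruby
require "date"

# Whole years between a birthday and a reference date.
def age_on(birthday, today)
  age = today.year - birthday.year
  # Subtract one if this year's birthday has not happened yet.
  age -= 1 if ([today.month, today.day] <=> [birthday.month, birthday.day]) < 0
  age
end

# Hypothetical helper: bracket labels as in the example document,
# with the lower bound inclusive (an assumption).
def bracket_label(age)
  return "0 to 20" if age < 20
  low = (age / 10) * 10
  "#{low} to #{low + 10}"
end

# Compute average, min, max and bracket counts from birthday strings.
# Returns nil when there are no birthdays.
def summarize_ages(birthdays, today)
  ages = birthdays.map { |s| age_on(Date.strptime(s, "%m/%d/%Y"), today) }
  return nil if ages.empty?

  rank = Hash.new(0)
  ages.each { |a| rank[bracket_label(a)] += 1 }

  {
    "average_age" => ages.sum.to_f / ages.size,
    "min_age" => ages.min,
    "max_age" => ages.max,
    "age_rank" => rank
  }
end
```

The client would load the document, call `summarize_ages(doc["user_birthdays"], Date.today)`, and either use the result directly or write it back to the document.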