MongoDB map/reduce over multiple collections?

Question

First, the background. I used to have a collection logs and used map/reduce to generate various reports. Most of these reports were based on data from within a single day, so I always had a condition d: SOME_DATE. When the logs collection grew extremely big, inserting became extremely slow (slower than the app we were monitoring was generating logs), even after dropping lots of indexes. So we decided to have each day's data in a separate collection - logs_YYYY-mm-dd - that way indexes are smaller, and we don't even need an index on date. This is cool since most reports (thus map/reduce) are on daily data. However, we have a report where we need to cover multiple days.

And now the question. Is there a way to run a map/reduce (or more precisely, the map) over multiple collections as if it were only one?

Recommended answer

A reduce function may be called once, with a key and all corresponding values (but only if there are multiple values for the key - it won't be called at all if there's only 1 value for the key).

It may also be called multiple times, each time with a key and only a subset of the corresponding values, and the previous reduce results for that key. This scenario is called a re-reduce. In order to support re-reduces, your reduce function should be idempotent.
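
The re-reduce scenario can be sketched in plain JavaScript, outside MongoDB. The reduce function below simply sums numeric values; the name sumReduce, the key, and the sample numbers are hypothetical and chosen only for illustration.

```javascript
// Hypothetical reduce function: sums the numeric values emitted for a key.
function sumReduce(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Case 1: a single call that receives every value for the key.
var allAtOnce = sumReduce("page_a", [1, 2, 3, 4]);

// Case 2: a re-reduce. Two calls each see a subset of the values,
// and a final call receives the two partial results.
var partial1 = sumReduce("page_a", [1, 2]);
var partial2 = sumReduce("page_a", [3, 4]);
var reReduced = sumReduce("page_a", [partial1, partial2]);

console.log(allAtOnce === reReduced); // both paths give the same total
```

Because the function returns the same kind of value it accepts (a number), feeding its own output back in as input is safe.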

An idempotent reduce function has two key features:

  • The return value of the reduce function should be in the same format as the values it takes in. So, if your reduce function accepts an array of strings, the function should return a string. If it accepts objects with several properties, it should return an object containing those same properties. This ensures that the function doesn't break when it is called with the result of a previous reduce.
  • Don't make assumptions based on the number of values it takes in. It isn't guaranteed that the values parameter contains all the values for the given key. So using values.length in calculations is very risky and should be avoided.
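
Both points can be illustrated with a minimal sketch of a reduce function that computes an average, written in plain JavaScript. It assumes the map function emits objects of the form { sum: x, count: 1 }; that shape and the names brokenReduce and safeReduce are assumptions made for this example.

```javascript
// Broken: returns a number although it receives objects,
// and divides by values.length.
function brokenReduce(key, values) {
  var sum = 0;
  values.forEach(function (v) { sum += v.sum; });
  return sum / values.length;
}

// Safe: returns the same shape it receives and carries the count along.
function safeReduce(key, values) {
  var result = { sum: 0, count: 0 };
  values.forEach(function (v) {
    result.sum += v.sum;
    result.count += v.count;
  });
  return result;
}

var emitted = [
  { sum: 10, count: 1 },
  { sum: 20, count: 1 },
  { sum: 60, count: 1 }
];

// Re-reduce with the safe function: reduce the first two values,
// then combine that partial result with the third value.
var partial = safeReduce("k", emitted.slice(0, 2));
var combined = safeReduce("k", [partial, emitted[2]]);
var average = combined.sum / combined.count; // divide only at the very end

// The same re-reduce with the broken function: the partial result is a
// plain number, so reading .sum from it yields undefined.
var brokenPartial = brokenReduce("k", emitted.slice(0, 2));
var brokenCombined = brokenReduce("k", [brokenPartial, emitted[2]]);

console.log(average, brokenCombined);
```

The division is deferred until all partial results have been merged, which is the kind of work a finalize function is suited for.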

Update: The two steps below aren't required (or even possible, I haven't checked) on the more recent MongoDB releases. It can now handle these steps for you, if you specify an output collection in the map-reduce options:

{ out: { reduce: "tempResult" } }
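
In the mongo shell this object is passed as the third argument to mapReduce, once per source collection, for example db.someCollection.mapReduce(map, reduce, { out: { reduce: "tempResult" } }). With this output action, a key that already exists in the output collection has the reduce function applied to the stored value and the new value. The sketch below models that merge in plain JavaScript. It is a model only, not MongoDB's implementation; mapReduceInto, the stand-in emit, the sample documents, and the url field are all hypothetical.

```javascript
var groups; // values emitted during the current run, grouped by key

// Stand-in for the emit() function MongoDB provides to map functions.
function emit(key, value) {
  (groups[key] = groups[key] || []).push(value);
}

// Runs map and reduce over docs, then merges the results into output
// the way the "reduce" output action merges into an existing collection.
function mapReduceInto(output, docs, map, reduce) {
  groups = {};
  docs.forEach(function (doc) { map.call(doc); }); // `this` is the document
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // reduce is skipped when a key has only one value
    var result = values.length > 1 ? reduce(key, values) : values[0];
    var exists = Object.prototype.hasOwnProperty.call(output, key);
    // Key already present: re-reduce the stored value with the new one.
    output[key] = exists ? reduce(key, [output[key], result]) : result;
  });
  return output;
}

var map = function () { emit(this.url, 1); };
var reduce = function (key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// Two hypothetical daily log collections.
var logsDay1 = [{ url: "/a" }, { url: "/a" }, { url: "/b" }];
var logsDay2 = [{ url: "/a" }, { url: "/c" }];

var tempResult = {};
mapReduceInto(tempResult, logsDay1, map, reduce);
mapReduceInto(tempResult, logsDay2, map, reduce);
console.log(tempResult);
```

The merge only gives correct totals because the reduce function can be fed its own output, which is the idempotence requirement described earlier.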


If your reduce function is idempotent, you shouldn't have any problems map-reducing multiple collections. Just re-reduce the results of each collection:

Step 1

Run the map-reduce on each required collection and save the results in a single, temporary collection. You can store the results using a finalize function. Store the map-reduce key in its own field rather than as _id: save() replaces any existing document with the same _id, so a key that occurs in more than one collection would otherwise keep only its last result.

// Store each reduced result in tempResult as a side effect.
finalize = function (key, value) {
  db.tempResult.save({ key: key, value: value });
  return value; // the returned value is what map-reduce keeps for this key
}

db.someCollection.mapReduce(map, reduce, { finalize: finalize, out: { inline: 1 } })
db.anotherCollection.mapReduce(map, reduce, { finalize: finalize, out: { inline: 1 } })

Step 2

Run another map-reduce on the temporary collection, using the same reduce function. The map function is a simple function that selects the keys and values from the temporary collection:

// Re-emit every stored partial result under its original key.
map = function () {
  emit(this.key, this.value);
}

db.tempResult.mapReduce(map, reduce, { out: { inline: 1 } })

This second map-reduce is basically a re-reduce and should give you the results you need.
