Riak Map Reduce in JS returning limited data


Question

So I have Riak running on 2 clustered EC2 servers, and I am using Python to run JavaScript MapReduce jobs against it. The setup is mainly a proof of concept.

There are 50 keys in the bucket, and all the map/reduce functions do is reformat the data. This is only for testing the map/reduce functionality in Riak.

Problem: The output only shows [{u'e': 2, u'undefined': 2, u'w': 2}], which is completely wrong. The logs show that all the keys have been "processed", but only 2 get returned. So my question is: why is that happening, and am I missing something important?

Code:

import riak
client = riak.RiakClient()
query = riak.RiakMapReduce(client).add('raw_hits10')
query.map("""function(v) {
      var data = JSON.parse(v.values[0].data);
      return [[data, 1]];
}""")
query.reduce("""function(vk) {
         var res = {};
         for (var indx in vk) {
            var key_t = vk[indx][0];
            var val_t = vk[indx][1];
            ejsLog('/tmp/map_reduce.log', key_t + "--- " + val_t);

            res[key_t] = 2;
         }
         return [res]
    }
      """)


for res in query.run():
    print res

Output:

[{u'e': 2, u'undefined': 2, u'w': 2}]

This makes no sense.

Recommended answer

To avoid having to load all data from the preceding phase into memory on the coordinating node before running the reduce phase (which would be problematic for large mapreduce jobs), the reduce function is run multiple times. Every iteration gets a batch of results from the preceding phase together with any output from earlier reduce phase iterations. The default batch size is 20, but this is configurable. Because the results from one reduce phase iteration are fed in as input to the next iteration, reduce phase functions need to be designed to handle this, and some strategies are described here.
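The batching behaviour can be illustrated with a small pure-Python simulation. This is only a sketch of the idea, not Riak's actual implementation: `run_reduce` and `count_reduce` are hypothetical names, and the assumption is that each iteration receives the previous output followed by the next batch. The point it shows is that a reduce function whose output has the same shape as its input ([key, count] pairs here) gives the same answer whether it runs once or many times. The reduce function in the question returns a single object instead of pairs, so when that object is fed back in, `vk[indx][0]` is not a key any more, which is likely where the 'undefined' entry comes from.

```python
def run_reduce(inputs, reduce_fn, batch_size=20):
    """Simulate a batched reduce phase (assumed model, not Riak's code).

    Each call to reduce_fn sees the output of the previous call
    plus the next batch of results from the preceding phase.
    """
    carried = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        carried = reduce_fn(carried + batch)
    return carried


def count_reduce(pairs):
    """Re-reduce-safe reducer: takes [key, count] pairs, returns [key, count] pairs."""
    totals = {}
    for key, count in pairs:
        totals[key] = totals.get(key, 0) + count
    # Output has the same shape as the input, so it can be fed back in.
    return [[key, totals[key]] for key in sorted(totals)]


# 50 mapped results spread over 5 made-up keys, mimicking the 50-key bucket.
mapped = [["key%d" % (i % 5), 1] for i in range(50)]

batched = run_reduce(mapped, count_reduce)   # three reduce calls (20, 20, 10)
single = count_reduce(mapped)                # one reduce call over everything

print(batched == single)
```

The same design applies to the JavaScript reduce function: return a list of items in the same format the map phase emits, and add counts together rather than assigning a constant.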

It is also possible to force Riak to only run the reduce phase once for the entire input set by specifying the 'reduce_phase_only_1' parameter, but this is generally not recommended, especially for large jobs.
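As a rough sketch of where that parameter goes, the snippet below builds a MapReduce job description as a plain Python dict and prints it as JSON. It assumes the parameter is passed as the static argument ("arg") of the reduce phase in the JSON job format; check the Riak documentation for your version before relying on this. The JavaScript sources are placeholders, not the functions from the question.

```python
import json

# Placeholder phase sources (assumptions for illustration only).
map_source = "function(v) { return [[v.key, 1]]; }"
reduce_source = "function(values) { return values; }"

job = {
    "inputs": "raw_hits10",  # bucket name from the question
    "query": [
        {"map": {"language": "javascript", "source": map_source}},
        {
            "reduce": {
                "language": "javascript",
                "source": reduce_source,
                # Assumed placement: static phase argument asking Riak
                # to run the reduce function once over the whole input.
                "arg": {"reduce_phase_only_1": True},
            }
        },
    ],
}

print(json.dumps(job, indent=2))
```

With the Python client, the equivalent would presumably be passing the argument through the options of the reduce phase, but that is an assumption and worth verifying against the client's documentation.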
