Hadoop strange behaviour: reduce function doesn't get all values for a key


Problem description

In my Hadoop project, I am reading lines of a text file, each containing a number of names. The first name is the username, and the rest are that user's list of friends.

Then, in the map function, I create (username, friend) pairs. Each pair has a key "Key[name1][name2]", where name1 and name2 are the username and the friend name in alphabetical order.

Normally, after reading the line of userA and the line of userB, where each has the other in their friends list, I would get 2 identical keys with different values, in this case KeyUserAUserB : "UserA,UserB" and KeyUserAUserB : "UserB,UserA".

What I expect is for the reduce function to receive, at some point, KeyUserAUserB as the key and the pair "UserA,UserB", "UserB,UserA" as the values, so the values iterator would have 2 elements. However, the reduce function receives KeyUserAUserB twice, each time with a single value. This is not what I expect from Hadoop.
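The key construction described above can be sketched in plain Java, without any Hadoop dependency. The class name PairKeys, the method names, and the assumption that names on a line are separated by whitespace are illustrative only; the original mapper code is not shown in the question.

```java
import java.util.ArrayList;
import java.util.List;

class PairKeys {
    // Builds "Key" + the two names in alphabetical order,
    // so (UserA, UserB) and (UserB, UserA) share one key.
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? "Key" + a + b : "Key" + b + a;
    }

    // Emits one "key<TAB>user,friend" record per friend on the line.
    // Assumption: the first token is the user, the rest are friends,
    // and tokens are separated by whitespace.
    static List<String> mapLine(String line) {
        String[] names = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 1; i < names.length; i++) {
            out.add(pairKey(names[0], names[i]) + "\t" + names[0] + "," + names[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("UserA UserB UserC"));
        System.out.println(mapLine("UserB UserA"));
    }
}
```

Both lines produce a record under KeyUserAUserB, with the values "UserA,UserB" and "UserB,UserA" respectively, which is the situation the question is about.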

I also noticed that in my userlogs I have 4 "m" folders, and the first 2 of them contain the logs that helped me identify the above. In both "m" logs, the output (System.out) of the map function is interleaved with the output of the reduce function. I don't know if that has anything to do with my anomaly, but I expected the reduce output to stay in the "r" folder. Also, for the above example, one log entry for KeyUserAUserB is printed in one "m" log file, and the other KeyUserAUserB in the other. Although in some cases a KeyUserAUserB does reach the reducer with both values, I found at least one case where it never arrives with both values (and there, too, the 2 key-value pairs with the identical key reside in different "m" log files).

Another thing I noticed: the output collected from the reduce function is not sent directly to the output file, but is passed again as input to the same reduce function.

What do you think about this behavior? What could be the possible causes?

Solution

Finally: the whole unexpected behavior was because I was using the reducer class as the combiner class. After commenting out that line, everything worked as expected.
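For readers wondering why this matters: a combiner runs the reduce code on the output of each map task separately, before the shuffle, and its output then becomes the input of the real reduce phase. That would be consistent with the reduce output showing up in the "m" logs, with a key arriving with only one value, and with reduce output being fed back into reduce. The following is a minimal plain-Java simulation of that effect, not Hadoop code; the class CombinerDemo and its reduce logic (which simply reports how many values it received) are hypothetical, since the original reducer is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerDemo {
    // Hypothetical reduce logic: report how many values arrived for the key.
    static String reduce(String key, List<String> values) {
        return key + " -> " + values.size() + " value(s)";
    }

    // Groups "key<TAB>value" records by key, as happens before each reduce call.
    static Map<String, List<String>> group(List<String> records) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String record : records) {
            String[] kv = record.split("\t", 2);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Models the reduce phase with no combiner: all map outputs are merged first.
    static List<String> reduceAll(List<List<String>> mapOutputs) {
        List<String> all = new ArrayList<>();
        mapOutputs.forEach(all::addAll);
        List<String> calls = new ArrayList<>();
        group(all).forEach((k, v) -> calls.add(reduce(k, v)));
        return calls;
    }

    // Models the combiner pass: reduce runs on each map task's output on its own.
    static List<String> combinePerMapTask(List<List<String>> mapOutputs) {
        List<String> calls = new ArrayList<>();
        for (List<String> oneTask : mapOutputs) {
            group(oneTask).forEach((k, v) -> calls.add(reduce(k, v)));
        }
        return calls;
    }

    public static void main(String[] args) {
        List<List<String>> mapOutputs = Arrays.asList(
            Arrays.asList("KeyUserAUserB\tUserA,UserB"),   // map task that read UserA's line
            Arrays.asList("KeyUserAUserB\tUserB,UserA"));  // map task that read UserB's line
        System.out.println(reduceAll(mapOutputs));
        System.out.println(combinePerMapTask(mapOutputs));
    }
}
```

In this sketch the merged run sees the key once with 2 values, while the per-map-task run sees it twice with 1 value each. In general, a reducer can double as a combiner only when its output has the same key/value types as its input and the operation gives the same result when applied to partial groups (as with sums or counts); logic that needs all values for a key at once does not qualify.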
