MapReduce - WritableComparables

Question

I'm new to both Java and Hadoop. I'm trying to write a very simple program that finds frequent pairs of neighboring words.

For example:

Input: My name is Foo. Foo is student.
Intermediate output:
    Map:
        (my, name): 1
        (name, is): 1
        (is, Foo): 2 // (is, Foo) = (Foo, is)
        (is, student): 1

So the final result should be that the most frequent pair is (is, Foo).

The pseudocode is as follows:

Map(key: line_num, value: line)
    words = split_words(line)
    for each w in words:
        for each neighbor x of w:
            emit((w, x), 1)

Here my key is not a single value; it's a pair. While going through the documentation, I read that for each new key type we have to implement WritableComparable.

So I'm confused about that, and I'm not sure it's really true. If someone could explain this class, that would be great. Then I can figure out on my own how to do it!

I don't want any code, neither a mapper nor anything else. I just want to understand what WritableComparable does. Which method of WritableComparable actually compares keys? I can see equals and compareTo, but I cannot find any explanation of them. Please, no code! Thanks.

EDIT 1: In compareTo I return 0 for the pair (a, b) = (b, a), but the two keys still do not go to the same reducer. Is there any way, inside the compareTo method, to reset the key (b, a) to (a, b) or to generate a totally new key?

EDIT 2: I don't know about generating a new key, but after changing the logic in compareTo, it worked fine! Thanks, everyone!

Recommended answer

WritableComparable is an interface that makes the class implementing it two things. First, Writable: the object can be written to and read from the network via serialization. This is necessary if you're going to use it as a key or value, so that it can be sent between Hadoop nodes. Second, Comparable: the class must provide a method that shows how one object of the class compares to another. This is used when the Reducer organizes its input by key.
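To make the two halves concrete, here is a minimal sketch of a pair key. The class name WordPair and its fields are hypothetical. To stay runnable without Hadoop on the classpath, it implements plain Comparable and only mirrors the write(DataOutput) and readFields(DataInput) methods that Hadoop's Writable declares; in a real job it would be declared as implementing WritableComparable<WordPair>. One detail relevant to EDIT 1 in the question: Hadoop's default HashPartitioner picks the reducer from the key's hashCode, so returning 0 from compareTo is not enough on its own. Storing the two words in sorted order keeps compareTo, equals, and hashCode consistent for (a, b) and (b, a).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical key type. In a real job this would be declared as
// "implements org.apache.hadoop.io.WritableComparable<WordPair>";
// Comparable is used here only so the sketch compiles without Hadoop.
final class WordPair implements Comparable<WordPair> {
    private String first = "";
    private String second = "";

    // Hadoop instantiates keys reflectively, so a no-arg constructor is kept.
    WordPair() {}

    WordPair(String a, String b) {
        // Store the words in sorted order so (is, Foo) and (Foo, is) become the same key.
        if (a.compareTo(b) <= 0) {
            first = a;
            second = b;
        } else {
            first = b;
            second = a;
        }
    }

    // Writable half: serialize the fields.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // Writable half: restore the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    // Comparable half: defines the order in which keys are sorted.
    @Override
    public int compareTo(WordPair other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : second.compareTo(other.second);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof WordPair && compareTo((WordPair) o) == 0;
    }

    // Equal keys must hash equally, since the default partitioner
    // chooses the reducer from hashCode().
    @Override
    public int hashCode() {
        return 31 * first.hashCode() + second.hashCode();
    }

    @Override
    public String toString() {
        return "(" + first + ", " + second + ")";
    }

    public static void main(String[] args) throws IOException {
        WordPair a = new WordPair("is", "Foo");
        WordPair b = new WordPair("Foo", "is");

        // Round-trip through bytes, as would happen between nodes.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        a.write(new DataOutputStream(bytes));
        WordPair copy = new WordPair();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(a.compareTo(b) + " " + a.equals(copy) + " " + copy);
    }
}
```

Normalizing the order in the constructor, rather than inside compareTo, means the key itself is already canonical by the time it is serialized, hashed, or compared.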

This interface is necessary when you want to create your own object to be a key. If the custom type must also be read directly from the input, you would additionally need to create your own InputFormat instead of using one of those that come with Hadoop. In my experience this can get rather difficult, especially if you're new to both Java and Hadoop.

So if I were you, I wouldn't bother with that, as there's a much simpler way. I would use TextInputFormat, which is conveniently both the default InputFormat and pretty easy to use and understand. You could simply emit each key as a Text object, which is pretty similar to a String. There is a caveat, though: as you mentioned, "is Foo" and "Foo is" need to be treated as the same key. So for every pair of words you pull out, sort the two words alphabetically with the String.compareTo method before emitting them as a key. That way you're guaranteed to have no duplicate keys for the same pair.
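The sorting step can be sketched in plain Java as follows. The names PairKeys, pairKey, and countPairs are hypothetical, and the in-memory map only stands in for what the map and reduce phases would do together; in an actual mapper the string returned by pairKey would be wrapped in a Text object and emitted with a count of 1. Note that String.compareTo is case-sensitive, so this sketch assumes the words are used as they appear in the input.

```java
import java.util.Map;
import java.util.TreeMap;

final class PairKeys {
    // Builds a canonical key: the two words in String.compareTo order,
    // so "is Foo" and "Foo is" both become "Foo is".
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    // Stand-in for map + reduce: counts neighboring word pairs per line.
    static Map<String, Integer> countPairs(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Split on anything that is not a letter (spaces, punctuation).
            String[] words = line.split("[^A-Za-z]+");
            for (int i = 0; i + 1 < words.length; i++) {
                if (words[i].isEmpty()) {
                    continue; // a leading delimiter yields one empty token
                }
                counts.merge(pairKey(words[i], words[i + 1]), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countPairs("My name is Foo.", "Foo is student.");
        counts.forEach((pair, n) -> System.out.println(pair + ": " + n));
    }
}
```

With the two sentences from the question passed as separate lines, the pair made of "is" and "Foo" is counted twice under the single key "Foo is", while every other pair is counted once.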
