Hadoop: can you use a pair of values as "Key"?

Question

I am trying to analyze a large crime statistics data set, a CSV file of about 2 GB. It has about 20 columns, but I am interested in only two of them: Crime_Type and Crime_in_Year. For example, the crime type "burglary" occurs every year from 2001 through 2013, and I want a result that counts the occurrences of burglary in each year.

So I am thinking of having a composite key such as (burglary, 2003), whose value would be the number of burglaries in 2003. Is it possible to have a pair of values as the key in Hadoop/MapReduce?

Answer

A key can be anything so long as it implements WritableComparable: the framework sorts map output keys, so a plain Writable is not enough for a key. You can write your own custom key fairly easily, as shown in the Writable documentation.

Borrowing from the example in the documentation, one implementation might be:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class CrimeWritable implements WritableComparable<CrimeWritable> {
    private int year;
    private String type;

    public CrimeWritable() {}                        // Hadoop needs a no-arg constructor

    public CrimeWritable(int year, String type) { this.year = year; this.type = type; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeUTF(type);    // writeUTF records the string length; writeBytes/readBytes would not
    }

    public void readFields(DataInput in) throws IOException {
        year = in.readInt();   // read fields in the same order they were written
        type = in.readUTF();
    }

    public static CrimeWritable read(DataInput in) throws IOException {
        CrimeWritable w = new CrimeWritable();
        w.readFields(in);
        return w;
    }

    public int compareTo(CrimeWritable other) {      // keys are sorted before the reduce phase
        int cmp = Integer.compare(year, other.year);
        return cmp != 0 ? cmp : type.compareTo(other.type);
    }

    @Override
    public int hashCode() {                          // HashPartitioner routes keys by hashCode
        return 31 * year + type.hashCode();
    }
}
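
To put the key to use, here is a minimal sketch of a mapper, reducer, and driver. The column positions (Crime_Type first, Crime_in_Year second) and the naive comma split are assumptions, since the question does not give the actual CSV layout:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrimeCount {

    public static class CrimeMapper
            extends Mapper<LongWritable, Text, CrimeWritable, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Naive CSV parsing: assumes no quoted commas and no header row.
            String[] fields = line.toString().split(",");
            String type = fields[0].trim();                 // assumed column positions
            int year = Integer.parseInt(fields[1].trim());
            ctx.write(new CrimeWritable(year, type), ONE);  // one count per record
        }
    }

    public static class CrimeReducer
            extends Reducer<CrimeWritable, IntWritable, CrimeWritable, IntWritable> {
        @Override
        protected void reduce(CrimeWritable key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();    // total per (type, year)
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "crime counts");
        job.setJarByClass(CrimeCount.class);
        job.setMapperClass(CrimeMapper.class);
        job.setCombinerClass(CrimeReducer.class);  // safe because the sum is associative
        job.setReducerClass(CrimeReducer.class);
        job.setOutputKeyClass(CrimeWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because the reducer's output types match its input types, it doubles as a combiner, which reduces the data shuffled between map and reduce. For human-readable output you would also override toString() on CrimeWritable.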

On a related note, you might want to consider using a higher-level abstraction than raw MapReduce, such as Cascading or Apache Spark.
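
For comparison, a minimal sketch of the same count in Spark's Java API (input and output paths and column positions are assumptions). A Tuple2 works as a composite key out of the box, since it already provides the equals and hashCode that reduceByKey needs:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class CrimeCountSpark {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("crime counts"));

        // Pair each record's (type, year) with 1, then sum per key.
        JavaPairRDD<Tuple2<String, Integer>, Integer> counts = sc.textFile(args[0])
                .mapToPair(line -> {
                    String[] f = line.split(",");  // naive CSV split, as in the mapper above
                    return new Tuple2<>(
                            new Tuple2<>(f[0].trim(), Integer.parseInt(f[1].trim())), 1);
                })
                .reduceByKey(Integer::sum);

        counts.saveAsTextFile(args[1]);
        sc.stop();
    }
}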
