Hadoop: intervals and JOIN


Question

I'm very new to Hadoop, and I'm currently trying to join two data sources where the key is an interval (say [date-begin/date-end]). For example:

input1:

20091001-20091002    A
20091011-20091104    B
20080111-20091103    C
(...)

input2:

20090902-20091003    D
20081015-20091204    E
20040011-20050101    F
(...)

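I'd like to find all the records where key1 (an interval from input1) overlaps key2 (an interval from input2). Is this possible with Hadoop? Where can I find an example implementation?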



Thanks.

Solution

A solution was given on Biostar: http://biostar.stackexchange.com/questions/8821
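The linked answer is not reproduced here. As a rough illustration of the problem itself, here is a minimal sketch in plain Python that simulates a map/shuffle/reduce pass locally. It is not the Biostar solution and does not use any Hadoop API. It assumes each line is `YYYYMMDD-YYYYMMDD LABEL`, that intervals are closed, and that bucketing by calendar year is an acceptable partitioning; the names `parse`, `overlaps`, `map_record`, `reduce_bucket` and `interval_join` are all made up for the example. Two closed intervals overlap exactly when each one begins no later than the other ends.

```python
from collections import defaultdict
from itertools import product


def parse(line):
    """Split 'YYYYMMDD-YYYYMMDD LABEL' into (begin, end, label)."""
    key, label = line.split()
    begin, end = key.split("-")
    # Compare as integers rather than real dates: the sample data contains
    # 20040011, which is not a valid calendar date.
    return int(begin), int(end), label


def overlaps(a, b):
    """Closed intervals a and b overlap iff each begins no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def map_record(source, line):
    """Map step: emit the record once for every calendar year its interval touches."""
    begin, end, label = parse(line)
    for year in range(begin // 10000, end // 10000 + 1):
        yield year, (source, begin, end, label)


def reduce_bucket(year, records):
    """Reduce step: join records from the two sources inside one year bucket."""
    left = [r for r in records if r[0] == 1]
    right = [r for r in records if r[0] == 2]
    for l, r in product(left, right):
        if overlaps(l[1:3], r[1:3]):
            # A pair spanning several years lands in several buckets; report
            # it only in the year where the overlap starts.
            if year == max(l[1], r[1]) // 10000:
                yield l[3], r[3]


def interval_join(input1, input2):
    """Run the map, shuffle and reduce steps locally and return matching label pairs."""
    buckets = defaultdict(list)  # stands in for Hadoop's shuffle phase
    for source, lines in ((1, input1), (2, input2)):
        for line in lines:
            for year, record in map_record(source, line):
                buckets[year].append(record)
    pairs = []
    for year in sorted(buckets):
        pairs.extend(reduce_bucket(year, buckets[year]))
    return sorted(pairs)


input1 = ["20091001-20091002 A", "20091011-20091104 B", "20080111-20091103 C"]
input2 = ["20090902-20091003 D", "20081015-20091204 E", "20040011-20050101 F"]
print(interval_join(input1, input2))
```

Emitting each record once per year keeps every reducer's input small, at the cost of duplicating long intervals across buckets. The bucket width (a year here) is a tuning choice, not something the question specifies.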
