Hadoop: intervals and JOIN
Problem description
I'm very new to Hadoop, and I'm currently trying to join two data sources where the key is an interval (say [date-begin/date-end]). For example:
input1:
20091001-20091002 A
20091011-20091104 B
20080111-20091103 C
(...)
input2:
20090902-20091003 D
20081015-20091204 E
20040011-20050101 F
(...)
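I'd like to find all the records where key1 overlaps key2. Is this possible with Hadoop? Where can I find an example implementation?
Thanks.
Solution
A solution was given on Biostar: http://biostar.stackexchange.com/questions/8821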
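For readers who want something concrete, here is a minimal sketch of one common way to phrase an interval-overlap join in map/reduce terms. It is not necessarily the approach from the linked Biostar answer, and it runs as plain Python rather than on a cluster. The assumptions are mine: keys are treated as YYYYMMDD integers (not parsed as dates, since a sample key like 20040011 is not a valid calendar date), intervals are closed, and the bucket size is one year. The names `map_phase`, `reduce_phase` and `interval_join` are hypothetical helpers, not Hadoop APIs.

```python
from collections import defaultdict

input1 = ["20091001-20091002 A", "20091011-20091104 B", "20080111-20091103 C"]
input2 = ["20090902-20091003 D", "20081015-20091204 E", "20040011-20050101 F"]

def parse(line):
    """Split '20091001-20091002 A' into (start, end, label) with integer keys."""
    key, label = line.split()
    start, end = (int(part) for part in key.split("-"))
    return start, end, label

def year(date_key):
    """Bucket id for a YYYYMMDD integer (assumed bucket size: one year)."""
    return date_key // 10000

def map_phase(lines, source):
    """Emit (bucket, record) once for every year bucket the interval touches."""
    for line in lines:
        start, end, label = parse(line)
        for bucket in range(year(start), year(end) + 1):
            yield bucket, (source, start, end, label)

def reduce_phase(bucket, records):
    """Compare records from both sources that landed in the same bucket."""
    left = [r for r in records if r[0] == 1]
    right = [r for r in records if r[0] == 2]
    for _, s1, e1, l1 in left:
        for _, s2, e2, l2 in right:
            overlaps = s1 <= e2 and s2 <= e1  # closed-interval overlap test
            # A pair can meet in several buckets; report it only in the
            # bucket where the overlap begins, so it is emitted exactly once.
            if overlaps and year(max(s1, s2)) == bucket:
                yield l1, l2

def interval_join(lines1, lines2):
    """Simulate the shuffle locally: group mapper output by bucket, then reduce."""
    groups = defaultdict(list)
    for source, lines in ((1, lines1), (2, lines2)):
        for bucket, record in map_phase(lines, source):
            groups[bucket].append(record)
    pairs = []
    for bucket in sorted(groups):
        pairs.extend(reduce_phase(bucket, groups[bucket]))
    return sorted(pairs)

pairs = interval_join(input1, input2)
print(pairs)
```

On a real cluster the mapper and reducer would be separate tasks and the grouping would be done by the shuffle; the bucket size is a tuning choice, since long intervals are replicated into every bucket they span.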