Couldn't join two files with one key via Cascading
Question
Let's see what we have. First file [Interface Class]:
list arrayList
list linkedList
Second file [Class1 amount]:
arrayList 120
linkedList 4
I would like to join these two files by the key [Class] and get the count for each interface:
list arrayList 120
list linkedList 4
Code:
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.flow.FlowDef;
import cascading.flow.hadoop2.Hadoop2MR1FlowConnector;
import cascading.pipe.CoGroup;
import cascading.pipe.Pipe;
import cascading.pipe.joiner.InnerJoin;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.MultiSinkTap;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class Main
{
    public static void main( String[] args )
    {
        String docPath = args[ 0 ];   // first file: interface, class
        String wcPath = args[ 1 ];    // output path
        String doc2Path = args[ 2 ];  // second file: class, amount

        Properties properties = new Properties();
        AppProps.setApplicationJarClass( properties, Main.class );
        AppProps.setApplicationName( properties, "Part 1" );
        AppProps.addApplicationTag( properties, "lets:do:it" );
        AppProps.addApplicationTag( properties, "technology:Cascading" );
        FlowConnector flowConnector = new Hadoop2MR1FlowConnector( properties );

        // create source and sink taps
        Tap wcTap = new Hfs(new TextDelimited(true, ","), wcPath);
        Fields classInterfaceFiles = new Fields("interface", "class");
        Tap classInterfaceTap = new Hfs(new TextDelimited(classInterfaceFiles, true, ","), docPath);
        Fields classAmountFields = new Fields("class1", "amount");
        Tap classAmountFileTap = new Hfs(new TextDelimited(classAmountFields, true, ","), doc2Path);
        Tap outTap = new MultiSinkTap(); // just saying, create your own tap (unused below)

        // one pipe per input file
        Pipe classInterfaceFilePipe = new Pipe("classInterfaceFilePipe");
        Pipe classIAmountFilePipe = new Pipe("classIAmountFilePipe");

        // fields used as joining keys
        Fields groupFields = new Fields("class");
        Fields groupFields1 = new Fields("class1");
        Pipe outPipe = new CoGroup(classInterfaceFilePipe, groupFields, classIAmountFilePipe, groupFields1, new InnerJoin());

        // build flow definition
        FlowDef flowDef = FlowDef.flowDef().setName("myFlow")
            .addSource(classInterfaceFilePipe, classInterfaceTap)
            .addSource(classIAmountFilePipe, classAmountFileTap)
            .addTailSink(outPipe, wcTap);

        // write a DOT file and run the flow
        Flow wcFlow = flowConnector.connect( flowDef );
        wcFlow.writeDOT( "dot/wc.dot" );
        wcFlow.complete();
    }
}
[This is one step of a bigger task.]
Recommended answer
This happens because the two pipes being joined share a field name, namely "class". CoGroup places the fields of both pipes into a single joined tuple, so the names must not collide. You could probably rename them to "class_interface" and "class_amount". You will also have to update the groupFields you pass to the CoGroup pipe so that they use the new names.
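To make the reasoning concrete, here is a minimal plain-Java model of the join, not Cascading code. The class `JoinSketch` and its methods `joinedFields` and `innerJoin` are hypothetical names invented for illustration, and the uniqueness check only approximates the field-name requirement described above. The sketch shows that the join goes through once the key fields are named "class_interface" and "class_amount", and is rejected when both sides use "class".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it does not use or mirror the Cascading API.
public class JoinSketch {

    // Concatenate the field names of both sides and reject duplicates,
    // modelling the requirement that joined field names be unique.
    static List<String> joinedFields(List<String> left, List<String> right) {
        List<String> all = new ArrayList<>(left);
        all.addAll(right);
        Set<String> seen = new HashSet<>();
        for (String name : all) {
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                    "duplicate field name in joined stream: " + name);
            }
        }
        return all;
    }

    // Inner join: emit left row + right row for every pair with equal keys.
    static List<List<String>> innerJoin(List<List<String>> left, int leftKey,
                                        List<List<String>> right, int rightKey) {
        // Index the right side by its key column.
        Map<String, List<List<String>>> index = new LinkedHashMap<>();
        for (List<String> row : right) {
            index.computeIfAbsent(row.get(rightKey), k -> new ArrayList<>()).add(row);
        }
        List<List<String>> out = new ArrayList<>();
        for (List<String> l : left) {
            for (List<String> r : index.getOrDefault(l.get(leftKey), List.of())) {
                List<String> joined = new ArrayList<>(l);
                joined.addAll(r);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Renamed key fields, as suggested in the answer.
        List<String> fields = joinedFields(
            List.of("interface", "class_interface"),
            List.of("class_amount", "amount"));
        List<List<String>> rows = innerJoin(
            List.of(List.of("list", "arrayList"), List.of("list", "linkedList")), 1,
            List.of(List.of("arrayList", "120"), List.of("linkedList", "4")), 0);
        System.out.println(fields);
        rows.forEach(System.out::println);
    }
}
```

In this model the joined rows carry the key twice (once from each side), so getting the three-column result from the question would still need a step that drops one of the two key columns.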