hadoop MultipleInputs fails with ClassCastException
My Hadoop version is 1.0.3. When I use MultipleInputs, I get this error:
java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
at org.myorg.textimage$ImageMapper.setup(textimage.java:80)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:55)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
I tested a single input path with no problem. The error only occurs when I use:

MultipleInputs.addInputPath(job, TextInputpath, TextInputFormat.class,
        TextMapper.class);
MultipleInputs.addInputPath(job, ImageInputpath,
        WholeFileInputFormat.class, ImageMapper.class);
I googled and found this link https://issues.apache.org/jira/browse/MAPREDUCE-1178, which said 0.21 had this bug. But I am using 1.0.3; has this bug come back? Does anyone have the same problem, or can anyone tell me how to fix it? Thanks.
Here is the setup code of the image mapper; the 4th line is where the error occurs:

protected void setup(Context context) throws IOException,
        InterruptedException {
    InputSplit split = context.getInputSplit();
    Path path = ((FileSplit) split).getPath();
    try {
        pa = new Text(path.toString());
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Following up on my comment, the Javadoc for TaggedInputSplit confirms that you are probably wrongly casting the input split to a FileSplit:
/**
* An {@link InputSplit} that tags another InputSplit with extra data for use
* by {@link DelegatingInputFormat}s and {@link DelegatingMapper}s.
*/
My guess is your setup method looks something like this:
@Override
protected void setup(Context context) throws IOException,
        InterruptedException {
    FileSplit split = (FileSplit) context.getInputSplit();
}
Unfortunately TaggedInputSplit is not publicly visible, so you can't easily do an instanceof-style check, followed by a cast and then a call to TaggedInputSplit.getInputSplit() to get the actual underlying FileSplit. So either you'll need to update the source yourself and recompile and deploy, post a JIRA ticket to ask for this to be fixed in a future version (if it already hasn't been actioned in 2+), or perform some nasty reflection hackery to get to the underlying InputSplit.
This is completely untested:
@Override
protected void setup(Context context) throws IOException,
        InterruptedException {
    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();

    FileSplit fileSplit = null;
    if (splitClass.equals(FileSplit.class)) {
        fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals(
            "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
        // begin reflection hackery...
        try {
            Method getInputSplitMethod = splitClass
                    .getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw error
            throw new IOException(e);
        }
        // end reflection hackery
    }
}
Reflection Hackery Explained:
With TaggedInputSplit being declared with protected scope, it's not visible to classes outside the org.apache.hadoop.mapreduce.lib.input package, and therefore you cannot reference that class in your setup method. To get around this, we perform a number of reflection-based operations:
Inspecting the class name, we can test for the type TaggedInputSplit using its fully qualified name:
splitClass.getName().equals("org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")
We know we want to call the TaggedInputSplit.getInputSplit() method to recover the wrapped input split, so we utilize the Class.getDeclaredMethod(..) reflection method to acquire a reference to the method:
Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");
The class still isn't publicly visible, so we use the setAccessible(..) method to override this, stopping the security manager from throwing an exception:
getInputSplitMethod.setAccessible(true);
Finally we invoke the method on the input split and cast the result to a FileSplit (optimistically hoping it's an instance of this type!):
fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
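The getDeclaredMethod/setAccessible/invoke pattern above can be seen in miniature with plain Java, with no Hadoop dependency. In this sketch, the Wrapper class and its private getInputSplit() method are hypothetical stand-ins for TaggedInputSplit, not real Hadoop classes:

```java
import java.lang.reflect.Method;

public class ReflectionDemo {

    // Stand-in for TaggedInputSplit: its accessor is not callable directly
    // from outside code, just like the package-private Hadoop class.
    static class Wrapper {
        private final String wrapped;

        Wrapper(String wrapped) {
            this.wrapped = wrapped;
        }

        // private, mirroring the inaccessible TaggedInputSplit.getInputSplit()
        private String getInputSplit() {
            return wrapped;
        }
    }

    // Reflectively call getInputSplit() on an object whose type we cannot
    // reference at compile time.
    static String unwrap(Object split) throws Exception {
        Method m = split.getClass().getDeclaredMethod("getInputSplit");
        m.setAccessible(true);            // bypass the access check
        return (String) m.invoke(split);  // invoke on the wrapper instance
    }

    public static void main(String[] args) throws Exception {
        System.out.println(unwrap(new Wrapper("/data/part-00000")));
        // prints "/data/part-00000"
    }
}
```

Note that if the wrapper type or method name ever changes, this fails with a NoSuchMethodException rather than a ClassCastException, which is why the answer's setup method wraps any reflective failure in an IOException and rethrows it.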