hadoop MultipleInputs fails with ClassCastException


Problem Description

My Hadoop version is 1.0.3. When I use MultipleInputs, I get this error:

java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
at org.myorg.textimage$ImageMapper.setup(textimage.java:80)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:55)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

I tested with a single input path and there was no problem. The error only occurs when I use

MultipleInputs.addInputPath(job, TextInputpath, TextInputFormat.class,
        TextMapper.class);
MultipleInputs.addInputPath(job, ImageInputpath,
        WholeFileInputFormat.class, ImageMapper.class);
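
For context, here is a minimal, untested driver sketch showing how those two addInputPath calls are typically wired into a job. TextMapper, ImageMapper and WholeFileInputFormat are the question's own classes; the driver class name, paths and output types are placeholder assumptions. With MultipleInputs, the framework routes every split through DelegatingInputFormat/DelegatingMapper, which is why TaggedInputSplit shows up in the stack trace.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TextImageDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "textimage");
        job.setJarByClass(TextImageDriver.class);

        // one InputFormat/Mapper pair per input path (as in the question);
        // internally this goes through DelegatingInputFormat/DelegatingMapper,
        // which wrap each split in a TaggedInputSplit
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, TextMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                WholeFileInputFormat.class, ImageMapper.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}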

I googled and found this link https://issues.apache.org/jira/browse/MAPREDUCE-1178, which says 0.21 had this bug. But I am using 1.0.3; has this bug come back? Does anyone have the same problem, or can anyone tell me how to fix it? Thanks.

Here is the setup code of the image mapper; the 4th line is where the error occurs:

protected void setup(Context context) throws IOException,
            InterruptedException {
        InputSplit split = context.getInputSplit();
        Path path = ((FileSplit) split).getPath(); // the cast that throws the ClassCastException
        try {
            pa = new Text(path.toString());
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

Answer

Following up on my comment, the Javadoc for TaggedInputSplit confirms that you are probably wrongly casting the input split to a FileSplit:

/**
 * An {@link InputSplit} that tags another InputSplit with extra data for use
 * by {@link DelegatingInputFormat}s and {@link DelegatingMapper}s.
 */

My guess is your setup method looks something like this:

@Override
protected void setup(Context context) throws IOException,
        InterruptedException {
    FileSplit split = (FileSplit) context.getInputSplit();
}

Unfortunately TaggedInputSplit is not publicly visible, so you can't easily do an instanceof-style check followed by a cast and a call to TaggedInputSplit.getInputSplit() to get the actual underlying FileSplit. So either you'll need to update the source yourself and re-compile and deploy, post a JIRA ticket asking for this to be fixed in a future version (if it hasn't already been actioned in 2+), or perform some nasty reflection hackery to get to the underlying InputSplit.

This is completely untested:

// requires: import java.lang.reflect.Method;
@Override
protected void setup(Context context) throws IOException,
        InterruptedException {
    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();

    FileSplit fileSplit = null;
    if (splitClass.equals(FileSplit.class)) {
        fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals(
            "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
        // begin reflection hackery...

        try {
            Method getInputSplitMethod = splitClass
                    .getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw error
            throw new IOException(e);
        }

        // end reflection hackery
    }
}

Reflection hackery explained:

Because TaggedInputSplit is not declared public, it's not visible to classes outside the org.apache.hadoop.mapreduce.lib.input package, and therefore you cannot reference that class in your setup method. To get around this, we perform a number of reflection-based operations:

  1. Inspecting the class name, we can test for the type TaggedInputSplit using its fully qualified name:

splitClass.getName().equals("org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")

  2. We know we want to call the TaggedInputSplit.getInputSplit() method to recover the wrapped input split, so we use the Class.getDeclaredMethod(..) reflection method to acquire a reference to that method:

Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");

  3. The class still isn't publicly visible, so we use the setAccessible(..) method to override this, which prevents the access check from throwing an IllegalAccessException:

getInputSplitMethod.setAccessible(true);

  4. Finally, we invoke the method on the input split and cast the result to a FileSplit (optimistically hoping it's an instance of that type!):

fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
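
As an equally untested sketch, here is how the unwrapped split might be folded back into the question's ImageMapper.setup so that pa (the Text field from the question's mapper) ends up holding the file path. It assumes an import of java.lang.reflect.Method alongside the usual InputSplit/FileSplit imports, and falls back to reflection for any split that isn't already a FileSplit:

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    InputSplit split = context.getInputSplit();

    FileSplit fileSplit;
    if (split instanceof FileSplit) {
        fileSplit = (FileSplit) split;
    } else {
        // assume a (package-private) TaggedInputSplit and unwrap it via reflection
        try {
            Method getInputSplitMethod = split.getClass().getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw any reflection failure
            throw new IOException("could not unwrap " + split.getClass().getName(), e);
        }
    }

    pa = new Text(fileSplit.getPath().toString());
}

Using instanceof FileSplit here is just a slight simplification of the class-equality check above; the reflection branch still relies on the wrapping split exposing a getInputSplit() method, as TaggedInputSplit does.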

