hadoop MultipleInputs fails with ClassCastException


My Hadoop version is 1.0.3. When I use MultipleInputs, I get this error:

java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
at org.myorg.textimage$ImageMapper.setup(textimage.java:80)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:55)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

I tested with a single input path and there was no problem. The error occurs only when I use

    MultipleInputs.addInputPath(job, TextInputpath, TextInputFormat.class,
            TextMapper.class);
    MultipleInputs.addInputPath(job, ImageInputpath,
            WholeFileInputFormat.class, ImageMapper.class);

I googled and found this link https://issues.apache.org/jira/browse/MAPREDUCE-1178, which says 0.21 had this bug. But I am using 1.0.3, so has this bug come back? Has anyone run into the same problem, or can anyone tell me how to fix it? Thanks

Here is the setup code of the image mapper; the 4th line is where the error occurs:

protected void setup(Context context) throws IOException,
            InterruptedException {
        InputSplit split = context.getInputSplit();
        Path path = ((FileSplit) split).getPath();
        try {
            pa = new Text(path.toString());
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

Solution

Following up on my comment, the Javadocs for TaggedInputSplit confirm that you are probably wrongly casting the input split to a FileSplit:

/**
 * An {@link InputSplit} that tags another InputSplit with extra data for use
 * by {@link DelegatingInputFormat}s and {@link DelegatingMapper}s.
 */

My guess is your setup method looks something like this:

@Override
protected void setup(Context context) throws IOException,
        InterruptedException {
    FileSplit split = (FileSplit) context.getInputSplit();
}

Unfortunately TaggedInputSplit is not publicly visible, so you can't easily do an instanceof style check followed by a cast and a call to TaggedInputSplit.getInputSplit() to get the actual underlying FileSplit. So you'll either need to update the source yourself and recompile and deploy, file a JIRA ticket to ask for this to be fixed in a future version (if it hasn't already been actioned in 2+), or perform some nasty nasty reflection hackery to get to the underlying InputSplit.

This is completely untested:

// requires: import java.lang.reflect.Method;
@Override
protected void setup(Context context) throws IOException,
        InterruptedException {
    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();

    FileSplit fileSplit = null;
    if (splitClass.equals(FileSplit.class)) {
        fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals(
            "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
        // begin reflection hackery...

        try {
            Method getInputSplitMethod = splitClass
                    .getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw error
            throw new IOException(e);
        }

        // end reflection hackery
    }
}

Reflection Hackery Explained:

Because TaggedInputSplit is not declared public, it's not visible to classes outside the org.apache.hadoop.mapreduce.lib.input package, and therefore you cannot reference that class in your setup method. To get around this, we perform a number of reflection-based operations:

  1. Inspecting the class name, we can test for the type TaggedInputSplit using its fully qualified name:

    splitClass.getName().equals("org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")

  2. We know we want to call the TaggedInputSplit.getInputSplit() method to recover the wrapped input split, so we utilize the Class.getDeclaredMethod(..) reflection method to acquire a reference to the method:

    Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");

  3. The method still isn't accessible from our code, so we use setAccessible(..) to override this, suppressing the access check that would otherwise make the invocation throw an IllegalAccessException:

    getInputSplitMethod.setAccessible(true);

  4. Finally, we invoke the method on the input split and cast the result to a FileSplit (optimistically hoping it's an instance of this type!):

    fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
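To illustrate the reflection pattern above in isolation, here is a self-contained sketch that mimics the situation without any Hadoop dependencies. WrappedSplit and its non-public getInputSplit() accessor are hypothetical stand-ins for TaggedInputSplit, not real Hadoop classes:

```java
import java.lang.reflect.Method;

// Stand-in for Hadoop's non-public TaggedInputSplit: it wraps another
// object and exposes it only through a non-public accessor method.
class WrappedSplit {
    private final String inner;

    WrappedSplit(String inner) {
        this.inner = inner;
    }

    // package-private, like TaggedInputSplit.getInputSplit()
    String getInputSplit() {
        return inner;
    }
}

public class ReflectionUnwrapDemo {

    // Unwrap the hidden value via reflection, mirroring the setup() code above:
    // look up the accessor by name, force it accessible, then invoke it.
    static String unwrap(Object split) throws Exception {
        Method getInputSplitMethod = split.getClass()
                .getDeclaredMethod("getInputSplit");
        getInputSplitMethod.setAccessible(true);
        return (String) getInputSplitMethod.invoke(split);
    }

    public static void main(String[] args) throws Exception {
        WrappedSplit split = new WrappedSplit("part-00000");
        System.out.println(unwrap(split)); // prints part-00000
    }
}
```

The same caveat applies as in the answer: this relies on an internal, non-public API, so it can break on any Hadoop upgrade that renames or removes the accessor.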

