Overriding default hadoop jars in class path


Problem description

I've seen several ways to give the user class path precedence over the hadoop one. This is often done when an m/r job needs a specific version of a library that hadoop itself already ships in an older version (for example jackson's json parser or commons http, etc.).

In any case, I've seen:

mapreduce.task.classpath.user.precedence
mapreduce.task.classpath.first
mapreduce.job.user.classpath.first

Which one of these parameters is the right one to set in my job configuration, in order to force mappers and reducers to use a class path that puts my user-defined hadoop_classpath jars BEFORE the hadoop default dependency jars?

By the way, this is related to this question: "Dynamodb requestHandler acception", which I recently found is due to a jar conflict.

Solution

So, assuming you're using 0.20.203, this is handled in the TaskRunner.java code as follows:

  • The property you're looking for is on line 94 - mapreduce.user.classpath.first
  • Line 214 is where the call is made to build the list of classpaths, which delegates to a method called getClassPaths(..)
  • getClassPaths() is defined on line 524, and you should be able to see that the configuration property is used to decide whether your job and dist cache libraries or the hadoop libraries go on the classpath first
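
To make the ordering decision described above concrete, here is a minimal, self-contained sketch. It is not the actual TaskRunner code: the class name, the buildClasspath helper and the jar names are hypothetical, and a JVM system property stands in for the job configuration. It only illustrates the idea that a boolean flag decides which group of jars is listed first. The order matters because the JVM loads a class from the first classpath entry that contains it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathOrderSketch {
    // Property name quoted in the answer for hadoop 0.20.203.
    static final String MAPREDUCE_USER_CLASSPATH_FIRST =
            "mapreduce.user.classpath.first";

    // Hypothetical helper (not hadoop's getClassPaths): returns the combined
    // classpath with the user jars first when userFirst is true, otherwise
    // with the hadoop jars first.
    static List<String> buildClasspath(boolean userFirst,
                                       List<String> userJars,
                                       List<String> hadoopJars) {
        List<String> classpath = new ArrayList<>();
        if (userFirst) {
            classpath.addAll(userJars);
            classpath.addAll(hadoopJars);
        } else {
            classpath.addAll(hadoopJars);
            classpath.addAll(userJars);
        }
        return classpath;
    }

    public static void main(String[] args) {
        // Stand-in for the job configuration: read the flag from a JVM
        // system property, treating a missing value as false.
        boolean userFirst = Boolean.parseBoolean(
                System.getProperty(MAPREDUCE_USER_CLASSPATH_FIRST, "false"));

        // Hypothetical jar names, chosen only for illustration.
        List<String> userJars = Arrays.asList("my-job.jar", "jackson-new.jar");
        List<String> hadoopJars = Arrays.asList("hadoop-core.jar", "jackson-old.jar");

        List<String> classpath = buildClasspath(userFirst, userJars, hadoopJars);
        System.out.println(String.join(File.pathSeparator, classpath));
    }
}
```

Running the sketch with `-Dmapreduce.user.classpath.first=true` lists the user jars first. In a real job driver the equivalent step would presumably be setting the property on the job configuration, for example with `conf.setBoolean("mapreduce.user.classpath.first", true)`, after confirming the property name for your hadoop version.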

For other versions of hadoop, it is best to check the TaskRunner.java class to confirm the name of the config property; after all, this is a "semi-hidden config":

static final String MAPREDUCE_USER_CLASSPATH_FIRST =
        "mapreduce.user.classpath.first"; //a semi-hidden config
