Overriding default Hadoop jars in the class path
Problem description
I've seen many ways of giving the user class path precedence over the Hadoop one. This is typically done when an M/R job needs a specific version of a library that Hadoop already ships in an older version (for example, Jackson's JSON parser or Commons HTTP).
In any case, I've seen the following properties:
mapreduce.task.classpath.user.precedence
mapreduce.task.classpath.first
mapreduce.job.user.classpath.first
Which of these parameters is the right one to set in my job configuration to force mappers and reducers to use a class path that puts the jars in my user-defined hadoop_classpath BEFORE Hadoop's default dependency jars?
By the way, this is related to my question about a DynamoDB requestHandler exception, which I recently found is caused by a jar conflict.
Answer
So, assuming you're using Hadoop 0.20.203, this is handled in the TaskRunner.java code as follows:
- The property you're looking for is on line 94: mapreduce.user.classpath.first
- Line 214 is where the call is made to build the list of classpaths. That call delegates to a method called getClassPaths(..)
- getClassPaths() is defined on line 524. There you can see that the configuration property decides whether your job and distributed cache libraries or the Hadoop libraries go on the classpath first.
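To make the ordering concrete, here is a minimal, hypothetical sketch of the decision described above. A boolean flag read from the job configuration chooses whether the job's jars (including distributed cache jars) go before or after Hadoop's own jars. The class name, method name and jar file names are invented for illustration and are not TaskRunner's actual code. The configuration is modeled with java.util.Properties instead of Hadoop's Configuration class. The "false" default is an assumption that matches the Hadoop-first behaviour described in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration of the ordering decision only; not Hadoop's TaskRunner code.
final class ClasspathOrderSketch {
    // Property name as it appears in the 0.20.203 TaskRunner source.
    static final String MAPREDUCE_USER_CLASSPATH_FIRST = "mapreduce.user.classpath.first";

    /** Builds a task classpath; user jars go first only when the flag is "true". */
    static List<String> buildClasspath(Properties jobConf,
                                       List<String> hadoopJars,
                                       List<String> userJars) {
        // Assumed default: false, i.e. Hadoop's jars come first.
        boolean userFirst = Boolean.parseBoolean(
                jobConf.getProperty(MAPREDUCE_USER_CLASSPATH_FIRST, "false"));
        List<String> classpath = new ArrayList<>();
        if (userFirst) {
            classpath.addAll(userJars);   // job + distributed cache jars shadow Hadoop's
            classpath.addAll(hadoopJars);
        } else {
            classpath.addAll(hadoopJars); // Hadoop's bundled versions are found first
            classpath.addAll(userJars);
        }
        return classpath;
    }

    public static void main(String[] args) {
        // Hypothetical jar names standing in for an old and a new copy of the same library.
        List<String> hadoop = List.of("hadoop-core.jar", "jackson-old.jar");
        List<String> user = List.of("job.jar", "jackson-new.jar");
        Properties conf = new Properties();
        System.out.println(buildClasspath(conf, hadoop, user));
        conf.setProperty(MAPREDUCE_USER_CLASSPATH_FIRST, "true");
        System.out.println(buildClasspath(conf, hadoop, user));
    }
}
```

Because the JVM resolves a class from the first classpath entry that contains it, only the order matters here. With the flag set, jackson-new.jar is found before jackson-old.jar.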
For other versions of Hadoop, it's best to check the TaskRunner.java class to confirm the name of the config property. After all, this is a "semi-hidden config":
static final String MAPREDUCE_USER_CLASSPATH_FIRST =
"mapreduce.user.classpath.first"; //a semi-hidden config
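You may want to confirm that the flag actually took effect, or diagnose a jar conflict like the one mentioned in the question. One way is to ask the JVM which jar supplied a given class at runtime. The following is a minimal sketch using only standard JDK calls (Class.getProtectionDomain and CodeSource.getLocation). The class name WhichJar is hypothetical. In a real mapper you would pass a class from the conflicting library, for example a Jackson class, and log the result.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical helper: reports where a class was loaded from, to spot conflicting jars.
final class WhichJar {
    static String locationOf(Class<?> clazz) {
        ProtectionDomain domain = clazz.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
        if (source == null) {
            return "bootstrap class loader";
        }
        URL location = source.getLocation();
        // Classes defined in memory (no backing jar or directory) may have no location.
        return (location == null) ? "unknown location" : location.toString();
    }

    public static void main(String[] args) {
        System.out.println(locationOf(String.class));
        System.out.println(locationOf(WhichJar.class));
    }
}
```

If the printed location is Hadoop's lib directory rather than your job jar, the user-classpath-first setting did not apply for that class.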