Druid / Hadoop batch index / Map Reduce / YARN / No remote, just local


Problem Description


Resolved

Turns out we need to put validation.jar in hadoop/share/hadoop/common/lib/ (download it from https://mvnrepository.com/artifact/javax.validation/validation-api *).

Combine that with what the doc says: set "mapreduce.job.classloader" to "true" in your Druid indexing task JSON.
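
For reference, here is a minimal sketch of where that property sits in an index_hadoop task spec. The dataSchema is abbreviated, and the dataSource name, input path, and hadoop-client version are made-up examples rather than values from this setup:

{
  "type": "index_hadoop",
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.3"],
  "spec": {
    "dataSchema": { "dataSource": "example_datasource" },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "hdfs://hadoop:9000/example/input/path"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "mapreduce.job.classloader": "true"
      }
    }
  }
}

The key part is tuningConfig.jobProperties: entries placed there are passed through to the Hadoop job configuration when the MR job is submitted.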

And you'll get it working :) -- Druid 0.9.2 with Hadoop 2.7.3

*) Not sure why; I could see that Druid uploaded all the jars in its classpath to Hadoop (and validation.jar is in there). Maybe there is a restriction on how the JVM loads javax.* libraries from a custom classloader (?)

What follows below is kept for historical purposes, to help searches.


UPDATE UPDATE

My bad. I forgot to copy core-site.xml etc., in my Dockerfile, to the correct place in the Druid installation.

I fixed that, and now it sends the job to Hadoop.

However, now I'm running into another problem: failure in the execution of the job. java.lang.reflect.InvocationTargetException, at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2].

Similar to the one reported here: https://groups.google.com/forum/#!topic/druid-development/_JXvLbykD0E. But that one at least has more hints in the stacktrace (a permission issue). My case is not so clear. Anyone having the same problem?

!!!UPDATE AGAIN!!!

I think this is the case I'm having. The same: https://groups.google.com/forum/#!topic/druid-user/4yDRoQZn8h8

And I confirmed it by checking the MR logs through Hadoop's timeline server.

Let me try fixing it and update this post afterward.

Update: found this: https://groups.google.com/forum/#!topic/druid-user/U6zMkhm3WiU

Update: Nope. Setting "mapreduce.job.classloader": "true" is giving me another problem on the map task: java.lang.ClassNotFoundException: javax.validation.Validator at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424)... This whole class-loading thing :(

So, the culprit is the Guice library. Druid 0.9.2 uses Guice 4.1.0, while Hadoop 2.7.3 is stuck with Guice 3.0.0..., and mapreduce.job.classloader is not working (it gives yet another Java class-not-found problem).

What to do now? Copy Guice 4.1.0 from Druid to Hadoop?

Original Post

Why is Druid (0.9.2) not submitting the job to the resource manager (and having the job run in the Hadoop cluster)? Can someone point out what detail I am missing, please?

I have a Hadoop cluster (pseudo-distributed) running version 2.7.2, on a machine whose host name is set to 'hadoop'. That Hadoop and my Druid run on separate Docker instances. The Druid container has a --link to the Hadoop instance.

From the log I can tell that it performs the MR locally (using LocalJobRunner).

I can also confirm the indexing succeeded, from the log and by checking HDFS.

Also, from the YARN UI... I'm not seeing any job being submitted.

I've configured everything according to the documentation. In my Druid's core-site.xml, I have:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop:9000</value>
</property>

(Yes, it's fs.default.name instead of fs.defaultFS... because the Druid extension still uses Hadoop 2.3.0, and defaultFS was not known until 2.4.x.) A small sidestep: I think there's a classpath bug in Druid; it's not adding the hadoop-dependencies to the classpath of the running worker (I've already specified the default coordinates in the common runtime properties).
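
The coordinates I mean are set along these lines in common.runtime.properties (a sketch with example values, not copied from this setup):

# Sketch of common.runtime.properties (example values)
druid.extensions.loadList=["druid-hdfs-storage"]
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.3.0"]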

OK, also, in the overlord's runtime.properties I've set the indexing runner type to remote. The same in the middleManager's runtime.properties. I could see those configs being picked up by Druid.
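
The runner setting I'm referring to looks roughly like this in the overlord's runtime.properties (a minimal sketch; other overlord and middleManager settings are omitted):

# Overlord runtime.properties (sketch): dispatch indexing tasks to middleManagers instead of running them locally
druid.indexer.runner.type=remote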

Also, the indexing log storage type is set to HDFS, and I can confirm the files get stored in HDFS.
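
That log setting, for reference, is along these lines (a sketch; the directory is an example path, not the actual one from my setup):

# Sketch: push indexing task logs to HDFS (example path)
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs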

So, as far as deep storage is concerned, all is fine. It's just this Map-Reduce not running in the cluster. Somebody else also stumbled upon the same problem, with no resolution in that thread. Here: https://groups.google.com/forum/#!topic/druid-user/vvX3VEGMTcw

I can confirm that deep storage has no issue (the input file is pulled from the HDFS path I specified, and the segments are also stored in HDFS).
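
For completeness, the deep-storage configuration I'm describing is along these lines (a sketch; the segment directory is an example path, and it assumes the druid-hdfs-storage extension is loaded):

# Sketch: HDFS deep storage (example path)
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://hadoop:9000/druid/segments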

What am I missing?

Solution

Turns out we need to put validation.jar in hadoop/share/hadoop/common/lib/ (download it from https://mvnrepository.com/artifact/javax.validation/validation-api *).

Combine that with what the doc says: set "mapreduce.job.classloader" to "true" in your Druid indexing task JSON.

And you'll get it working :) Druid 0.9.2 with Hadoop 2.7.3

*) Not sure why; I could see that Druid uploaded all the jars in its classpath to Hadoop (and validation.jar is in there). Maybe there is a restriction on how the JVM loads javax.* libraries from a custom classloader (?)
