Embedded hadoop-pig: what's the correct way to use the automatic addContainingJar for UDFs?

Problem description

When you use pigServer.registerFunction, you're not supposed to call pigServer.registerJar explicitly; instead, Pig detects the jar automatically using jarManager.findContainingJar.

However, we have a complex UDF whose class depends on classes from multiple jars, so we built a jar-with-dependencies with maven-assembly. But this causes the entire jar to end up in pigContext.skipJars (because it contains pig.jar itself), so it is not sent to the Hadoop server :(

What's the correct approach here? Must we manually call registerJar for every jar we depend on?

Solution

Not sure what the certified way is, but here are some pointers:

  • when you use pigServer.registerFunction, Pig automatically detects the jar that contains the UDFs and sends it to the jobTracker
  • Pig also automatically detects the jar that contains the PigMapReduce class (JarManager.createJar), extracts from it only the classes whose paths start with org/apache/pig, org/antlr/runtime, etc., and sends them to the jobTracker as well
  • so, if your UDF sits in the same jar as PigMapReduce, you're screwed, because it won't get sent
  • our conclusion: don't use jar-with-dependencies
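
If you end up shipping thin jars and registering them yourself, a small pre-check can flag any jar that bundles Pig's own classes before it is silently skipped. The following is only a sketch under assumptions: the class name UdfJarPlanner, its helpers, and the choice of org/apache/pig/ as the marker prefix are all hypothetical, and the code does not call Pig at all. It only inspects jar entries with the standard library.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class UdfJarPlanner {
    // Assumption: any entry under this prefix means the jar bundles Pig itself.
    static final String PIG_PREFIX = "org/apache/pig/";

    /** Returns true if the jar has at least one entry whose name starts with prefix. */
    static boolean bundles(Path jar, String prefix) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                if (entries.nextElement().getName().startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Keeps only the jars that do not bundle Pig classes. */
    static List<Path> jarsToRegister(List<Path> candidates) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path jar : candidates) {
            if (!bundles(jar, PIG_PREFIX)) {
                result.add(jar);
            }
        }
        return result;
    }

    /** Demo helper: writes a temporary jar containing the given empty entries. */
    static Path writeJar(String name, String... entryNames) throws IOException {
        Path jar = Files.createTempFile(name, ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            for (String entryName : entryNames) {
                jos.putNextEntry(new JarEntry(entryName));
                jos.closeEntry();
            }
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path thin = writeJar("udf", "com/example/MyUdf.class");
        Path fat = writeJar("fat", "com/example/MyUdf.class",
                "org/apache/pig/PigServer.class");
        for (Path jar : jarsToRegister(Arrays.asList(thin, fat))) {
            // In an embedded program, a call such as
            // pigServer.registerJar(jar.toString()) would go here.
            System.out.println("would register: " + jar);
        }
    }
}
```

In the demo, only the thin jar survives the filter; the fat jar is dropped because it contains an entry under org/apache/pig/, which is the situation the answer warns about.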

HTH
