Python UDFs in Pig


Question

I've seen the documentation here, but I confess that I find it rather lacking. I was wondering if anyone could give me a collection of examples of incorporating Python UDFs into Pig. In particular:


  • Prior to Pig 0.10, the boolean type does not exist, but a FILTER operation requires the result resolve to a boolean. Am I forever cursed with returning 1 or 0 and using FILTER alias BY py_udf.f(field) > 0 if I don't have the latest version?
  • Are the Algebraic, Accumulator, and Filter interfaces inaccessible from Python?
  • Can I not access the Distributed Cache either?
  • What about Store/Load functions?
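
To make the first bullet concrete, here is a minimal sketch of the 1/0 workaround. The function name `is_long`, the length threshold, and the file name `udfs.py` are hypothetical. The `try`/`except` fallback is my own addition so the file can also be imported outside Pig; it assumes Pig's Jython engine supplies the `outputSchema` decorator when the script is registered.

```python
# Assumption: when registered with "USING jython", Pig provides outputSchema.
# The fallback below only exists so the module also loads in plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema  # keep the schema string for inspection
            return func
        return decorator


@outputSchema("is_long:int")
def is_long(field):
    """Return 1 if field has more than 5 characters, else 0 (hypothetical rule)."""
    if field is None:
        return 0  # treat nulls as "does not match"
    return 1 if len(field) > 5 else 0


# Pig side (sketch, not executed here):
#   REGISTER 'udfs.py' USING jython AS py_udf;
#   kept = FILTER alias BY py_udf.is_long(field) > 0;
```

Returning an `int` and comparing it in the `FILTER` expression sidesteps the missing boolean type, at the cost of the extra `> 0` in every filter.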

Recommended answer

Python UDFs are quite limited. You cannot use Algebraic or Accumulator interfaces, nor can you write a LoadFunc in Python. For anything more complicated than a map operation you will likely need to resort to a Java UDF.

That said, a more complex Python UDF with a dynamic outputSchema can be found at http://ragrawal.wordpress.com/2013/02/24/on-writing-python-udf-for-pig-a-perspective/. This likely won't help you, but it will give you a better understanding of what Python UDFs can do.
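
For orientation, a dynamic output schema in a Jython UDF can look roughly like the sketch below. This is not the code from the linked post; it follows the `outputSchemaFunction` / `schemaFunction` decorator pattern as I understand it from the Pig UDF documentation. The fallback definitions are an assumption of mine so the file also loads outside Pig, and the names `square` and `square_schema` are illustrative.

```python
# Assumption: Pig's Jython engine provides these decorators at REGISTER time.
# The fallbacks only make the module importable in plain Python.
try:
    outputSchemaFunction
    schemaFunction
except NameError:
    def outputSchemaFunction(name):
        def decorator(func):
            func.outputSchemaFunction = name  # name of the schema function
            return func
        return decorator

    def schemaFunction(name):
        def decorator(func):
            func.schemaFunction = name
            return func
        return decorator


@outputSchemaFunction("square_schema")
def square(num):
    """Square the input; the result keeps the input's numeric type."""
    if num is None:
        return None
    return num * num


@schemaFunction("square_schema")
def square_schema(input_schema):
    """Output schema mirrors the input schema (int in, int out, and so on)."""
    return input_schema
```

The point of the pattern is that the output type is decided from the input schema when the script is compiled, rather than being fixed in a string as with `outputSchema`.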
