Python UDFs in Pig

Problem description

I've seen the documentation here, but I confess that I find it rather lacking. I was wondering if anyone could give me a collection of examples of incorporating Python UDFs into Pig. In particular:

  • Prior to Pig 0.10, the boolean type does not exist, but a FILTER operation requires the result resolve to a boolean. Am I forever cursed with returning 1 or 0 and using FILTER alias BY py_udf.f(field) > 0 if I don't have the latest version?
  • Are the Algebraic, Accumulator, and Filter interfaces inaccessible from Python?
  • Can I not access the Distributed Cache either?
  • What about Store/Load functions?

Recommended answer

Python UDFs are quite limited. You cannot use Algebraic or Accumulator interfaces, nor can you write a LoadFunc in Python. For anything more complicated than a map operation you will likely need to resort to a Java UDF.
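
For reference, a map-style UDF of the kind that does work from Python might look like the sketch below. It returns 1 or 0 so that it can be used with the FILTER ... > 0 workaround from the question on Pig versions without a boolean type. The function name is_long, the length threshold, and the file name py_udf.py are made up for illustration. The stand-in decorator is an assumption so the file also runs outside Pig; under Pig's Jython engine, outputSchema should already be defined.

```python
# Sketch of a map-style Python (Jython) UDF for Pig.
# Assumption: Pig's Jython engine defines the outputSchema decorator before it
# loads this script. The stand-in below is used only when running plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema  # record the declared schema on the function
            return func
        return decorate


@outputSchema("is_long:int")
def is_long(word):
    """Return 1 if word has more than 5 characters, else 0 (hypothetical predicate)."""
    if word is None:
        return 0
    return 1 if len(word) > 5 else 0

# Pig Latin side (relation and field names are illustrative):
#   REGISTER 'py_udf.py' USING jython AS py_udf;
#   long_words = FILTER words BY py_udf.is_long(word) > 0;
```

Each call sees one input value and returns one output value, which is why this shape fits within the limits described above.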

That said, a more complex Python UDF with a dynamic outputSchema can be found at http://ragrawal.wordpress.com/2013/02/24/on-writing-python-udf-for-pig-a-perspective/. This likely won't help you, but it will give you a better understanding of what Python UDFs can do.
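
As a rough idea of what a dynamic output schema looks like, here is a minimal sketch based on the outputSchemaFunction and schemaFunction decorators described in the Pig UDF documentation. It is not taken from the linked post. The names square and same_as_input are illustrative, and the stand-in decorators (including the attribute names they set) are assumptions that exist only so the file runs outside Pig.

```python
# Sketch of a Python (Jython) UDF whose output schema is computed at plan time.
# Assumption: Pig's Jython engine defines both decorators before loading this
# script. The stand-ins below are used only when running plain Python.
try:
    outputSchemaFunction
    schemaFunction
except NameError:
    def outputSchemaFunction(schema_func_name):
        def decorate(func):
            func.outputSchemaFunction = schema_func_name  # name of the schema function
            return func
        return decorate

    def schemaFunction(schema_func_name):
        def decorate(func):
            func.schemaFunction = schema_func_name  # mark func as a schema function
            return func
        return decorate


@outputSchemaFunction("same_as_input")
def square(num):
    """Square a number; the output type follows the input type."""
    if num is None:
        return None
    return num * num


@schemaFunction("same_as_input")
def same_as_input(input_schema):
    """Return the input schema unchanged, so int in gives int out, double in gives double out."""
    return input_schema
```

This still processes one value at a time, so it stays within the map-style limits noted in the answer.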
