Python UDFs in Pig
Question
I've seen the documentation here, but I confess that I find it rather lacking. I was wondering if anyone could give me a collection of examples of incorporating Python UDFs into Pig. In particular:
- Prior to Pig 0.10, the boolean type does not exist, but a FILTER operation requires the result to resolve to a boolean. Am I forever cursed with returning 1 or 0 and using FILTER alias BY py_udf.f(field) > 0 if I don't have the latest version?
- Are the Algebraic, Accumulator, and Filter interfaces inaccessible from Python?
- Can I not access the Distributed Cache either?
- What about Store/Load functions?
Recommended answer
Python UDFs are quite limited. You cannot use Algebraic or Accumulator interfaces, nor can you write a LoadFunc in Python. For anything more complicated than a map operation you will likely need to resort to a Java UDF.
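As an illustration of the "map operation" case, here is a minimal sketch of the 1-or-0 workaround described in the question. The function name is_positive and the file name udfs.py are hypothetical. The fallback decorator is only an assumption that lets the file also run outside Pig; when the script is registered with the Jython engine, Pig is expected to supply outputSchema itself.

```python
# udfs.py (hypothetical file name)
# Assumed Pig usage, mirroring the question:
#   REGISTER 'udfs.py' USING jython AS py_udf;
#   kept = FILTER alias BY py_udf.is_positive(field) > 0;

try:
    outputSchema  # assumed to be injected by Pig's Jython script engine
except NameError:
    # Fallback so the module can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("is_positive:int")
def is_positive(value):
    # Null fields arrive as None; treat them as "no match".
    if value is None:
        return 0
    if value > 0:
        return 1
    return 0
```

Returning an int and comparing it in the FILTER expression avoids relying on the boolean type, at the cost of a slightly noisier script.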
That said, a more complex Python UDF with a dynamic outputSchema can be found at http://ragrawal.wordpress.com/2013/02/24/on-writing-python-udf-for-pig-a-perspective/. This likely won't help you, but it will give you a better understanding of what Python UDFs can do.
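For a rough idea of what a dynamic output schema looks like, the sketch below follows the outputSchemaFunction / schemaFunction decorator pattern from the Pig UDF documentation. It is not the code from the linked post. The fallback decorators are an assumption added so the file can be run outside Pig, and the names square and square_schema are illustrative.

```python
try:
    outputSchemaFunction  # assumed to be injected by Pig's Jython script engine
    schemaFunction
except NameError:
    # Fallbacks so the module can be imported and tested outside Pig.
    def outputSchemaFunction(name):
        def decorate(func):
            func.outputSchemaFunction = name
            return func
        return decorate

    def schemaFunction(name):
        def decorate(func):
            func.schemaFunction = name
            return func
        return decorate


@outputSchemaFunction("square_schema")
def square(num):
    # The output type depends on the input type (int stays int, double stays double).
    return num * num


@schemaFunction("square_schema")
def square_schema(input_schema):
    # Reuse the input schema as the output schema instead of hard-coding a type.
    return input_schema
```

Inside Pig, the schema function receives the input schema and returns the schema of the UDF's result, so one UDF can serve several input types.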