Pig 3rd party UDF clarification

Question

I am new to Pig. From the Pig wiki page I learned that there is the Piggybank UDF collection and another useful collection, DataFu, from LinkedIn. I also learned that, as of Pig 0.8, Piggybank is part of Apache Pig's built-in UDFs.

However, I think most of the Piggybank UDFs are not documented in Apache Pig, StringConcat for example.

I am looking for date formatting UDFs that will convert a datetime to a String, such as FormatDate. I am not sure whether these UDFs already exist in Pig or Piggybank, as I could not find them in the documentation.

Also, are there any other 3rd party UDFs (Java or Python) available? Please list those.

Thanks a lot for your help.

Recommended answer

So there are a few questions here. I'll try to cover them all.

PiggyBank Docs

There is (sadly) no user manual for the Piggybank UDFs that explains how to use each of them from within a Pig script. However, the Pig Javadoc includes information for each Java class implementing the UDFs in Piggybank (scroll down to "contrib: Piggybank"):

  • http://pig.apache.org/docs/r0.8.1/api/overview-summary.html
  • http://pig.apache.org/docs/r0.9.1/api/overview-summary.html
  • http://pig.apache.org/docs/r0.10.0/api/overview-summary.html

String to DateTime

(Assuming Pig < 0.11)

To convert a string containing time-like information, you'll want to use the CustomFormatToISO UDF. It takes your chararray with date information plus a datetime format specification and converts it into an ISO datetime format. Once the value is in this format, there are several Piggybank functions that operate on ISO-formatted time:

  • http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/truncate/package-summary.html
  • http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/diff/package-summary.html
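To make the conversion step concrete, here is a minimal Python sketch of the idea behind CustomFormatToISO: parse a string with a caller-supplied pattern, then emit an ISO 8601 string. It is not the Piggybank implementation. The function name `custom_format_to_iso` is hypothetical, the pattern uses Python's `strptime` directives rather than the pattern syntax the UDF expects, and treating the input as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone


def custom_format_to_iso(text, fmt):
    """Parse `text` with a strptime pattern and return an ISO 8601 string.

    Hypothetical helper for illustration only; the input is assumed to be UTC.
    """
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    # Fixed-width ISO 8601 layout with millisecond precision and a UTC marker.
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.000Z")


iso_value = custom_format_to_iso("03/15/2012 14:30:00", "%m/%d/%Y %H:%M:%S")
print(iso_value)
```

In a Pig script the same role is played by the UDF itself, applied to a chararray field after the Piggybank jar has been registered.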

Note also that comparing ISO-formatted strings yields date ordering. This means you can apply comparison and sort operations to them, and they will behave as if they were time aware. For more background see this SO answer: https://stackoverflow.com/a/9576911/9940
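The sorting property can be checked with a small Python sketch. It assumes all timestamps share the same fixed-width ISO 8601 layout and the same time zone (UTC here); the sample values are made up for illustration.

```python
from datetime import datetime

# Illustrative sample values, all in the same fixed-width UTC layout.
timestamps = [
    "2012-03-15T14:30:00.000Z",
    "2011-12-31T23:59:59.000Z",
    "2012-03-02T08:00:00.000Z",
]

# Plain string sort: no date parsing involved.
sorted_as_text = sorted(timestamps)

# Time-aware sort: parse each value before comparing.
sorted_as_time = sorted(
    timestamps,
    key=lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ"),
)

print(sorted_as_text == sorted_as_time)
```

The two orderings agree because the fields run from most to least significant (year, month, day, hour, and so on) and are zero padded, so character-by-character comparison matches chronological comparison.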

If you're using Pig 0.11 or later, you can use the built-in ToDate() function: http://pig.apache.org/docs/r0.11.1/func.html#to-date
