Apache Pig中的日期时间解析 [英] Datetime parsing in Apache Pig

查看:118
本文介绍了Apache Pig中的日期时间解析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图解析Pig脚本中的日期,但出现以下错误"Hadoop不返回任何错误消息".

I'm trying to parse a Date in a Pig script and i got the following error "Hadoop does not return any error message".

以下是日期格式示例:16年3月9日下午2:50

Here is the Date format example : 3/9/16 2:50 PM

这是我的解析方式:

data = LOAD 'cleaned.txt'
AS (Date, Block, Primary_Type, Description, Location_Description, Arrest, Domestic, District, Year);

times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;

您可以在此处

你有什么主意吗? 谢谢

Do you have any idea ? Thanks

该错误似乎是由时间"上的STORE命令引起的.

It look like the error is caused by the STORE command on "times".

如果我做一个转储,我会得到:

If I do a DUMP then I got:

ERROR 1066: Unable to open iterator for alias times

只有当我使用ToDate函数时,才会发生其他脚本,它们会完美地工作.

It happen only when I use the ToDate function, I have other scripts that work perfectly.

推荐答案

首先,您需要在LOAD语句中指定加载程序:

First of all, you need to specify the loader in the LOAD statement:

USING PigStorage('\t')

我假设您正在使用制表符分隔符. 比起您没有架构,请指定类型的架构!

I assumed that you're using tab separator. Than if you have no schema specify the schema with type!

So you're load statement will be sg like this:
data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);

目前,我只是将chararray类型用于所有内容,但是您必须指定该类型才是适合您的表示形式.

For now I just use chararray type for everything, but you have to specify the type what is the right representation for you.

此后,日期转换就可以像您写的那样正常工作: (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z)

After this the date conversion just works fine as you wrote: (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z)

我的测试脚本:

data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);
times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;
DUMP times;

更新: 一些解释

顺便说一下,默认加载器是存储猪

By the way the default loader is pig storage

PigStorage是LOAD运算符的默认加载功能.

PigStorage is the default load function for the LOAD operator.

但是指定起来更好. 最初的问题是由于缺乏数据类型引起的

but it's nicer to specify. The original issue caused by the lack of datatype

如果您不分配类型,则字段默认为类型bytearray

If you don't assign types, fields default to type bytearray

因此ToDate输入类型失败.

so the ToDate failed on the input type.

这篇关于Apache Pig中的日期时间解析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆