How do I read in a list of bags in Pig?

Question

How do I read in a list of bags in Pig?

I tried:

grunt> cat sample.txt
{a,b},{},{c,d}
grunt> data = LOAD 'sample.txt' AS (a:bag{}, b:bag{}, c:bag{});
grunt> DUMP data
({},,)

Answer

The default method for reading data into Pig is PigStorage('\t') -- that is, it assumes your data is tab-separated. Yours is comma-separated. So you should write LOAD 'sample.txt' USING PigStorage(',') AS....

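However, your data is not in proper Pig bag format. Remember that a bag is a collection of tuples, so each element needs its own parentheses: {(a),(b)} rather than {a,b}. If you cannot pre-process your input, you'll have to write a UDF to parse input of the form you have given. With the bags rewritten as collections of tuples, you might expect this to work: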

grunt> cat tmp/data.txt
{(a),(b)},{},{(c),(d)}
grunt> data = LOAD 'tmp/data.txt' USING PigStorage(',') AS (a:bag{}, b:bag{}, c:bag{});
grunt> DUMP data;
(,,{})

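What went wrong? Your input field separator (,) is the same character that separates the tuples inside a bag, and PigStorage does not take braces or parentheses into account when it splits a line. It cuts the line at every comma, so the first three fields are "{(a)", "(b)}", and "{}". Only the third of these is a valid bag, which is why only the third field ends up being a bag; the first two fail type conversion and come out null. That is also why you'll see a warning message like Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).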
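
To make the field boundaries visible, here is a rough Python imitation of what splitting on every comma does to the line above. This is only an approximation of PigStorage(','), not Pig's actual code, and the looks_like_bag check is a hypothetical, deliberately crude stand-in for Pig's bag parser: it only tests that a field opens with { and closes with }.

```python
def split_like_pigstorage(line, delim=","):
    # Approximation: cut the line at every delimiter, ignoring braces
    # and parentheses, which is the behaviour that trips up the load.
    return line.split(delim)


def looks_like_bag(field):
    # Hypothetical heuristic for illustration only: a bag literal must
    # at least start with '{' and end with '}'.
    return field.startswith("{") and field.endswith("}")


line = "{(a),(b)},{},{(c),(d)}"
fields = split_like_pigstorage(line)

# Only the first three pieces line up with the schema (a, b, c).
for name, raw in zip(["a", "b", "c"], fields):
    status = "bag" if looks_like_bag(raw) else "conversion fails (null)"
    print(name, repr(raw), status)
```

Running it prints one line per declared field: a and b receive the fragments '{(a)' and '(b)}', which cannot be bags, while c receives '{}', matching the (,,{}) output from DUMP.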

If you can, try to use tabs or spaces (or semicolons, or...) instead of commas:

grunt> cat tmp/data.txt
{(a),(b)}       {}      {(c),(d)}
grunt> data = LOAD 'tmp/data.txt' AS (a:bag{}, b:bag{}, c:bag{});
grunt> DUMP data;
({(a),(b)},{},{(c),(d)})
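
If your raw file looks like the original sample.txt ({a,b},{},{c,d}) and you can pre-process it, a small script can rewrite each line into the tab-separated, tuple-wrapped form that loads cleanly above. The following is a minimal sketch, not a general parser: it assumes every bag is a flat {...} group of bare comma-separated atoms with no nested braces, quotes, or embedded commas. The function name to_pig_bags and the script name used below are hypothetical.

```python
import re
import sys


def to_pig_bags(line):
    """Rewrite '{a,b},{},{c,d}' as '{(a),(b)}<TAB>{}<TAB>{(c),(d)}'."""
    # Assumption: each bag is a flat '{...}' group with no nesting.
    bodies = re.findall(r"\{([^{}]*)\}", line)
    bags = []
    for body in bodies:
        # Wrap each atom in parentheses so the bag holds tuples.
        atoms = [x.strip() for x in body.split(",") if x.strip()]
        bags.append("{" + ",".join(f"({x})" for x in atoms) + "}")
    # Tabs match PigStorage's default delimiter.
    return "\t".join(bags)


def main(paths):
    # Convert every line of every file named on the command line.
    for path in paths:
        with open(path) as f:
            for raw in f:
                print(to_pig_bags(raw.rstrip("\n")))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, python3 to_pig_bags.py sample.txt > tmp/data.txt would turn the question's input into the tab-separated line shown above, which the plain LOAD with the default PigStorage then reads as three bags. If the real data has quoting or nested structures, a UDF is the more robust route.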