How do I read in a list of bags in Pig?

Question

How do I read in a list of bags in Pig?

I tried:

grunt> cat sample.txt
{a,b},{},{c,d}
grunt> data = LOAD 'sample.txt' AS (a:bag{}, b:bag{}, c:bag{});
grunt> DUMP data;
({},,)

Recommended answer

The default method for reading data into Pig is PigStorage('\t') -- that is, it assumes your data is tab-separated. Yours is comma-separated. So you should write LOAD 'sample.txt' USING PigStorage(',') AS....

However, your data is not in proper Pig bag format. Remember that a bag is a collection of tuples. If you cannot pre-process your input, you'll have to write a UDF to parse input of the form you have given. So this ought to work:

grunt> cat tmp/data.txt
{(a),(b)},{},{(c),(d)}
grunt> data = LOAD 'tmp/data.txt' USING PigStorage(',') AS (a:bag{}, b:bag{}, c:bag{});
grunt> DUMP data;
(,,{})

What went wrong? Your input field separator (,) is the same as the separator between tuples inside a bag, and that confuses Pig. It splits the line on every comma, so the first three fields are {(a) and (b)} and {}, which is why only the third field ends up being a bag. It is also why you'll see a warning message like Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
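
To make the collision concrete, here is a small Python sketch that mimics a plain split on the field delimiter. It is only an illustration and not Pig's actual loader code; the helper names naive_fields and looks_like_bag are made up for this example, and the "bag" check is deliberately rough.

```python
# Illustration only: mimic a plain split on the field delimiter.
# This is NOT Pig's loader; it just shows why the commas collide.
line = "{(a),(b)},{},{(c),(d)}"


def naive_fields(text, delimiter=","):
    """Split on every occurrence of the delimiter, ignoring braces."""
    return text.split(delimiter)


def looks_like_bag(text):
    """Rough check (an assumption): a bag literal starts with '{' and ends with '}'."""
    return text.startswith("{") and text.endswith("}")


fields = naive_fields(line)
first_three = fields[:3]  # the schema declares three fields: a, b, c

for name, value in zip("abc", first_three):
    print(name, repr(value), looks_like_bag(value))
```

Under these assumptions only the third piece is a complete brace-delimited value, which lines up with the (,,{}) result shown above.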

If you can, try to use tabs or spaces (or semicolons, or...) instead of commas:

grunt> cat tmp/data.txt                                                                
{(a),(b)}       {}      {(c),(d)}
grunt> data = LOAD 'tmp/data.txt' AS (a:bag{}, b:bag{}, c:bag{});                      
grunt> DUMP data;
({(a),(b)},{},{(c),(d)})
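
If the input can be pre-processed, one possible approach is a small script that rewrites each line from the question's form ({a,b},{},{c,d}) into tab-separated bags of single-field tuples before loading. The following is a minimal sketch, not a definitive implementation: the function name to_pig_bags is hypothetical, and it assumes that groups are never nested and that items contain no braces, commas, or parentheses.

```python
import re


def to_pig_bags(line):
    """Convert '{a,b},{},{c,d}' into tab-separated bags of one-field tuples.

    Assumes flat, non-nested groups whose items contain no braces,
    commas, or parentheses (an assumption for this sketch).
    """
    # Capture the text between each pair of braces, e.g. 'a,b', '', 'c,d'.
    groups = re.findall(r"\{([^{}]*)\}", line)
    bags = []
    for group in groups:
        # Drop empty pieces so that '{}' stays an empty bag.
        items = [item.strip() for item in group.split(",") if item.strip()]
        tuples = ",".join("(" + item + ")" for item in items)
        bags.append("{" + tuples + "}")
    # Join with tabs so the default PigStorage delimiter separates the fields.
    return "\t".join(bags)


print(to_pig_bags("{a,b},{},{c,d}"))
```

The printed line has the same shape as the tab-separated tmp/data.txt shown above, so it could be loaded with the default delimiter.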
