Apache Pig not parsing a tuple fully

Question

I have a file called data that looks like this (note that the name, e.g. personA, is followed by a tab):

personA (1, 2, 3)
personB (2, 1, 34)

And I have an Apache Pig script like this:

A = LOAD 'data' AS (name: chararray, nodes: tuple(a:int, b:int, c:int));
C = foreach A generate nodes.$0;
dump C;

Its output makes sense:

(1)
(2)

However, if I change the schema in the script to this:

A = LOAD 'data' AS (name: chararray, nodes: tuple());
C = foreach A generate nodes.$0;
dump C;

Then the output I get is this:

(1, 2, 3)
(2, 1, 34)

It looks like the first (and only) element in this tuple is a bytearray, i.e. Pig is not parsing the input text 1, 2, 3 into a tuple.

In future my input will have an unknown and variable number of elements in the nodes item, so I can't just write out a:int, ….

Is there any way to get Pig to parse the input tuple as a tuple without having to write out the full schema?

Recommended answer

Pig does not accept what you are passing in as valid. The default load function, PigStorage, only accepts delimited files (tab-delimited by default). As your second script shows, it is not smart enough to turn the parentheses and commas in your text into a tuple unless the schema spells out every field. Your options are:

  • Reformat your file to be tab-delimited throughout: personA 1 2 3
  • Read the file line by line with TextLoader, then write a UDF that parses each line and returns the data in the form you want.
  • Write your own custom loader.
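
The second option can be sketched as follows. Pig UDFs can be written in Python (run through Jython), so the parsing logic itself might look like the function below. This is only a minimal sketch: the name parse_line is hypothetical, and it assumes every line has the form name, whitespace, then a parenthesised, comma-separated list of integers, as in the sample file.

```python
# Hypothetical helper (not part of Pig): parse one raw line such as
# "personA\t(1, 2, 3)" into a name and a variable-length tuple of ints.
# Assumption: the node list is wrapped in one pair of parentheses and
# contains only integers separated by commas.

def parse_line(line):
    """Return (name, nodes) where nodes is a tuple of ints of any length."""
    # Everything before the first "(" is the name (plus the tab separator).
    name, _, rest = line.strip().partition("(")
    name = name.strip()
    # Drop the closing parenthesis, then split the remaining text on commas.
    body = rest.rstrip(")").strip()
    nodes = tuple(int(tok) for tok in body.split(",") if tok.strip())
    return name, nodes


if __name__ == "__main__":
    # The two sample lines from the question.
    for raw in ["personA\t(1, 2, 3)", "personB\t(2, 1, 34)"]:
        print(parse_line(raw))
```

Because the list is split at run time, the number of elements does not have to be known in advance. To call something like this from Pig you would still need to load the lines with TextLoader, register the script as a Jython UDF and declare an output schema for it; those steps are not shown here.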
