Apache Pig: loading data with multiple delimiters


Question

Hi everyone, I have a problem loading data with Apache Pig. The file format looks like this:

"1","2","xx,yy","a,sd","3"

So I want to load it using the three-character sequence "," (a double quote, a comma, and another double quote) as the delimiter, like this:

A = LOAD 'file.csv' USING PigStorage('","') AS (f1,f2,f3,f4,f5);

But PigStorage does not accept the multi-character delimiter ",". How can I do this? Thank you very much!

Recommended answer

PigStorage takes a single character as its delimiter, so you will have to use a loader from PiggyBank instead. Download piggybank.jar and save it in the same folder as your Pig script, then register the jar in the script.

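-- Make the PiggyBank loaders available to this script
REGISTER piggybank.jar;

DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

-- CSVLoader splits on commas and keeps commas inside double-quoted fields; it takes no delimiter argument
A = LOAD 'test1.txt' USING CSVLoader() AS (f1:int,f2:int,f3:chararray,f4:chararray,f5:int);
B = FOREACH A GENERATE f1,f2,f3,f4,f5;
DUMP B;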

Alternatively, load each record as a single line and then split it with STRSPLIT:

A = LOAD 'test1.txt' USING TextLoader() AS (line:chararray);
-- STRSPLIT takes a regular expression; '","' matches the literal quote-comma-quote sequence
B = FOREACH A GENERATE FLATTEN(STRSPLIT(line, '","'));
DUMP B;
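
Note that splitting on "," alone leaves the opening quote on the first field and the closing quote on the last one ("1 and 3" for the sample row). A minimal sketch of one way to handle that, assuming every row has exactly five non-empty fields and that the column types match the sample row; the relation names C and D and the field names f1 to f5 are illustrative only:

```pig
-- Sketch only: the file name and field names follow the snippets above.
A = LOAD 'test1.txt' USING TextLoader() AS (line:chararray);

-- Drop the quote at the start and at the end of the line, so that only
-- the quote-comma-quote separators remain between fields.
B = FOREACH A GENERATE REPLACE(line, '^"|"$', '') AS line;

-- Split on the separator and name the resulting fields.
C = FOREACH B GENERATE FLATTEN(STRSPLIT(line, '","'))
        AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray, f5:chararray);

-- Cast the numeric columns (types assumed from the sample row).
D = FOREACH C GENERATE (int)f1 AS f1, (int)f2 AS f2, f3, f4, (int)f5 AS f5;
DUMP D;
```

With its default limit, STRSPLIT drops trailing empty strings, so rows that end in an empty field would come back with fewer than five columns. If that can happen in your data, the CSVLoader approach is probably the safer choice.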
