How can I ignore " (double quotes) while loading a file in Pig?


Problem description


I have the following data in a file:

"a","b","1","2"
"a","b","4","3"
"a","b","3","1"

I am reading this file with the command below:

File1 = LOAD '/path' USING PigStorage(',') AS (f1:chararray, f2:chararray, f3:int, f4:int);

But the data in fields 3 and 4 is ignored.

How can I read this file correctly, or is there any way to make Pig skip the '"' characters?

Additional information: I am using Apache Pig version 0.10.0.

Solution

You may use the REPLACE function (it won't be in one pass though):

file1 = load 'your.csv' using PigStorage(',');
data = foreach file1 generate (chararray) $0 as f1, (chararray) $1 as f2, (int) REPLACE($2, '\\"', '') as f3, (int) REPLACE($3, '\\"', '') as f4;

You may also use regexes with REGEX_EXTRACT:

file1 = load 'your.csv' using PigStorage(',');
data = foreach file1 generate $0, $1, REGEX_EXTRACT($2, '([0-9]+)', 1), REGEX_EXTRACT($3, '([0-9]+)', 1);

Of course, you could strip the " from f1 and f2 the same way.
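
Putting that together, here is a minimal sketch that cleans all four fields with REPLACE. It assumes the same 'your.csv' input and the four-column layout from the question. The explicit casts are an addition here, not part of the original answer. They are used because REPLACE returns a chararray and the numeric fields need converting after the quotes are gone.

```pig
-- Load without a schema so nothing is cast before the quotes are removed.
file1 = LOAD 'your.csv' USING PigStorage(',');

-- Strip the double quotes from every field, then cast the numeric ones.
-- The pattern '\\"' is the same regex used above: it matches a literal ".
data = FOREACH file1 GENERATE
    REPLACE((chararray) $0, '\\"', '') AS f1,
    REPLACE((chararray) $1, '\\"', '') AS f2,
    (int) REPLACE((chararray) $2, '\\"', '') AS f3,
    (int) REPLACE((chararray) $3, '\\"', '') AS f4;

-- Inspect the result and its schema.
DESCRIBE data;
DUMP data;
```

Note that this only removes the quote characters. A comma inside a quoted value would still be split by PigStorage(','), so the approach fits data like the sample above, where the values contain no embedded delimiters.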
