How can I ignore " (double quotes) while loading a file in Pig?
Problem description
I have the following data in a file:
"a","b","1","2"
"a","b","4","3"
"a","b","3","1"
I am reading this file with the command below:
File1 = LOAD '/path' USING PigStorage(',') AS (f1:chararray, f2:chararray, f3:int, f4:int);
However, the data in fields 3 and 4 is ignored.
How can I read this file correctly, or is there any way to make Pig skip the '"' characters?
Additional information: I am using Apache Pig version 0.10.0.
Recommended answer
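PigStorage only splits each line on the delimiter and leaves the quotes inside every field, so a value such as "1" cannot be cast to int. You may use the REPLACE function to strip them (it won't be done in one pass, though):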
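-- Load every field as chararray so REPLACE receives strings, quotes included.
file1 = LOAD 'your.csv' USING PigStorage(',') AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray);
-- REPLACE returns a chararray, so cast the cleaned values to int explicitly.
data = FOREACH file1 GENERATE f1, f2, (int)REPLACE(f3, '\\"', '') AS f3, (int)REPLACE(f4, '\\"', '') AS f4;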
You may also use regexes with REGEX_EXTRACT:
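file1 = LOAD 'your.csv' USING PigStorage(',') AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray);
-- Capture group 1 holds the digits found between the quotes.
data = FOREACH file1 GENERATE f1, f2, (int)REGEX_EXTRACT(f3, '([0-9]+)', 1) AS f3, (int)REGEX_EXTRACT(f4, '([0-9]+)', 1) AS f4;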
Of course, you could strip the " from f1 and f2 the same way.
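As a minimal sketch of that last remark, the script below applies REPLACE to all four fields and casts only the numeric ones. It assumes the sample data from the question; the alias names raw and clean are placeholders and are not part of the original answer.

```pig
-- Sketch only: 'your.csv' and the aliases raw/clean are illustrative names.
-- Load everything as chararray so the quoted values reach REPLACE unchanged.
raw = LOAD 'your.csv' USING PigStorage(',')
      AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray);

-- Strip the double quotes from every field; cast f3 and f4 to int afterwards,
-- because REPLACE always returns a chararray.
clean = FOREACH raw GENERATE
            REPLACE(f1, '\\"', '') AS f1,
            REPLACE(f2, '\\"', '') AS f2,
            (int)REPLACE(f3, '\\"', '') AS f3,
            (int)REPLACE(f4, '\\"', '') AS f4;

-- Print the cleaned relation to check the result.
DUMP clean;
```

Loading the fields as chararray first and casting after the quotes are removed keeps the conversion explicit, at the cost of the extra FOREACH pass mentioned above.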