Apache Pig: process CSV with fields wrapped in quotes
Question
How can I process a CSV file in which some fields are wrapped in quotes?
An example line to process (the field delimiter is ','):
I am column1, I am column2, "yes, I'm am column3"
The example has three columns, but the following statement reports four, because PigStorage(',') also splits on the comma inside the quotes:
A = load '/path/to/file' using PigStorage(',');
Any suggestions or links to resources would be appreciated.
Recommended answer
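Try loading the data, then use FOREACH ... GENERATE to reshape it into whatever format you need. Because PigStorage(',') splits on every comma, the quoted field arrives as two pieces ($2 and $3), so rejoin them first. To remove the quotes, use REPLACE, which takes three arguments: the string, a regular expression, and the replacement, for example REPLACE($3, '"', '').
-- PigStorage splits on every comma, so the quoted field arrives as $2 and $3.
data = LOAD 'testdata' USING PigStorage(',');
data = FOREACH data GENERATE
    (chararray) $0 AS col1:chararray,
    (chararray) $1 AS col2:chararray,
    -- Rejoin the two pieces with the comma that was consumed, then strip the quotes.
    REPLACE(CONCAT(CONCAT((chararray) $2, ','), (chararray) $3), '"', '') AS col3:chararray;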
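As an alternative to repairing the split fields by hand, a quote-aware loader may be able to keep the quoted field intact at load time. The following is only a sketch, assuming the piggybank jar distributed with Pig is available; the jar path and the column names are placeholders that do not come from the original question.

```pig
-- The jar location is an assumption; adjust it for your installation.
REGISTER /path/to/piggybank.jar;

-- CSVExcelStorage treats commas inside double-quoted fields as data,
-- so the sample line should load as three columns instead of four.
data = LOAD '/path/to/file'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
       AS (col1:chararray, col2:chararray, col3:chararray);

DUMP data;
```

The sample line has a space between the delimiter and the opening quote, and loaders can differ in how they treat that case, so it is worth checking the DUMP output on real data and applying TRIM to the fields if leading spaces remain.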