Apache Pig: process CSV with fields wrapped in quotes
Question
How can I process a CSV file in which some fields are wrapped in quotes?
Example line to process (the field delimiter is ','):
I am column1, I am column2, "yes, I'm am column3"
This line has three columns, but the following load statement splits it into four, because the comma inside the quotes is also treated as a delimiter:
A = load '/path/to/file' using PigStorage(',');
Any suggestions or links to resources?
Recommended answer
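Try loading the data, then use FOREACH ... GENERATE to reshape it into whatever format you need. Because PigStorage(',') splits on every comma, a quoted field that contains a comma arrives as two separate fields, which have to be joined again. To remove the quotes from a field, use REPLACE, which takes three arguments (the field, a regular expression, and the replacement string), for example REPLACE($3, '"', '').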
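-- PigStorage splits on every comma, so the quoted field arrives as two pieces ($2 and $3).
-- This assumes exactly one comma inside the quoted field.
data = LOAD 'testdata' USING PigStorage(',');
-- Rejoin the two pieces with CONCAT, then strip the double quotes with REPLACE.
data = FOREACH data GENERATE
    (chararray) $0 AS col1:chararray,
    (chararray) $1 AS col2:chararray,
    REPLACE(CONCAT(CONCAT((chararray) $2, ','), (chararray) $3), '"', '') AS col3:chararray;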
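To see why the question's line turns into four columns, here is a minimal Python sketch (not Pig) that compares a plain split on ',' with a quote-aware CSV parse of the same sample line. The function names naive_split and quote_aware_split are made up for this illustration; only Python's standard csv module is used.

```python
import csv

# Sample line from the question; the third field is quoted and contains a comma.
LINE = 'I am column1, I am column2, "yes, I\'m am column3"'


def naive_split(line):
    """Split on every comma, ignoring quotes (what a plain delimiter split does)."""
    return line.split(',')


def quote_aware_split(line):
    """Parse one CSV line, keeping commas inside double quotes within a field."""
    # skipinitialspace=True drops the blank after each delimiter so that the
    # opening quote is seen at the start of the field.
    return next(csv.reader([line], skipinitialspace=True))


if __name__ == '__main__':
    print(len(naive_split(LINE)), naive_split(LINE))
    print(len(quote_aware_split(LINE)), quote_aware_split(LINE))
```

The plain split yields four pieces, while the quote-aware parse yields the three intended columns with the quotes already removed. On the Pig side, the Piggybank library includes a quote-aware loader, CSVExcelStorage, which may be worth checking if it is available in your installation.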