Apache Pig: processing a CSV file with fields wrapped in quotes

Problem description

How can I process a CSV file in which some fields are wrapped in quotes?

An example line to process (the field delimiter is ','):

I am column1, I am column2, "yes, I'm am column3"

This line has three columns, but the following load statement treats it as four:

A = load '/path/to/file' using PigStorage(',');

Any suggestions or links to resources would be appreciated.

Recommended answer

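Try loading the data, then use FOREACH ... GENERATE to rebuild it in whatever format you need. PigStorage splits on every comma, so the quoted field in the sample line arrives as two pieces, $2 and $3. Rejoin them with CONCAT and strip the quotes with REPLACE, which takes three arguments: the source string, a regular expression, and the replacement, for example REPLACE($3, '"', ''). This workaround assumes the quoted field contains exactly one comma.

-- PigStorage splits on every comma, so the quoted field arrives as $2 and $3
data = LOAD 'testdata' USING PigStorage(',');
data = FOREACH data GENERATE
    (chararray) $0 AS col1:chararray,
    (chararray) $1 AS col2:chararray,
    -- rejoin the two halves of the quoted field, then remove the double quotes
    REPLACE(CONCAT(CONCAT((chararray) $2, ','), (chararray) $3), '"', '') AS col3:chararray;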
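
To see why a delimiter-only split reports four columns for the sample line, and what a quote-aware parse or a manual rejoin would produce instead, here is a small sketch in Python rather than Pig. It is illustrative only: the function names are hypothetical, and the manual repair assumes the quoted field contains exactly one comma.

```python
import csv
import io

# Sample line from the question.
LINE = 'I am column1, I am column2, "yes, I\'m am column3"'


def naive_split(line):
    """Split on every comma, ignoring quotes (a plain delimiter split)."""
    return line.split(',')


def quote_aware_split(line):
    """Parse one CSV line, treating double-quoted text as a single field."""
    # skipinitialspace=True lets the parser recognise a quote that follows ", ".
    reader = csv.reader(io.StringIO(line), skipinitialspace=True)
    return next(reader)


def rejoin_and_strip(fields):
    """Manual repair: merge pieces 2 and 3, drop the quotes, trim spaces."""
    merged = fields[2] + ',' + fields[3]
    return [
        fields[0].strip(),
        fields[1].strip(),
        merged.replace('"', '').strip(),
    ]


if __name__ == '__main__':
    print(len(naive_split(LINE)))               # number of pieces after a plain split
    print(quote_aware_split(LINE))              # fields from the quote-aware parser
    print(rejoin_and_strip(naive_split(LINE)))  # fields after the manual repair
```

The manual repair breaks as soon as a quoted field holds zero commas or more than one, which is why a quote-aware parser is usually the more robust choice when one is available.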
