Pig - Remove line feed, return and tab

Problem description

I'm trying to remove the characters \n, \t and \r from a column in Pig, but I'm getting the wrong output.

This is what I'm doing:

qr_1 = LOAD 'hdfs://localhost:9000/sample.csv' USING PigStorage(',') as (Id:int,PostTypeId:int,AcceptedAnswerId:int,ParentId:int,CreationDate:chararray,DeletionDate:chararray,Score:int,ViewCount:int,Body:chararray,OwnerUserId:int,OwnerDisplayName:chararray,LastEditorUserId:int,LastEditorDisplayName:chararray,LastEditDate:chararray,LastActivityDate:chararray,Title:chararray,Tags:chararray,AnswerCount:int,CommentCount:int,FavoriteCount:int,ClosedDate:chararray,CommunityOwnedDate:chararray);
qr_1 = FOREACH qr_1 GENERATE Id .. ViewCount, REPLACE(Body,'\n','') as Body, OwnerUserId .. ;
qr_1 = FOREACH qr_1 GENERATE Id .. ViewCount, REPLACE(Body,'\r','') as Body, OwnerUserId .. ;   
qr_1 = FOREACH qr_1 GENERATE Id .. ViewCount, REPLACE(Body,'\t','') as Body, OwnerUserId .. ;   

Input:

5585779,1,5585800,,2011-04-07 18:27:54,,1432,3090250,"<p>How can I convert a <code>String</code> to an <code>int</code> in Java?</p>

<p>My String contains only numbers and I want to return the number it represents.</p>

<p>For example, given the string <code>""""1234""""</code> the result should be the number <code>1234</code>.</p>",537967,,2756409,user166390,2015-09-10 21:30:42,2016-03-07 00:42:49,Converting String to Int in Java?,<java><string><type-conversion>,12,0,239

Output:

(5585779,1,5585800,,2011-04-07 18:27:54,,1432,3090250,"<p>How can I convert a <code>String</code> to an <code>int</code> in Java?</p>,,,,,,,,,,,,,)
(,,,,,,,,,,,,,,,,,,,,,)
(,,,,,,,,,,,,,,,,,,,,)
(,,,,,,,,,,,,,,,,,,,,,)
(,,537967,,2756409,user166390,,,Converting String to Int in Java?,,12,0,239,,,,,,,,,)

What am I doing wrong?

Thanks.

Using '\\n' makes no difference either.

Recommended answer

Your data contains commas inside the quoted Body field (and, as the sample input shows, line breaks as well), which is why the fields do not match the schema: PigStorage(',') splits on every comma and treats every line as a separate record. Load the file with CSVLoader instead, then use the REPLACE command to replace '\\t', '\\n' and '\\r'.
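The approach above could look roughly like the sketch below. It rests on several assumptions that are not in the original answer: the piggybank jar path is a placeholder, and the loader used is piggybank's CSVExcelStorage with the 'YES_MULTILINE' option (a relative of the CSVLoader named in the answer), chosen here because the sample record spans several lines inside a quoted field. The alias names posts and cleaned are illustrative only, and the single character-class pattern is one possible way to combine the three REPLACE calls.

```pig
-- Assumption: piggybank.jar is available at this placeholder path.
REGISTER '/path/to/piggybank.jar';

-- CSVExcelStorage with YES_MULTILINE keeps a quoted field together,
-- including the commas and line breaks embedded in it.
posts = LOAD 'hdfs://localhost:9000/sample.csv'
        USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE')
        AS (Id:int, PostTypeId:int, AcceptedAnswerId:int, ParentId:int,
            CreationDate:chararray, DeletionDate:chararray, Score:int, ViewCount:int,
            Body:chararray, OwnerUserId:int, OwnerDisplayName:chararray,
            LastEditorUserId:int, LastEditorDisplayName:chararray,
            LastEditDate:chararray, LastActivityDate:chararray, Title:chararray,
            Tags:chararray, AnswerCount:int, CommentCount:int, FavoriteCount:int,
            ClosedDate:chararray, CommunityOwnedDate:chararray);

-- REPLACE takes a regular expression, so one character class can strip
-- line feeds, carriage returns and tabs in a single pass.
cleaned = FOREACH posts GENERATE Id .. ViewCount,
          REPLACE(Body, '[\\n\\r\\t]', '') AS Body,
          OwnerUserId .. ;

DUMP cleaned;
```

Once the loader parses the quoted Body field as a single value, the whole record should arrive as one tuple, and the REPLACE step then only has to clean the text inside that field.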
