hadoop-streaming.jar adds x'09' at the end of each line


Question

I am trying to merge some *_0 files (part files in HDFS) in an HDFS location using the hadoop-streaming.jar command below.

  hadoop jar $HDPHOME/hadoop-streaming.jar -Dmapred.reduce.tasks=1 -input $INDIR -output $OUTTMP/${OUTFILE}  -mapper cat -reducer cat

The merge itself works, but the output of the above command appears to have x'09' (a tab character) appended to the end of each line.

We have Hive tables defined on top of the part files (which are replaced with the merged file), where the last field is defined as BIGINT. Because the merged file appends x'09' to the last field, the same table definition now shows NULL in the last field in Hue (510408 is no longer a valid number once x'09' is appended to it).

For example:

Data in part file:

00000320  7c 35 31 30 34 30 38 0a                           ||510408.|

Data in merged file (result of above command):

00000320  7c 35 31 30 34 30 38 09  0a                       ||510408..|
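The same difference can be checked without reading hex dumps by counting the lines that end in a tab. The sketch below uses two hypothetical sample files (their names and contents are assumptions made up for illustration, modeled on the dumps above) rather than real HDFS data.

```shell
# Hypothetical sample files standing in for a part file and the merged file.
dir=$(mktemp -d)
printf '1|510408\n'   > "$dir/part_sample.txt"
printf '1|510408\t\n' > "$dir/merged_sample.txt"

# Count the lines that end in a tab (x'09') in each file.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
tab=$(printf '\t')
part_hits=$(grep -c "${tab}\$" "$dir/part_sample.txt" || true)
merged_hits=$(grep -c "${tab}\$" "$dir/merged_sample.txt" || true)

echo "part: ${part_hits}, merged: ${merged_hits}"
```

For files that live in HDFS, the same grep could presumably be fed from `hadoop fs -cat` on the output path instead of a local file.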

How do I avoid this? Is there an option I can set in the command to prevent it?

I appreciate any help or pointers.

Recommended answer

Adding the option below seems to resolve it.

-D mapred.textoutputformat.separator=<delimiter-of-input-file>
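As a sketch, the full command from the question with this option added might look as follows. The `|` value is an assumption inferred from the 7c byte in the hex dumps; substitute the actual delimiter of the input files. The variables are the ones used in the question.

```shell
# Assumption: the input files are pipe-delimited (x'7c' in the hex dumps).
# Generic -D options are placed before the streaming options, as in the question.
hadoop jar "$HDPHOME/hadoop-streaming.jar" \
  -D mapred.reduce.tasks=1 \
  -D mapred.textoutputformat.separator='|' \
  -input "$INDIR" \
  -output "$OUTTMP/${OUTFILE}" \
  -mapper cat \
  -reducer cat
```

A likely explanation, which the answer itself does not spell out: streaming treats a line with no tab as a key with an empty value, and the text output format writes the key, the separator (a tab by default), and then the empty value. With the separator set to the input delimiter, each line would presumably end in that delimiter instead of a tab, which Hive would read as an extra empty trailing field rather than as part of the BIGINT value.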
