Specify Hadoop process split


Problem description



I want to run Hadoop MapReduce on a small part of my text file.

One of my tasks is failing. I can see in the log:

Processing split: hdfs://localhost:8020/user/martin/history/history.xml:3556769792+67108864

Can I run MapReduce again on just this part of the file, from offset 3556769792 to 3623878656 (3556769792 + 67108864)?

Solution

One way to do this is to copy out the file's bytes starting at the given offset, add the extracted portion back into HDFS, and then run the MapReduce job on that block alone.
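For orientation, the split string in the log has the form `path:start+length` and denotes the byte range [start, start+length) of the file. The length 67108864 is exactly 64 MiB, the default HDFS block size in older Hadoop versions. The end offset mentioned in the question can be checked with shell arithmetic:

```shell
# Split boundaries taken from the log line in the question.
start=3556769792      # split start offset in bytes
length=67108864       # split length in bytes

echo $((start + length))         # end offset: 3623878656
echo $((length / 1024 / 1024))   # split length in MiB: 64
```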

1) Copy 67108864 bytes of the file, starting at offset 3556769792:

dd if=history.xml bs=1 skip=3556769792 count=67108864 > history_offset.xml
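Note that `bs=1` makes dd issue one read/write call per byte, which is very slow when skipping roughly 3.5 GB. Assuming GNU dd (coreutils 8.16 or later), the `skip_bytes` and `count_bytes` input flags let `skip=` and `count=` stay byte-granular while the transfer uses large blocks. A minimal sketch on a small stand-in file (`sample.xml` and the small offsets here are illustrative; the real command would use `history.xml` with `skip=3556769792 count=67108864`):

```shell
# Create a small stand-in for history.xml.
printf '0123456789ABCDEF' > sample.xml

# GNU dd only (coreutils >= 8.16): skip_bytes/count_bytes interpret
# skip= and count= as byte counts, while bs=1M keeps transfers large.
dd if=sample.xml of=sample_offset.xml bs=1M \
   iflag=skip_bytes,count_bytes skip=4 count=8 2>/dev/null

cat sample_offset.xml   # bytes 4-11 of the input: 456789AB
```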

2) Import the extracted file into HDFS:

hadoop fs -copyFromLocal history_offset.xml offset/history_offset.xml

3) Run MapReduce again:

hadoop jar myJar.jar 'offset' 'offset_output'

