Specify Hadoop process split
Problem description
I want to run Hadoop MapReduce on a small part of my text file.
One of my tasks is failing. I can see the following in the log:
Processing split: hdfs://localhost:8020/user/martin/history/history.xml:3556769792+67108864
Can I run MapReduce again on this file, from offset 3556769792 to 3623878656 (3556769792 + 67108864)?
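The split descriptor in the log follows the pattern `path:start+length`, so the start and end byte offsets can be recovered with simple shell arithmetic (a minimal sketch; the variable names are my own, not from the original post):

```shell
# The split string as it appears in the log; format is path:start+length.
split="hdfs://localhost:8020/user/martin/history/history.xml:3556769792+67108864"

range="${split##*:}"       # strip everything up to the last colon -> "3556769792+67108864"
start="${range%+*}"        # part before '+' -> 3556769792
length="${range#*+}"       # part after '+'  -> 67108864
end=$((start + length))    # 3623878656

echo "start=$start length=$length end=$end"
```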
One way is to extract that byte range from the file and add it back into HDFS, then simply run the MapReduce job on that block alone.
1) Extract 67108864 bytes starting at offset 3556769792:
dd if=history.xml bs=1 skip=3556769792 count=67108864 > history_offset.xml
2) Import it into HDFS:
hadoop fs -copyFromLocal history_offset.xml offset/history_offset.xml
3) Run MapReduce again:
hadoop jar myJar.jar 'offset' 'offset_output'
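A note on step 1: with `bs=1`, dd copies one byte at a time, which is very slow for a 64 MiB range. Since 3556769792 is exactly 53 × 67108864, the offset is aligned to the 64 MiB split size, and dd can read whole blocks instead. The sketch below demonstrates the same idea on a toy file (the toy file and its names are stand-ins for illustration, not from the original post):

```shell
# Toy stand-in for history.xml: three 4-byte "splits" (AAAA, BBBB, CCCC).
printf 'AAAABBBBCCCC' > toy.xml

# Extract the second split with whole-block reads, analogous to:
#   dd if=history.xml bs=67108864 skip=53 count=1 > history_offset.xml
# (valid because 3556769792 = 53 * 67108864, i.e. the offset is block-aligned)
dd if=toy.xml bs=4 skip=1 count=1 2>/dev/null > toy_offset.xml

cat toy_offset.xml
```

If the offset were not an exact multiple of the block size, `bs=1` (as in the original answer) remains the simple, if slow, fallback.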