How to tell Hadoop not to delete the temporary directory from HDFS when a task is killed?


Question

By default, Hadoop map tasks write processed records to files in a temporary directory at ${mapred.output.dir}/_temporary/_${taskid}. These files sit there until the FileOutputCommitter moves them to ${mapred.output.dir} (after the task finishes successfully). I have a case where, in the setup() of a map task, I need to create files under that temporary directory, to which I write some process-related data that is used later somewhere else. However, when a Hadoop task is killed, the temporary directory is removed from HDFS.
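
For context, a minimal sketch of the setup() pattern described above, using the new mapreduce API. The mapper's type parameters and the side-file name process-data.txt are assumptions for illustration; FileOutputFormat.getWorkOutputPath resolves the task attempt's temporary output directory.

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SideDataMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // The task attempt's temporary output directory, promoted to
            // ${mapred.output.dir} by the committer only if the task succeeds.
            Path workDir = FileOutputFormat.getWorkOutputPath(context);
            // "process-data.txt" is a hypothetical side-file name.
            Path sideFile = new Path(workDir, "process-data.txt");
            FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
            try (FSDataOutputStream out = fs.create(sideFile, false)) {
                out.writeUTF("process-related data written during setup()");
            }
        }
    }

Anything written here is lost if the task is killed, since the whole _temporary subtree for the attempt is discarded rather than committed.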

Does anyone know whether it is possible to tell Hadoop not to delete this directory after a task is killed, and how to achieve that? I assume there is some property I can configure.

Thanks.

Answer

It is not good practice to depend on temporary files, whose location and format can change at any time between releases.

Anyway, setting mapreduce.task.files.preserve.failedtasks to true will keep the temporary files of all failed tasks, and setting mapreduce.task.files.preserve.filepattern to a regular expression matching the task ID will keep the temporary files of every matching task, regardless of whether it succeeds or fails.
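
As a sketch, both properties can be set on the job's Configuration before submission (or passed on the command line via -D when the job uses ToolRunner); the job name and the task-ID pattern below are only examples.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class PreserveTempFilesExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Keep the temporary files of all failed tasks.
            conf.setBoolean("mapreduce.task.files.preserve.failedtasks", true);
            // Alternatively, keep temporary files for task IDs matching this
            // regex, whether the task succeeds or fails (pattern is an example).
            conf.set("mapreduce.task.files.preserve.filepattern", ".*_m_000001_.*");
            Job job = Job.getInstance(conf, "preserve-temp-files-example");
            // ... configure mapper, input/output paths, etc., then submit.
        }
    }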

