Apache Spark does not delete temporary directories

Problem description

After a Spark program completes, 3 temporary directories remain in the temp directory. The directory names are like this: spark-2e389487-40cc-4a82-a5c7-353c0feefbb7

The directories are empty.

And when the Spark program runs on Windows, a snappy DLL file also remains in the temp directory. The file name is like this: snappy-1.0.4.1-6e117df4-97b6-4d69-bf9d-71c4a627940c-snappyjava

They are created every time the Spark program runs, so the number of files and directories keeps growing.

How can I get them deleted?

Spark version is 1.3.1 with Hadoop 2.6.

Update

I've traced the Spark source code.

The module methods that create the 3 'temp' directories are as follows:


  • DiskBlockManager.createLocalDirs

  • HttpFileServer.initialize

  • SparkEnv.sparkFilesDir

They (eventually) call Utils.getOrCreateLocalRootDirs and then Utils.createDirectory, which intentionally does NOT mark the directory for automatic deletion.

The comment on the createDirectory method says: "The directory is guaranteed to be newly created, and is not marked for automatic deletion."
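
In this context, "marking for automatic deletion" essentially means remembering the directory and removing it from a JVM shutdown hook when the process exits. A minimal sketch of that idea in Scala (an illustration of the concept only, not Spark's actual implementation):

    import java.io.File
    import scala.collection.mutable

    // Illustration: directories registered here are deleted recursively when
    // the JVM shuts down. Utils.createDirectory performs no such registration,
    // so the spark-* directories it creates survive the run.
    object ShutdownDelete {
      private val registered = mutable.Set[File]()

      Runtime.getRuntime.addShutdownHook(new Thread {
        override def run(): Unit = registered.foreach(deleteRecursively)
      })

      def register(dir: File): Unit = synchronized { registered += dir }

      private def deleteRecursively(f: File): Unit = {
        if (f.isDirectory) {
          Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
        }
        f.delete()
      }
    }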

I don't know why they are not marked. Is this really intentional?

Recommended answer

Three SPARK_WORKER_OPTS properties exist to support worker application folder cleanup, copied here for further reference from the Spark documentation (a spark-env.sh example follows the list):


  • spark.worker.cleanup.enabled, default is false. Enables periodic cleanup of worker / application directories. Note that this only affects standalone mode, as YARN works differently. Only the directories of stopped applications are cleaned up.

  • spark.worker.cleanup.interval, default is 1800, i.e. 30 minutes. Controls the interval, in seconds, at which the worker cleans up old application work dirs on the local machine.

  • spark.worker.cleanup.appDataTtl, default is 7 * 24 * 3600 (7 days). The number of seconds to retain application work directories on each worker. This is a Time To Live and should depend on the amount of available disk space you have. Application logs and jars are downloaded to each application work dir. Over time, the work dirs can quickly fill up disk space, especially if you run jobs very frequently.
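
On a standalone worker these properties are typically passed through SPARK_WORKER_OPTS, for example in conf/spark-env.sh. A minimal sketch using the default interval and TTL from the list above (7 * 24 * 3600 = 604800 seconds):

    # conf/spark-env.sh on each standalone worker
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"

The worker reads these options at startup, so restart the worker after changing them.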
