Lock file when writing to it from parallel processes in R

Question

I use parSapply() from the parallel package in R. I need to perform calculations on a huge amount of data. Even in parallel it takes hours to execute, so I decided to regularly write results to a file from the workers using write.table(): the process crashes from time to time when it runs out of memory or for some other random reason, and I want to resume the calculation from where it stopped. I noticed that some lines of the CSV files I get are cut off in the middle, probably because several processes write to the file at the same time. Is there a way to lock the file while write.table() executes so that other workers can't access it, or is the only way out to write to a separate file from each worker and then merge the results?

Recommended answer

This is now possible with the filelock package (GitHub).

To make this work with parSapply(), you would need to edit your loop so that if the file is locked the process does not simply quit, but either tries again or calls Sys.sleep() for a short time. However, I am not certain how this will affect your performance.
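
A minimal sketch of the locking approach, assuming the filelock package is installed on every worker. The helper name append_locked, the ".lock" sidecar file, the 1000 ms timeout and the i^2 placeholder computation are illustrative assumptions, not part of the original question. filelock::lock() returns NULL when the timeout expires, so the loop sleeps briefly and tries again instead of quitting.

```r
library(parallel)

# Hypothetical helper: append one row to a shared CSV while holding a lock.
append_locked <- function(row, path, lock_path = paste0(path, ".lock")) {
  repeat {
    # Wait up to 1000 ms for an exclusive lock; NULL means the wait timed out.
    lck <- filelock::lock(lock_path, exclusive = TRUE, timeout = 1000)
    if (!is.null(lck)) break
    Sys.sleep(0.05)  # back off briefly, then retry
  }
  # Release the lock when the function exits, even if write.table() fails.
  on.exit(filelock::unlock(lck), add = TRUE)
  write.table(row, file = path, append = TRUE, sep = ",",
              row.names = FALSE, col.names = FALSE)
  invisible(TRUE)
}

out_file <- tempfile(fileext = ".csv")

cl <- makeCluster(2)
# The output path and the helper are passed as extra arguments so that the
# workers do not depend on objects in the master's global environment.
res <- parSapply(cl, 1:20, function(i, path, writer) {
  value <- i^2  # placeholder for the real computation
  writer(data.frame(id = i, value = value), path)
  value
}, path = out_file, writer = append_locked)
stopCluster(cl)
```

Each write holds the lock only for the duration of one write.table() call, so the cost depends on how often the workers write relative to how long each computation takes.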

Instead, I recommend creating worker-specific files to hold your data, which eliminates the need for a lock file and does not reduce performance. Afterwards you should be able to combine these files into your final results file. If size is an issue, you can use disk.frame to work with files that are larger than your system RAM.
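
The per-worker approach might look like the sketch below. Naming each file after the worker's process ID, the two-column layout and the i^2 placeholder computation are assumptions made for illustration. Because every worker appends only to its own file, no lock is needed.

```r
library(parallel)

out_dir <- tempfile("results_")
dir.create(out_dir)

cl <- makeCluster(2)
res <- parSapply(cl, 1:20, function(i, out_dir) {
  value <- i^2  # placeholder for the real computation
  # One file per worker process, keyed by its PID, so files are never shared.
  path <- file.path(out_dir, sprintf("worker_%d.csv", Sys.getpid()))
  write.table(data.frame(id = i, value = value), file = path, append = TRUE,
              sep = ",", row.names = FALSE, col.names = FALSE)
  value
}, out_dir = out_dir)
stopCluster(cl)

# Merge the per-worker files into a single data frame, ordered by id.
files <- list.files(out_dir, pattern = "^worker_.*\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.csv, header = FALSE,
                                col.names = c("id", "value")))
merged <- merged[order(merged$id), ]
```

After a crash, the ids already present in the merged files could be used to skip finished work on the next run, which is the resume behaviour the question asks for.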