Is there a persistent location that is always writable which can be used as data cache by a package?

Question

Is there a predefined location where an R package could store cached data? The data should persist across sessions. I was thinking about creating a subdirectory of ${R_LIBS_USER}/package_name, but I'm not sure if this is portable and if this is "allowed" if my package is installed systemwide.

The idea is the following: Create an R script mydata.R in the data subdirectory of the package which would be executed by calling data(mydata) (according to the documentation of data()). This script would load the data from the internet and cache it, if it hasn't been cached before. (If the data has been cached already, the cache will be used.) In addition, a function will be provided to invalidate the cache and/or to check if a newer version of the data is available online.
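The compute-or-reuse logic described above could be sketched roughly as follows. This is only an illustration in base R: the function names `cached_data` and `invalidate_cache` are hypothetical, and `cache_dir` is left as a parameter because where that directory should live is exactly what this question asks.

```r
# Hypothetical helper: return the cached object if one exists,
# otherwise compute it (e.g. download it) and store it for next time.
# `cache_dir` is an assumption supplied by the caller.
cached_data <- function(name, compute, cache_dir) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))          # cache hit: skip the expensive step
  }
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  value <- compute()               # cache miss: do the expensive work once
  saveRDS(value, path)
  value
}

# Hypothetical companion: delete the cached copy so that the next
# call to cached_data() recomputes it.
invalidate_cache <- function(name, cache_dir) {
  unlink(file.path(cache_dir, paste0(name, ".rds")))
}
```

A script such as data/mydata.R could then call `cached_data()` with a function that performs the download, once a suitable `cache_dir` has been chosen.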

This is from the documentation of data():

Currently, four formats of data files are supported:

  1. files ending ‘.R’ or ‘.r’ are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)

...

Indeed, creating a file fortytwo.R in the data subdirectory of a package with the following contents:

fortytwo = data.frame(answer=42)

and then executing data(fortytwo) creates a data frame variable fortytwo. Now the question is: Where would fortytwo.R cache the data if it were difficult to compute?

EDIT: I am thinking about creating two packages: A "data" package that provides the data, and a "code" package that operates on it. The question concerns the "data" package: Where can it store files in a per-user storage so that it is persistent across R sessions and is accessible from different R projects?

Related: Packages that download data from the internet during installation.

Recommended answer

There is no absolutely defined location for package-specific persistent caching in R. However, the R.cache package provides an interface for creating and managing cached data. It looks like it could be useful for your scenario.

When users load R.cache (library(R.cache)), they get the following prompt:

The R.cache package needs to create a directory that will hold cache files.
It is convenient to use one in the user's home directory, because it remains
also after restarting R. Do you wish to create the '~/.Rcache/' directory? If
not, a temporary directory (/tmp/RtmpqdUcbP/.Rcache) that is specific to this
R session will be used. [Y/n]:

They can then choose to create the cache directory in their home directory, which is presumably persistent, or to create a session-specific directory. If you make your data package depend on R.cache, you could check for the existence of the cached object(s) in its .onLoad() hook function and download the data if it isn't there. Alternatively, you could do this in the way suggested in your own question.
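As a rough sketch of the R.cache route, assuming the R.cache package is installed: the helper name `get_mydata` and the `download_fun` argument are hypothetical stand-ins for the data package's own code, while `loadCache()` and `saveCache()` are the R.cache functions used to look up and store the object.

```r
# Hypothetical helper for the data package. `download_fun` stands in for
# the real (slow) download step and is not part of R.cache.
get_mydata <- function(download_fun) {
  # The key identifies the cached object; changing `version` would
  # make older cached copies unreachable under the new key.
  key <- list("mydata", version = 1)

  cached <- R.cache::loadCache(key = key)  # NULL when nothing is cached
  if (!is.null(cached)) {
    return(cached)
  }

  fresh <- download_fun()
  R.cache::saveCache(fresh, key = key)     # store for later sessions
  fresh
}
```

Such a helper could be called from the package's .onLoad() hook or lazily on first use. Whether the cache outlives the session still depends on the directory the user accepted at the R.cache prompt.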
