Safe method for updating R packages - is "hot-swapping" possible?

Problem description

I have encountered this problem a few times, and am not able to figure out any solution but the trivial one (see below).

Suppose a computer is running 2+ instances of R, either because 2+ users are working on it or because 1 user is running multiple processes, and one instance executes update.packages(). Several times this has badly broken the other instance. The packages being updated don't change functionality in any way that affects the computation, yet something still goes seriously wrong.

The trivial solution (Solution 0) is to terminate all instances of R while update.packages() executes. This has at least two problems. First, running R sessions have to be killed. Second, one may not even be able to identify where those instances are running (see Update 1).

Assuming that the behavior of the code being executed won't change (e.g. package updates are all beneficial: they only fix bugs, improve speed, reduce RAM usage, and grant unicorns), is there some way to hot-swap a new version of a package with less impact on other processes?

I have two more candidate solutions, outside of R:

Solution 1 is to install into a temporary library path, then delete the old library and move the new one into its place. The drawback is that the delete and move steps leave a window during which no library is available at all.
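
A minimal R sketch of Solution 1, assuming the live library and its staging copy sit on the same filesystem so that file.rename() is a cheap rename rather than a copy. prepare_staging() and swap_library() are hypothetical helper names, not part of any package. Moving the old library aside and deleting it only after the new one is in place shrinks the downtime to the gap between two renames, but does not remove it; on Windows the renames may also fail while a package DLL inside the library is loaded.

```r
# Hypothetical helper: copy the live library into a sibling staging
# directory, where updates can be installed without touching live_lib.
prepare_staging <- function(live_lib) {
  staging <- paste0(live_lib, ".new")
  if (dir.exists(staging)) stop("staging directory already exists: ", staging)
  dir.create(staging)
  entries <- list.files(live_lib, full.names = TRUE)
  ok <- file.copy(entries, staging, recursive = TRUE)
  if (!all(ok)) stop("failed to copy the live library")
  staging
}

# Hypothetical helper: put the staged library in place of the live one.
swap_library <- function(staging_lib, live_lib) {
  stopifnot(dir.exists(staging_lib), dir.exists(live_lib))
  backup <- paste0(live_lib, ".old")
  # Step 1: move the live library aside. Until step 2 finishes, nothing
  # exists at live_lib: this is the remaining downtime window.
  if (!file.rename(live_lib, backup)) stop("could not move live library aside")
  # Step 2: move the staged library into place.
  if (!file.rename(staging_lib, live_lib)) {
    file.rename(backup, live_lib)  # roll back to the old library
    stop("could not move staged library into place")
  }
  # Step 3: delete the old copy last, outside the downtime window.
  unlink(backup, recursive = TRUE)
  invisible(live_lib)
}
```

With this layout the updates themselves would be installed into the staging copy before the swap, for example with update.packages(lib.loc = staging, instlib = staging, ask = FALSE).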

Solution 2 is to use symlinks to point to a library (or library hierarchy) and just overwrite a symlink with a pointer to a new library where the updated package resides. That seems to incur even less package downtime - the time it takes for the OS to overwrite a symlink. The downside of this is that it requires a lot more care in managing symlinks, and is platform-specific.
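
A sketch of Solution 2 in R for POSIX systems, assuming a hypothetical layout in which each update produces a complete library directory named lib-<something> under a common root, and users put the symlink <root>/current on R_LIBS (or .libPaths()). repoint_library() is a hypothetical helper. The new link is created under a temporary name and then renamed over the old one; on POSIX, rename(2) replaces the destination atomically, so a process resolving the path sees either the old target or the new one, never nothing.

```r
# Hypothetical layout (POSIX only):
#   <root>/lib-<stamp>/   one complete library per update
#   <root>/current        symlink to the active lib-<stamp>
repoint_library <- function(root, new_lib_name) {
  target <- file.path(root, new_lib_name)
  stopifnot(.Platform$OS.type == "unix", dir.exists(target))
  link <- file.path(root, "current")
  # Temporary name includes the PID so concurrent updaters don't collide.
  tmp_link <- file.path(root, paste0(".current.tmp.", Sys.getpid()))
  # Build the new link first; the relative target resolves against root.
  if (!file.symlink(new_lib_name, tmp_link)) stop("could not create symlink")
  # Rename it over the old link: rename(2) swaps the link atomically.
  if (!file.rename(tmp_link, link)) {
    unlink(tmp_link)
    stop("could not replace the 'current' symlink")
  }
  invisible(Sys.readlink(link))
}
```

One consequence worth checking: .libPaths() passes its entries through normalizePath(), which resolves symlinks, so a session started with the symlink on its library path may record the resolved lib-<...> directory instead. If so, running sessions keep reading from the old directory after the switch, which is the isolation the question asks for, but old directories must then not be deleted until those sessions exit.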

I suspect that Solution 1 could be modified to work like Solution 2 through clever use of .libPaths(), but that seems to require not calling update.packages() and instead writing a new updater that finds the outdated packages, installs them to a temporary library, and then updates the library paths. The upside is that an existing process would stay constrained to the .libPaths() it had when it started (i.e. changing the library paths R knows about would not propagate to instances that are already running, absent some explicit intervention within those instances).
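
A sketch of that .libPaths() variant, assuming each round of updates goes into a fresh directory and new sessions put that directory in front of the existing libraries. stage_updates() and use_library_first() are hypothetical helpers; old.packages() and install.packages() are the standard utils functions. Nothing is deleted or overwritten, so there is no downtime window; the cost is that old versions accumulate in the earlier libraries and need separate cleanup once no session uses them.

```r
# Hypothetical updater: install the outdated packages into a brand-new
# library directory instead of overwriting live_lib in place.
stage_updates <- function(live_lib, new_lib, repos = getOption("repos")) {
  outdated <- utils::old.packages(lib.loc = live_lib, repos = repos)
  if (is.null(outdated)) return(character(0))  # nothing to update
  pkgs <- unname(outdated[, "Package"])
  dir.create(new_lib, recursive = TRUE, showWarnings = FALSE)
  utils::install.packages(pkgs, lib = new_lib, repos = repos)
  pkgs
}

# Hypothetical helper: put new_lib in front of this session's library
# paths so updated versions shadow the old ones still in live_lib.
# .libPaths() is per-process state: other running sessions are unaffected.
use_library_first <- function(new_lib) {
  .libPaths(c(new_lib, .libPaths()))
  invisible(.libPaths())
}
```

A new session would call use_library_first() (for example from its .Rprofile) before loading any of the updated packages; a namespace that is already loaded is not reloaded just because the library paths change.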

Update 1. In the example scenario, the two competing R instances are on the same machine. This is not a requirement: as far as I understand how updates work, if the two instances share the same libraries (i.e. the same directories on a shared drive), the update can still cause problems even when the other R instance is on another machine. So one could accidentally kill an R process without even seeing it.

Recommended answer

My strong guess is that there's no way around this.

Especially when a package includes compiled code, you can't remove and replace the DLL while it's in use and expect things to keep working. Every pointer into the DLL that R uses to call those functions refers to a particular memory location, and that location will have inexplicably gone away. (Note: while I use the term "DLL" here, I mean it in the non-Windows-specific sense in which it is used, e.g., in the help file ?getLoadedDLLs. "Shared library" is perhaps the better generic term.)
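
As a sketch of how a session can see which shared libraries it has loaded from a given library tree, the hypothetical helper below filters getLoadedDLLs() by the file path recorded for each DLL. It can only inspect the session it runs in; it cannot see other R processes, whether on this machine or on another one sharing the library (the situation in Update 1), so at best it lets an updating session avoid replacing DLLs it has loaded itself.

```r
# Hypothetical helper: names of DLLs loaded in *this* session from files
# under the library directory `lib`.
dlls_loaded_from <- function(lib) {
  dlls <- getLoadedDLLs()
  # Each DLLInfo element records the file it was loaded from.
  paths <- vapply(dlls, function(d) d[["path"]], character(1))
  # Normalize both sides so symlinked paths compare equal. Entries such as
  # "base" do not name real files; mustWork = FALSE avoids an error there.
  paths <- normalizePath(paths, winslash = "/", mustWork = FALSE)
  lib <- normalizePath(lib, winslash = "/", mustWork = TRUE)
  names(dlls)[startsWith(paths, paste0(lib, "/"))]
}
```

For example, dlls_loaded_from(.Library) lists the DLLs of base-R packages such as stats once their namespaces are loaded.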

(Some confirmation of my suspicion comes from the R for Windows FAQ, which reports that 'Windows locks [a] package's DLL while it is loaded', which can cause update.packages() to fail.)

I'm not sure exactly how R's lazy-load mechanism is implemented, but I imagine that it too could be disrupted by the removal of objects it expects to find at particular addresses on the machine.

Someone else who knows more about the internals of computers will surely give a better answer than this, but those are my thoughts.
