Error when trying to interactively load data file saved by paused batch script
Problem description
In the process of debugging and solving my problem with retrieving attributes (Can I access R data objects' attributes without fully loading objects from file?), I followed advice here on SO and switched from using save() and load() to saveRDS() and readRDS(), respectively.
My investigation (via non-interactive debug printing) showed the following:
- immediately after the initial saveRDS(), the saved object contains the attribute in question;
- an interactive R session, run after the initial run of the script, shows that the attribute is absent from the saved object;
- the findings above explain the failure to retrieve the attribute during the next run of the script, which I initially, and incorrectly, attributed to the behavior of save()/load() and saveRDS()/readRDS().
In order to manually confirm the presence of the attribute in the persistent object (saved in an .rds file) immediately after the initial saveRDS(), I decided to pause the batch R script, running in one terminal window, using scan() (readline() doesn't appear to work for this in batch R scripts):
if (DEBUG) {
  cat("Press [Enter] to continue")
  key <- scan("stdin", character(), n=1)
}
and, in another terminal window, to inspect the saved object via an interactive R session.
However, after the batch script had paused as expected, loading the saved object from the .rds file in an interactive session failed with the following message:
> load("../cache/SourceForge/ZGV2TGlua3M=.rds")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘ZGV2TGlua3M=.rds’ has magic number 'X'
Use of save versions prior to 2 is deprecated
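The 'X' in the warning is the first byte of the file's header. One quick way to check which format a file is in is to read its first few bytes through gzfile(), which also reads uncompressed files as-is. This is a sketch: the helper name `fileFormat` is hypothetical, and it assumes the usual headers of R's binary formats ("X\n", "A\n" or "B\n" for a single serialized object as written by saveRDS(); "RDX", "RDA" or "RDB" plus a version digit for files written by save()).

```r
# Hypothetical helper: guess whether a file was written by saveRDS() or save()
fileFormat <- function(path) {
  con <- gzfile(path, "rb")            # transparently decompresses gzip files
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = 5)
  # compare raw bytes directly (avoids rawToChar() choking on embedded nuls)
  starts <- function(prefix) {
    p <- charToRaw(prefix)
    length(bytes) >= length(p) && identical(bytes[seq_along(p)], p)
  }
  if (starts("RDX") || starts("RDA") || starts("RDB")) return("RData (use load)")
  if (starts("X\n") || starts("A\n") || starts("B\n")) return("RDS (use readRDS)")
  "unknown"
}

rdsFile <- tempfile(fileext = ".rds")
saveRDS(1:3, rdsFile)
rdataFile <- tempfile(fileext = ".RData")
x <- 1:3
save(x, file = rdataFile)

fileFormat(rdsFile)     # "RDS (use readRDS)"
fileFormat(rdataFile)   # "RData (use load)"
```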
The following output describes my R environment at the time of investigation:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
The only explanation that seems plausible to me is that the batch session (and, specifically, the pause via scan()) somehow locks or modifies the environment in a way that makes it impossible to access R objects properly from within the interactive session. There may be other reasons for this situation. I would greatly appreciate any help or advice to solve this problem!
UPDATE:
After killing the batch R script's process (which became unresponsive after scan()), I again tried to manually load the .rds file, expecting it to succeed now that the batch script was no longer paused. However, to my surprise, I was greeted with exactly the same error message. This makes me think that the .rds file really is corrupted (potentially due to my practice of stopping a running batch R script by repeatedly pressing Ctrl-C; I will need to come up with something more "gentle"). After figuring out a better way to stop a running script, I will try to reproduce the scenario and report back here.
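If interrupted writes were the problem, a common safeguard is to write to a temporary file in the same directory and only then rename it over the target, so that an interrupted run leaves either the old file or the new one, never a half-written one. This is a minimal sketch assuming a POSIX filesystem, where a rename within one directory replaces the target in a single step; the helper name `saveRDSAtomic` is hypothetical.

```r
# Hypothetical helper: write an object to an .rds file without ever leaving
# a partially written target behind (assumes POSIX rename semantics)
saveRDSAtomic <- function(object, path) {
  # temporary file in the same directory, so the rename stays on one filesystem
  tmp <- tempfile(pattern = ".partial-", tmpdir = dirname(path), fileext = ".rds")
  ok <- FALSE
  # on error or interrupt, delete the temporary file instead of leaving it around
  on.exit(if (!ok) unlink(tmp), add = TRUE)
  saveRDS(object, tmp)
  # swap the finished file into place; file.rename() returns FALSE on failure
  if (!file.rename(tmp, path)) stop("could not rename ", tmp, " to ", path)
  ok <- TRUE
  invisible(path)
}

target <- file.path(tempdir(), "atomic-demo.rds")
saveRDSAtomic(list(a = 1, b = "x"), target)
str(readRDS(target))
```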
UPDATE 2:
After removing all (potentially corrupted) .rds files from the cache directory and repeating the scenario described above (loading the R data file interactively while the batch R script is paused), I got exactly the same error message as before. At this point, I really need advice to figure out what's going on.
UPDATE 3 (saving the object):
assign(dataName, srdaGetData())
data <- as.name(dataName)
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RDS file
saveRDS(data, rdataFile)
UPDATE 4 (reproducible example):
library(RCurl)

info <- "Important data"
request <- "SELECT info FROM topSecret"
dataName <- "sf.data.devLinks"
rdataFile <- "/tmp/testAttr.rds"

getData <- function() {
  return(info)
}

requestDigest <- base64(request)

# check if the archive file has already been processed
message("\nProcessing request \"", request, "\" ...\n")

# read back the object with the attribute
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- readRDS(rdataFile)
  message("Retrieved object '", as.name(data), "', containing:\n")
  message(toString(data))
  requestAttrib <- attr(data, "SQL", exact = TRUE)
  message("\nObject '", data, "' contains attribute:\n\"",
          base64(requestAttrib), "\"\n")
  if (identical(requestDigest, requestAttrib)) {
    message("Processing skipped: RDS file is up-to-date.\n")
    stop()
  }
  rm(data)
}

message("Saving results of request \"",
        request, "\" as R data object ...\n")

assign(dataName, getData())
data <- as.name(dataName)
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RDS file
saveRDS(data, rdataFile)
I expect the value of the variable whose name is stored in dataName to be saved; however, the code saves the name of the variable instead.
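The behavior described above can be reproduced without RCurl or a database. In this sketch (the values are made up for illustration, and get() is one possible fix, not necessarily what the asker ended up using), as.name() yields a symbol, whereas get() retrieves the value bound to that name; only the value, with its attribute, then goes through the saveRDS()/readRDS() round trip.

```r
dataName <- "sf.data.devLinks"
assign(dataName, "Important data")   # creates a variable called sf.data.devLinks

# As in the question: as.name() gives a *symbol*, not the variable's value
wrong <- as.name(dataName)
class(wrong)                         # "name"

# One possible fix: get() looks up the value bound to that name
right <- get(dataName)
attr(right, "SQL") <- "SELECT info FROM topSecret"

rdsFile <- tempfile(fileext = ".rds")
saveRDS(right, rdsFile)
restored <- readRDS(rdsFile)
attr(restored, "SQL", exact = TRUE)  # "SELECT info FROM topSecret"
```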
Answer
If you save something using saveRDS(), the equivalent loading function is readRDS(). If you save() an object into an .RData file, you should use load() to load it.
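A small illustration of the two pairings, using temporary files (the file names and the example data frame are arbitrary):

```r
x <- data.frame(id = 1:3, info = c("a", "b", "c"))
attr(x, "SQL") <- "SELECT info FROM topSecret"

# Single object: saveRDS() pairs with readRDS()
rdsFile <- tempfile(fileext = ".rds")
saveRDS(x, rdsFile)
y <- readRDS(rdsFile)

# One or more named objects: save() pairs with load()
rdataFile <- tempfile(fileext = ".RData")
save(x, file = rdataFile)
rm(x)
load(rdataFile)   # recreates `x` in the calling environment

identical(x, y)   # TRUE: both routes preserve the attribute
```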
readRDS() returns the object it reads, so you choose the name it is assigned to.
load() loads the objects in an .RData file, and they retain the names with which they were saved.
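The difference in naming can be seen directly. In this sketch, load() is pointed at a fresh environment so it is easy to see which names it creates; load() also returns those names invisibly. The object names are arbitrary.

```r
devLinks <- c("link1", "link2")
rdataFile <- tempfile(fileext = ".RData")
save(devLinks, file = rdataFile)

# load() restores objects under the names they were saved with
e <- new.env()
loadedNames <- load(rdataFile, envir = e)
loadedNames                    # "devLinks"
ls(e)                          # "devLinks"

# readRDS() just returns the value; the caller picks the name
rdsFile <- tempfile(fileext = ".rds")
saveRDS(devLinks, rdsFile)
whatever <- readRDS(rdsFile)
identical(whatever, devLinks)  # TRUE
```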
If "../cache/SourceForge/ZGV2TGlua3M=.rds" was saved using saveRDS(), then
whatever <- readRDS("../cache/SourceForge/ZGV2TGlua3M=.rds")
will load the object as whatever.
Running load() on a file that was not saved in .RData format results in the error message you posted.
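The error can be reproduced with any file written by saveRDS(). A minimal sketch using a temporary file; the exact wording of the error (and of the accompanying "magic number" warning) may vary between R versions, so the code only captures and prints whatever message load() produces.

```r
rdsFile <- tempfile(fileext = ".rds")
saveRDS("Important data", rdsFile)

# load() expects the .RData format, so it rejects an RDS file
result <- tryCatch(
  {
    load(rdsFile)
    "loaded"
  },
  error = function(e) conditionMessage(e)
)
print(result)      # prints the error message from load()

# readRDS() reads the same file without complaint
readRDS(rdsFile)
```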