dplyr crashes when using summarise with segfault error
Problem description
My dplyr script sometimes crashes in this code segment:
abc.fit <- abc_bySubject %>%
do(fit = lm(value ~ delta, .)) %>%
summarise(fvc_intercept = coef(fit)[1],
fvc_slope = coef(fit)[2])
The crash error is:
*** caught segfault ***
address 0x7ff041000098, cause 'memory not mapped'
However, it also occurs when I execute this part in RStudio, with the error "fatal error - R Session Aborted", but less frequently. It always happens when I source the script from the R command line.
I tested it on different machines with lots of RAM.
R and all packages are up to date, and I'm using the latest version of Ubuntu.
It may be related to this question: link, but that one says the problem is fixed.
Perhaps there is a nicer solution.
Recommended answer
Another option that avoids summarise (the OP's code works in dplyr_0.4.1.9000) and still gives the expected output is to extract the coef from lm, convert it to a list, change the names of the list elements (setNames), and convert back to a data.frame within the do environment.
library(dplyr)
abc.fit <- abc_bySubject %>%
do(data.frame(setNames(as.list(coef(lm(value~delta, data=.))),
c('fvc_intercept','fvc_slope' ))))
abc.fit
# Subject fvc_intercept fvc_slope
#1 1 0.5319503 -0.03147698
#2 2 0.4478791 0.04293860
#3 3 0.4318059 -0.03276570
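To make the chain inside do easier to follow, here is a minimal base R sketch of the same steps applied to a single subject. It assumes the simulated data from the "Data" section at the end of this answer; the variable names one, cf, cf_list, cf_named and one_row are hypothetical and only used for illustration.

```r
# Same simulated data as in the "Data" section (assumption: seed 24)
set.seed(24)
abc <- data.frame(Subject = rep(1:3, each = 10), delta = rnorm(30), value = runif(30))

# Rows of a single subject (hypothetical helper variable)
one <- abc[abc$Subject == 1, ]

# Step 1: fit the model and extract the coefficients (a named numeric vector)
cf <- coef(lm(value ~ delta, data = one))

# Step 2: convert to a list, one element per coefficient
cf_list <- as.list(cf)

# Step 3: replace the default names with the desired column names
cf_named <- setNames(cf_list, c("fvc_intercept", "fvc_slope"))

# Step 4: convert the named list to a one-row data.frame
one_row <- data.frame(cf_named)
one_row
```

do runs this kind of expression once per group and stacks the resulting one-row data.frames, which is why each subject ends up as one row in abc.fit.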
If we need to remove the 'Subject' column, we can ungroup() and use select to keep the columns other than 'Subject'.
abc.fit %>%
ungroup() %>%
select(-Subject)
# fvc_intercept fvc_slope
#1 0.5319503 -0.03147698
#2 0.4478791 0.04293860
#3 0.4318059 -0.03276570
Another option would be data.table. We convert the 'data.frame' to a 'data.table' (setDT(abc)), group by the 'Subject' column, get the coefficients (coef) of lm, convert them to a list (as.list), and set the names of the columns (setnames).
library(data.table)
res <- setnames(setDT(abc)[, as.list(coef(lm(value~delta))),
by =Subject],2:3, c('fvc_intercept', 'fvc_slope'))[]
res
# Subject fvc_intercept fvc_slope
#1: 1 0.5319503 -0.03147698
#2: 2 0.4478791 0.04293860
#3: 3 0.4318059 -0.03276570
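The one-liner above packs several steps together, so here is the same idea written step by step, as a sketch only. It assumes the simulated data from the "Data" section; dt and res2 are hypothetical names, and as.data.table is used instead of setDT so that the original abc is left untouched.

```r
library(data.table)

# Same simulated data as in the "Data" section (assumption: seed 24)
set.seed(24)
abc <- data.frame(Subject = rep(1:3, each = 10), delta = rnorm(30), value = runif(30))

# Make a data.table copy; setDT(abc) would instead convert abc in place
dt <- as.data.table(abc)

# One lm fit per Subject; as.list() spreads the two coefficients into two columns
res2 <- dt[, as.list(coef(lm(value ~ delta))), by = Subject]

# Rename columns 2 and 3 by position; setnames() modifies res2 by reference
setnames(res2, 2:3, c("fvc_intercept", "fvc_slope"))
res2
```

In the chained version, setnames returns its result invisibly, so the trailing [] is there to make the result print normally.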
To keep only the columns of interest from res:
res[,-1, with=FALSE]
# fvc_intercept fvc_slope
#1: 0.5319503 -0.03147698
#2: 0.4478791 0.04293860
#3: 0.4318059 -0.03276570
Data
set.seed(24)
abc <- data.frame(Subject= rep(1:3,each=10), delta=rnorm(30), value=runif(30))
abc_bySubject <- group_by(abc, Subject)