dplyr crashes when using summarise with segfault error

Question

My dplyr script sometimes crashes in this code segment:

abc.fit <- abc_bySubject %>%
  do(fit = lm(value ~ delta, .)) %>%
  summarise(fvc_intercept = coef(fit)[1],
        fvc_slope = coef(fit)[2])

The crash error is:

 *** caught segfault ***
address 0x7ff041000098, cause 'memory not mapped'

However, it also occurs when I execute this part in RStudio, with the error "fatal error - R Session Aborted", but less frequently. It always happens when I source the script from the R command line. I tested it on different machines with lots of RAM. R and all packages are up to date, and I'm using the latest version of Ubuntu.

It may be related to this question: link, but that one says the issue is fixed.

Perhaps there is a nicer solution?

Recommended answer

Another option that avoids summarise (the OP's code does work in dplyr_0.4.1.9000) is to extract the coefficients from lm with coef, convert them to a list, rename the list elements with setNames, and convert the result back to a data.frame, all within do.

library(dplyr)
abc.fit <- abc_bySubject %>%
                do(data.frame(setNames(as.list(coef(lm(value~delta, data=.))),
                            c('fvc_intercept','fvc_slope' ))))

abc.fit
#    Subject fvc_intercept   fvc_slope
#1       1     0.5319503 -0.03147698
#2       2     0.4478791  0.04293860
#3       3     0.4318059 -0.03276570
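To see what the expression inside do() produces, here is a minimal base R sketch that walks through the same steps for a single subject. It assumes the toy data from the "data" section of this answer; the intermediate names (one_subject, cf, cf_list, cf_named, one_row) are introduced here only for illustration and are not part of the original answer.

```r
# Same toy data as in the "data" section of this answer
set.seed(24)
abc <- data.frame(Subject = rep(1:3, each = 10), delta = rnorm(30), value = runif(30))

# Work on one subject only, so each intermediate object can be inspected
one_subject <- abc[abc$Subject == 1, ]

# coef() returns a named numeric vector with names "(Intercept)" and "delta"
cf <- coef(lm(value ~ delta, data = one_subject))

# as.list() turns it into a list with one length-1 element per coefficient
cf_list <- as.list(cf)

# setNames() replaces the names with the desired column names
cf_named <- setNames(cf_list, c("fvc_intercept", "fvc_slope"))

# data.frame() on a named list gives one row with one column per coefficient
one_row <- data.frame(cf_named)
str(one_row)
```

The renaming step matters because "(Intercept)" is not a syntactic R name, so data.frame() would otherwise rewrite it (to X.Intercept. under the default check.names = TRUE). Inside do(), this one-row data.frame is built once per group and the rows are bound together.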

If we need to drop the 'Subject' column, we can ungroup() and then use select to keep every column other than 'Subject'.

abc.fit %>% 
      ungroup() %>%
      select(-Subject)
#  fvc_intercept   fvc_slope
#1     0.5319503 -0.03147698
#2     0.4478791  0.04293860
#3     0.4318059 -0.03276570

Another option is data.table. We convert the data.frame to a data.table with setDT(abc), group by the 'Subject' column, get the coefficients (coef) of lm, convert them to a list (as.list), and set the column names with setnames.

 library(data.table)
 res <- setnames(setDT(abc)[, as.list(coef(lm(value~delta))),
               by =Subject],2:3, c('fvc_intercept', 'fvc_slope'))[]
 res
 #   Subject fvc_intercept   fvc_slope
 #1:       1     0.5319503 -0.03147698
 #2:       2     0.4478791  0.04293860
 #3:       3     0.4318059 -0.03276570

Subsetting the columns of interest from res:

res[,-1, with=FALSE]
#   fvc_intercept   fvc_slope
#1:     0.5319503 -0.03147698
#2:     0.4478791  0.04293860
#3:     0.4318059 -0.03276570
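As a cross-check that depends on neither dplyr nor data.table, the same per-subject coefficients can be computed with base R alone. This is only a sketch and not part of the original answer; the helper name fit_one and the result name res_base are hypothetical, and the data are the toy data from the "data" section.

```r
# Same toy data as in the "data" section of this answer
set.seed(24)
abc <- data.frame(Subject = rep(1:3, each = 10), delta = rnorm(30), value = runif(30))

# Hypothetical helper: fit one subject's data and return a one-row data.frame
fit_one <- function(d) {
  cf <- coef(lm(value ~ delta, data = d))
  data.frame(Subject = d$Subject[1],
             fvc_intercept = cf[[1]],
             fvc_slope = cf[[2]])
}

# split() gives one data.frame per subject; rbind stacks the one-row results
res_base <- do.call(rbind, lapply(split(abc, abc$Subject), fit_one))
rownames(res_base) <- NULL
res_base
```

If the dplyr or data.table version misbehaves on a given installation, comparing its output against res_base is a quick way to confirm the numbers themselves.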



data

set.seed(24)
abc <- data.frame(Subject= rep(1:3,each=10), delta=rnorm(30), value=runif(30))
abc_bySubject <- group_by(abc, Subject)
