Summarising values in dplyr - Crashes RStudio

Question


Can dplyr perform chained summarise operations on a data.frame?

My data.frame has the structure:

data_df = tbl_df(data)    
data_df %.%
        group_by(col_1) %.%
        summarise(number_of= length(col_2)) %.%
        summarise(sum_of = sum(col_3)) 

This causes RStudio to encounter a fatal error and show the "R Session Aborted" message.

With plyr I would usually combine these summarise calls without problems.

UPDATE

Data are here.

Code is:

library(dplyr)

orth <- read.csv('orth0106.csv')
orth_df = tbl_df(orth)


orth_df %.%
    group_by(Hospital) %.%
    summarise(Procs = length(Procedure)) %.%
    summarise(SSIs = sum(SSI))

Answer

I can reproduce the error on a Windows 7 machine running RStudio 0.97.551.

It may be because the second summarise is chained onto a result that no longer contains the column it refers to. You can instead compute both summaries in a single summarise call, as I've done here:

url <- "https://raw.github.com/johnmarquess/some.data/master/orth0106.csv"

library(dplyr)

orth <- read.csv(url)
orth_df <- tbl_df(orth)


orth_df %.%
    group_by(Hospital) %.%
    summarise(Procs = length(Procedure), SSIs = sum(SSI))

## Source: local data frame [18 x 3]
## 
##    Hospital Procs SSIs
## 1         A   865   80
## 2         B  1069   38
## 3         C   796   24
## 4         D   891   35
## 5         E   997   39
## 6         F   550   30
## 7         G  2598  128
## 8         H   373   27
## 9         I  1079   70
## 10        J   714   30
## 11        K   477   30
## 12        L   227    2
## 13        M   125    6
## 14        N   589   38
## 15        O   292    3
## 16        P   149    9
## 17        Q  1984   52
## 18        R   351   13

In any event this seems like either an RStudio or a dplyr bug. I'd open up an issue with Hadley as he probably cares either way. https://github.com/hadley/dplyr/issues

EDIT: This (your first call) also causes Rgui (Windows) and the terminal to crash on:

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

Since the crash also happens outside RStudio, this indicates a dplyr problem, which Hadley and Romain will want to know about.

To illustrate my first point, run only the first part of the chain:

orth_df %.%
    group_by(Hospital) %.%
    summarise(Procs = length(Procedure))

Source: local data frame [18 x 2]

   Hospital Procs
1         A   865
2         B  1069
3         C   796
4         D   891
5         E   997
6         F   550
7         G  2598
8         H   373
9         I  1079
10        J   714
11        K   477
12        L   227
13        M   125
14        N   589
15        O   292
16        P   149
17        Q  1984
18        R   351

Where is %.% summarise(SSIs = sum(SSI)) supposed to find SSI?
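A minimal sketch of the same check, assuming a small made-up data frame (`toy` and its values are hypothetical, not the orth0106.csv data) and using `%>%`, the operator that replaced `%.%` in later dplyr releases. It stores the result of the first summarise and inspects which columns are left:

```r
library(dplyr)

# Hypothetical stand-in for the hospital data
toy <- data.frame(
  Hospital  = c("A", "A", "B", "B", "B"),
  Procedure = c("hip", "knee", "hip", "hip", "knee"),
  SSI       = c(1, 0, 0, 1, 1)
)

# First summarise only: one row per hospital
step1 <- toy %>%
  group_by(Hospital) %>%
  summarise(Procs = length(Procedure))

# Only the grouping column and the new summary column survive
names(step1)

# SSI is no longer part of the intermediate result
"SSI" %in% names(step1)
```

Anything chained after `step1` can only see `Hospital` and `Procs`, which is why a second summarise that refers to `SSI` has nothing to work with.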

So the chaining you think is happening fails. To my understanding, %.% is similar to, but not exactly like, the way ggplot2 works. In ggplot2, once you pass the data in the initial ggplot() call, you can access it later on. Here %.% seems to take the result of the left-hand chunk and operate on that.

So you're grabbing:

   Hospital Procs
1         A   865
2         B  1069
3         C   796
.
.
.
17        Q  1984
18        R   351

when you use %.% summarise(SSIs = sum(SSI)), and there is no SSI column left to use. The analogy that comes to mind is serial vs. parallel wiring of Christmas lights: %.% is serial, ggplot() + is parallel. This is a non-programmer's understanding of things, and the R gurus may come and tell me I'm stupid, but for now that's the best theory you've got.
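One way to picture the "serial" behaviour is to write a chain out as nested function calls, where each call receives only the result of the one inside it. The sketch below assumes a small hypothetical data frame (`toy`) and uses `%>%`, the successor to `%.%` in later dplyr releases; it is an illustration of the idea, not a statement about how `%.%` was implemented.

```r
library(dplyr)

# Hypothetical stand-in for the hospital data
toy <- data.frame(
  Hospital  = c("A", "A", "B", "B", "B"),
  Procedure = c("hip", "knee", "hip", "hip", "knee"),
  SSI       = c(1, 0, 0, 1, 1)
)

# Chained form: both summaries computed in one summarise call
chained <- toy %>%
  group_by(Hospital) %>%
  summarise(Procs = length(Procedure), SSIs = sum(SSI))

# The same computation as nested calls: each function is handed
# the output of the call inside it and nothing else
nested <- summarise(
  group_by(toy, Hospital),
  Procs = length(Procedure),
  SSIs  = sum(SSI)
)

# Compare the two results as plain data frames
isTRUE(all.equal(as.data.frame(chained), as.data.frame(nested)))
```

Written this way, it is easier to see that a second summarise wrapped around the first would receive only the summarised table, so both summaries have to be requested in the same call.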
