Count number of rows by group using dplyr
Question
I am using the mtcars dataset. I want to find the number of records for a particular combination of data, something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me:
library(plyr)
ddply(mtcars, .(cyl,gear),nrow)
which gives this output:
cyl gear V1
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 5 2
Using this code:
library(dplyr)
g <- group_by(mtcars, cyl, gear)
summarise(g, length(gear))
I get this output:
length(cyl)
1 32
I found various functions to pass to summarise(), but none seem to work for me. One function I found is sum(G), which returned:
Error in eval(expr, envir, enclos) : object 'G' not found
Trying n() returned:
Error in n() : This function should not be called directly
What am I doing wrong? How can I get group_by() / summarise() to work for me?
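A single-row result from summarise() together with the n() error is the typical symptom of plyr being attached after dplyr in the same session: plyr's own summarise() then masks dplyr's, ignores the grouping and evaluates n() outside a dplyr verb. The question does not show the full session, so treat this as an assumption about the cause. The sketch below checks which package the bare name summarise resolves to, and calls the dplyr versions with an explicit dplyr:: prefix, which works regardless of load order.

```r
library(dplyr)

# Which package does the bare name `summarise` come from?
# "dplyr" is expected here; "plyr" would mean plyr was attached after dplyr
# (for example by a later library(plyr) call in the same session).
print(environmentName(environment(summarise)))

# Namespace-qualified calls always use dplyr's group-aware summarise() and n(),
# whatever order the packages were attached in.
by_group <- mtcars %>%
  group_by(cyl, gear) %>%
  dplyr::summarise(n = dplyr::n())

print(by_group)
```

If the first line prints "plyr", restarting R and attaching plyr before dplyr (or not attaching plyr at all) restores the expected behaviour of the unprefixed calls.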
Recommended answer
There's a special function n() in dplyr to count rows (potentially within groups):
library(dplyr)
mtcars %>%
group_by(cyl, gear) %>%
summarise(n = n())
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
# cyl gear n
# (dbl) (dbl) (int)
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 5 2
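n() is not limited to summarise(): inside a dplyr verb it returns the size of the current group, or the number of rows of the whole data frame when there are no groups. The "Groups: cyl" line in the output above appears because summarise() drops only the last grouping variable (gear) by default. The sketch below shows both uses of n() and the `.groups = "drop"` argument, which assumes dplyr 1.0.0 or later, to get an ungrouped result instead.

```r
library(dplyr)

# Without group_by(), n() counts every row of the data frame
total <- mtcars %>% summarise(n = n())

# Inside mutate(), n() gives each row the size of its own (cyl, gear) group
with_size <- mtcars %>%
  group_by(cyl, gear) %>%
  mutate(group_size = n()) %>%
  ungroup()

# .groups = "drop" (dplyr >= 1.0.0) returns a result with no grouping left
counts <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

print(total)
print(counts)
```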
But dplyr also offers a handy count() function which does exactly the same with less typing:
count(mtcars, cyl, gear) # or mtcars %>% count(cyl, gear)
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
# cyl gear n
# (dbl) (dbl) (int)
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 5 2
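count() also takes arguments for two common follow-ups: `sort = TRUE` lists the largest groups first, and `name` sets the count column's name instead of the default n. The sketch below assumes dplyr 1.0.0 or later, where count() has both arguments, and checks that its counts match the group_by() + summarise() form from the first example.

```r
library(dplyr)

# sort = TRUE orders the groups by descending count
by_size <- count(mtcars, cyl, gear, sort = TRUE)

# name = replaces the default count column name "n"
named <- count(mtcars, cyl, gear, name = "n_cars")

# The long form, for comparison
long_form <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

# Without sort = TRUE both results are ordered by cyl, then gear
same_counts <- identical(count(mtcars, cyl, gear)$n, long_form$n)

print(by_size)
print(same_counts)
```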