Group by and conditionally count

Question

I am still learning data management in R. I know I am really close but can't get the syntax right. I have looked at "count a variable by using a condition in R" and "Conditional count and group by in R" but can't quite translate them to my work. I am trying to count, for each ST, the rows where dist.km equals 0. Eventually I will want to add columns with counts for various distance ranges, but I should be able to work that out once I have this. The final table should have all states and a count of 0s. Here is a 20-row sample.

structure(list(ST = structure(c(12L, 15L, 13L, 10L, 15L, 16L, 
11L, 12L, 8L, 14L, 10L, 14L, 6L, 11L, 5L, 5L, 15L, 1L, 6L, 4L
), .Label = c("CT", "DE", "FL", "GA", "MA", "MD", "ME", "NC", 
"NH", "NJ", "NY", "PA", "RI", "SC", "VA", "VT", "WV"), class = "factor"), 
Rfips = c(42107L, 51760L, 44001L, 34001L, 51061L, 50023L, 
36029L, 42101L, 37019L, 45079L, 34029L, 45055L, 24003L, 36027L, 
25009L, 25009L, 51760L, 9003L, 24027L, 1111L), zip = c(17972L, 
23226L, 2806L, 8330L, 20118L, 5681L, 14072L, 19115L, 28451L, 
29206L, 8741L, 29020L, 20776L, 12545L, 1922L, 1938L, 23226L, 
6089L, 21042L, 36278L), Year = c(2010L, 2005L, 2010L, 2008L, 
2007L, 2006L, 2005L, 2008L, 2009L, 2008L, 2010L, 2006L, 2007L, 
2008L, 2011L, 2011L, 2008L, 2005L, 2008L, 2009L), dist.km = c(0, 
42.4689368078209, 28.1123394088972, 36.8547005648639, 0, 
49.7276501081775, 0, 30.1937156926235, 0, 0, 31.5643658415831, 
0, 0, 0, 0, 0, 138.854136893762, 0, 79.4320981205195, 47.1692144550079
)), .Names = c("ST", "Rfips", "zip", "Year", "dist.km"), row.names = c(132931L, 
105670L, 123332L, 21361L, 51576L, 3520L, 47367L, 99962L, 18289L, 
126153L, 19321L, 83224L, 6041L, 46117L, 49294L, 48951L, 109350L, 
64465L, 80164L, 22687L), class = "data.frame")

Here are a couple of chunks of code I have tried.

state= DDcomplete %>%
group_by(ST) %>%
summarize(zero = sum(DDcomplete$dist.km==0, na.rm = TRUE))

state= aggregate(dist.km ~ ST, function(x) sum(dist.km==0, data=DDcomplete))

state = (DDcomplete[DDcomplete$dist.km==0,], .(ST), function(x) nrow(x))

Recommended answer

If you want to add it as a column, you can do:

DDcomplete %>% group_by(ST) %>% mutate(count = sum(dist.km == 0))

Or, if you just want the counts per state:

DDcomplete %>% group_by(ST) %>% summarise(count = sum(dist.km == 0))
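The question mentions eventually wanting counts for several distance ranges. The same pattern should extend by adding one sum() per range inside summarise. The sketch below is only an illustration: the toy data frame stands in for DDcomplete, and the 50 km cut-off and the column names are assumptions, not values from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for DDcomplete
toy <- data.frame(
  ST      = c("PA", "PA", "PA", "VA", "VA", "NY"),
  dist.km = c(0, 0, 30.2, 42.5, 0, 138.9)
)

# One logical condition per range; sum() counts the TRUE values per group.
# The 50 km boundary is an arbitrary example cut-off.
ranges <- toy %>%
  group_by(ST) %>%
  summarise(
    zero     = sum(dist.km == 0),
    under_50 = sum(dist.km > 0 & dist.km < 50),
    over_50  = sum(dist.km >= 50)
  )

print(ranges)
```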

Actually, you were very close to the solution. Your code

state= DDcomplete %>%
    group_by(ST) %>%
    summarize(zero = sum(DDcomplete$dist.km==0, na.rm = TRUE))

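is almost correct. The problem is the DDcomplete$ inside the call to sum: DDcomplete$dist.km refers to the whole column of the original data frame, so it ignores the grouping and every state gets the same overall total. Drop the DDcomplete$ prefix; within a dplyr chain you can refer to columns by their bare names, and they are evaluated per group.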
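To make the difference concrete, here is a small sketch that runs both versions side by side. The toy data frame is hypothetical and only stands in for DDcomplete.

```r
library(dplyr)

# Hypothetical toy data standing in for DDcomplete
toy <- data.frame(
  ST      = c("PA", "PA", "PA", "VA", "VA", "NY"),
  dist.km = c(0, 0, 30.2, 42.5, 0, 138.9)
)

# toy$dist.km bypasses the grouping: each group sums over all six rows
wrong <- toy %>%
  group_by(ST) %>%
  summarise(zero = sum(toy$dist.km == 0, na.rm = TRUE))

# Bare dist.km is evaluated within each group
right <- toy %>%
  group_by(ST) %>%
  summarise(zero = sum(dist.km == 0, na.rm = TRUE))

print(wrong)  # the same overall total repeated for every state
print(right)  # one count per state
```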

Also note that summarise condenses the data frame to one row per group, keeping only the grouping column(s) and whatever you compute inside summarise. If you just want to add a column with the counts while keeping every row, use mutate, as in the first snippet of this answer.
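A quick way to see the two shapes is to compare row counts. This is a sketch on hypothetical toy data rather than on DDcomplete itself; the ungroup() call at the end is an optional extra so later steps are not affected by the grouping.

```r
library(dplyr)

# Hypothetical toy data standing in for DDcomplete
toy <- data.frame(
  ST      = c("PA", "PA", "PA", "VA", "VA", "NY"),
  dist.km = c(0, 0, 30.2, 42.5, 0, 138.9)
)

# summarise: one row per state
per_state <- toy %>%
  group_by(ST) %>%
  summarise(count = sum(dist.km == 0))

# mutate: all original rows kept, with the group count repeated on each row
with_col <- toy %>%
  group_by(ST) %>%
  mutate(count = sum(dist.km == 0)) %>%
  ungroup()

nrow(per_state)  # number of distinct states
nrow(with_col)   # number of rows in the original data
```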

If you're only interested in positive counts (states with at least one dist.km equal to 0), you could also use dplyr's count function together with filter to subset the data first:

filter(DDcomplete, dist.km == 0) %>% count(ST)
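Because filter removes the non-zero rows before counting, states without any zero distance drop out of the result, which may not be what you want if the final table should list all states. The sketch below shows the difference on hypothetical toy data. It assumes ST is a factor (as in the dput output in the question) and uses count's .drop argument to keep empty factor levels.

```r
library(dplyr)

# Hypothetical toy data; ST is a factor with one unused level ("WV")
toy <- data.frame(
  ST      = factor(c("PA", "PA", "PA", "VA", "VA", "NY"),
                   levels = c("NY", "PA", "VA", "WV")),
  dist.km = c(0, 0, 30.2, 42.5, 0, 138.9)
)

# Only states with at least one zero distance appear
positive_only <- filter(toy, dist.km == 0) %>% count(ST)

# .drop = FALSE keeps every factor level, with n = 0 for empty ones
all_states <- filter(toy, dist.km == 0) %>% count(ST, .drop = FALSE)

print(positive_only)
print(all_states)
```

The same .drop = FALSE argument is also accepted by group_by, so the summarise version can keep unused factor levels in the same way.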
