Group by and conditionally count

Question

I am still learning data management in R. I know I am really close, but I can't get the precise syntax. I have looked at "count a variable by using a condition in R" and "Conditional count and group by in R", but can't quite translate them to my work. I am trying to get a count of the dist.km values that equal 0, by ST. Eventually I will want to add columns with counts of various distance ranges, but I should be able to work that out once I have this. The final table should have all states and a count of 0s. Here is a 20-row sample.

structure(list(ST = structure(c(12L, 15L, 13L, 10L, 15L, 16L, 
11L, 12L, 8L, 14L, 10L, 14L, 6L, 11L, 5L, 5L, 15L, 1L, 6L, 4L
), .Label = c("CT", "DE", "FL", "GA", "MA", "MD", "ME", "NC", 
"NH", "NJ", "NY", "PA", "RI", "SC", "VA", "VT", "WV"), class = "factor"), 
Rfips = c(42107L, 51760L, 44001L, 34001L, 51061L, 50023L, 
36029L, 42101L, 37019L, 45079L, 34029L, 45055L, 24003L, 36027L, 
25009L, 25009L, 51760L, 9003L, 24027L, 1111L), zip = c(17972L, 
23226L, 2806L, 8330L, 20118L, 5681L, 14072L, 19115L, 28451L, 
29206L, 8741L, 29020L, 20776L, 12545L, 1922L, 1938L, 23226L, 
6089L, 21042L, 36278L), Year = c(2010L, 2005L, 2010L, 2008L, 
2007L, 2006L, 2005L, 2008L, 2009L, 2008L, 2010L, 2006L, 2007L, 
2008L, 2011L, 2011L, 2008L, 2005L, 2008L, 2009L), dist.km = c(0, 
42.4689368078209, 28.1123394088972, 36.8547005648639, 0, 
49.7276501081775, 0, 30.1937156926235, 0, 0, 31.5643658415831, 
0, 0, 0, 0, 0, 138.854136893762, 0, 79.4320981205195, 47.1692144550079
)), .Names = c("ST", "Rfips", "zip", "Year", "dist.km"), row.names = c(132931L, 
105670L, 123332L, 21361L, 51576L, 3520L, 47367L, 99962L, 18289L, 
126153L, 19321L, 83224L, 6041L, 46117L, 49294L, 48951L, 109350L, 
64465L, 80164L, 22687L), class = "data.frame")

Here are a couple of chunks of code I have tried.

state= DDcomplete %>%
group_by(ST) %>%
summarize(zero = sum(DDcomplete$dist.km==0, na.rm = TRUE))

state= aggregate(dist.km ~ ST, function(x) sum(dist.km==0, data=DDcomplete))

state = (DDcomplete[DDcomplete$dist.km==0,], .(ST), function(x) nrow(x))

Thanks for any help!

Recommended answer

If you want to add it as a column you can do:

DDcomplete %>% group_by(ST) %>% mutate(count = sum(dist.km == 0))

Or if you just want the counts per state:

DDcomplete %>% group_by(ST) %>% summarise(count = sum(dist.km == 0))
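As a minimal sketch of what the per-state summary returns, the example below uses a small made-up data frame `dd` in place of DDcomplete. The extra range columns (`under_50km`, `over_50km`) and their cut-offs are hypothetical, added only to illustrate how the same pattern might extend to the distance ranges mentioned in the question.

```r
library(dplyr)

# Toy stand-in for DDcomplete; the values are made up for illustration
dd <- data.frame(
  ST      = factor(c("PA", "PA", "VA", "VA", "NY", "NY", "RI")),
  dist.km = c(0, 30.2, 0, 138.9, 0, 0, 28.1)
)

counts <- dd %>%
  group_by(ST) %>%
  summarise(
    zero       = sum(dist.km == 0),               # rows with distance exactly 0
    under_50km = sum(dist.km > 0 & dist.km < 50), # hypothetical range
    over_50km  = sum(dist.km >= 50)               # hypothetical range
  )

print(counts)
```

Because the logical test is summed within each group, a state with no zero-distance rows (RI in this toy data) still appears, with a count of 0.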

Actually, you were very close to the solution. Your code

state= DDcomplete %>%
    group_by(ST) %>%
    summarize(zero = sum(DDcomplete$dist.km==0, na.rm = TRUE))

is almost correct. You can remove the DDcomplete$ from within the call to sum because within dplyr chains, you can access variables directly.
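To see why the `DDcomplete$` prefix matters, here is a small sketch on a made-up data frame `dd` (a stand-in for DDcomplete). With the prefix, `sum()` looks at the full column of the original data frame and ignores the grouping, so every state gets the overall total.

```r
library(dplyr)

# Toy stand-in for DDcomplete; the values are made up for illustration
dd <- data.frame(
  ST      = c("PA", "PA", "NY", "NY", "RI"),
  dist.km = c(0, 30.2, 0, 0, 28.1)
)

# dd$dist.km bypasses the grouping: each group sums over all 5 rows
wrong <- dd %>%
  group_by(ST) %>%
  summarise(zero = sum(dd$dist.km == 0))

# Bare dist.km refers to the rows of the current group only
right <- dd %>%
  group_by(ST) %>%
  summarise(zero = sum(dist.km == 0))

print(wrong)  # zero is 3 for every state
print(right)  # zero is 2 for NY, 1 for PA, 0 for RI
```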

Also note that by using summarise, you will condense your data frame to 1 row per group with only the grouping column(s) and whatever you computed inside the summarise. If you just want to add a column with the counts, you can use mutate as I did in my answer.
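The difference in shape between the two verbs can be checked on a small made-up data frame (`dd` is a hypothetical stand-in for DDcomplete):

```r
library(dplyr)

# Toy stand-in for DDcomplete; the values are made up for illustration
dd <- data.frame(
  ST      = c("PA", "PA", "NY", "NY", "RI"),
  dist.km = c(0, 30.2, 0, 0, 28.1)
)

# mutate keeps every row and repeats the group's count on each of them
with_count <- dd %>%
  group_by(ST) %>%
  mutate(count = sum(dist.km == 0)) %>%
  ungroup()  # drop the grouping so later steps are not affected by it

# summarise collapses to one row per state
per_state <- dd %>%
  group_by(ST) %>%
  summarise(count = sum(dist.km == 0))

nrow(with_count)  # 5, same as the input
nrow(per_state)   # 3, one per state
```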

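If you're only interested in positive counts, that is, states with at least one zero-distance row, you could also use dplyr's count function together with filter to first subset the data. States with no such rows will not appear in the result: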

filter(DDcomplete, dist.km == 0) %>% count(ST)
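Since the question asks for all states in the final table, one possible variation is sketched below. It assumes ST is a factor (as it is in the sample data) and a dplyr version in which `count()` accepts the `.drop` argument. The data frame `dd` is a made-up stand-in for DDcomplete.

```r
library(dplyr)

# Toy stand-in for DDcomplete; ST is a factor, as in the question's sample
dd <- data.frame(
  ST      = factor(c("PA", "PA", "NY", "NY", "RI")),
  dist.km = c(0, 30.2, 0, 0, 28.1)
)

# Default behaviour: RI has no zero-distance rows, so it is dropped
only_positive <- dd %>%
  filter(dist.km == 0) %>%
  count(ST)

# .drop = FALSE keeps every factor level, with n = 0 for empty ones
all_levels <- dd %>%
  filter(dist.km == 0) %>%
  count(ST, .drop = FALSE)

print(only_positive)
print(all_levels)
```

Note that `.drop = FALSE` reports every level of the factor, so levels that never occur in the data at all (for example DE or FL in the question's sample) would also be listed with a count of 0.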
