R count occurrences of an element by groups


Question

What is the easiest way to count the occurrences of an element in a vector or data.frame within each group?
I don't mean just counting the total (as other Stack Overflow questions ask), but giving a different number to every successive occurrence.

For example, for this simple data frame (although I will work with data frames that have more columns):

mydata <- data.frame(A=c("A","A","A","B","B","A", "A"))

I've found this solution:

cbind(mydata,myorder=ave(rep(1,nrow(mydata)),mydata$A, FUN=cumsum))   

and here is the result:

 A myorder  
 A       1  
 A       2  
 A       3  
 B       1  
 B       2  
 A       4  
 A       5  

Isn't there a single command to do this? Or a specialized package?

I want to use the result later with tidyr's spread() function.

My question is not the same as "Is there an aggregate FUN option to count occurrences?", because I don't want the total number of occurrences at the end, but the cumulative number of occurrences up to each element.

OK, my problem is a little more complex:

mydata <- data.frame(group=c("x","x","x","x","y","y", "y"), letter=c("A","A","A","B","B","A", "A"))

I only know how to solve the first example I wrote above. But what happens when I also want it by a second grouping variable? Something like occurrences(letter) by group.

group letter  "occurrences within group"  
 x      A       1  
 x      A       2  
 x      A       3  
 x      B       1  
 y      B       1  
 y      A       1  
 y      A       2  

I have just found

ave(rep(1,nrow(mydata)),list(mydata$group, mydata$letter), FUN=cumsum)
though there should be something easier.

Recommended answer

Using data.table

library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]

The by argument makes data.table process the table within the groups defined by the columns group and letter. .N is the number of rows in the current group (if by were empty, it would be the number of rows in the whole table), so within each sub-table the rows are indexed from 1 to the number of rows in that sub-table.
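
As a hedged aside, data.table also provides rowid(), which appears to give the same numbering in a single call without by. The following is a minimal sketch, assuming data.table 1.9.8 or later is installed; it rebuilds the example data from the question, and the column name myorder2 is only illustrative.

```r
library(data.table)

# Example data from the question
mydata <- data.table(
  group  = c("x", "x", "x", "x", "y", "y", "y"),
  letter = c("A", "A", "A", "B", "B", "A", "A")
)

# seq_len(.N) is equivalent to 1:.N within each (group, letter) group
mydata[, myorder := seq_len(.N), by = .(group, letter)]

# rowid() numbers successive occurrences of each (group, letter) combination
mydata[, myorder2 := rowid(group, letter)]
```

Both columns should hold the same per-group running index, so either form can be used depending on taste.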

mydata
   group letter myorder
1:     x      A       1
2:     x      A       2
3:     x      A       3
4:     x      B       1
5:     y      B       1
6:     y      A       1
7:     y      A       2
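
For comparison, the ave() approach from the question can be shortened a little in base R, with no extra packages. This is only a sketch under the assumption that the rows should be numbered in their existing order; it passes the grouping vectors directly to ave() and uses seq_along instead of cumsum over a vector of ones.

```r
# Example data from the question
mydata <- data.frame(
  group  = c("x", "x", "x", "x", "y", "y", "y"),
  letter = c("A", "A", "A", "B", "B", "A", "A")
)

# ave() applies FUN within each combination of the grouping vectors;
# seq_along() then numbers the rows of each combination from 1 upward
mydata$myorder <- ave(seq_len(nrow(mydata)), mydata$group, mydata$letter,
                      FUN = seq_along)
```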

Or a dplyr solution that is pretty much the same as the data.table one

library(dplyr)

# Number the rows within each (group, letter) combination
mydata %>% 
  group_by(group, letter) %>% 
  mutate(myorder = 1:n())
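
Since the question mentions reshaping the result with tidyr afterwards, here is a hedged sketch of how the counter might feed into that step. It assumes dplyr and tidyr (1.0.0 or later) are installed, uses row_number() in place of 1:n(), and uses pivot_wider(), which supersedes spread(). The value column and the occ_ prefix are hypothetical and only stand in for the "more columns" the asker mentions.

```r
library(dplyr)
library(tidyr)

mydata <- data.frame(
  group  = c("x", "x", "x", "x", "y", "y", "y"),
  letter = c("A", "A", "A", "B", "B", "A", "A"),
  value  = c(10, 20, 30, 40, 50, 60, 70)  # hypothetical column, not in the question
)

# Number successive occurrences within each (group, letter) combination
counted <- mydata %>%
  group_by(group, letter) %>%
  mutate(myorder = row_number()) %>%
  ungroup()

# One row per (group, letter), one column per occurrence number
wide <- counted %>%
  pivot_wider(
    id_cols      = c(group, letter),
    names_from   = myorder,
    values_from  = value,
    names_prefix = "occ_"
  )
```

Combinations with fewer occurrences than the maximum get NA in the unused occ_ columns.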
