R: count occurrences of an element by group
Problem description
What is the easiest way to count the occurrences of an element in a vector or data.frame within every group?
I don't mean just counting the total (as other Stack Overflow questions ask), but giving a different number to every successive occurrence.
For example, take this simple data frame (though I will be working with data frames that have more columns):
mydata <- data.frame(A=c("A","A","A","B","B","A", "A"))
I've found this solution:
cbind(mydata,myorder=ave(rep(1,nrow(mydata)),mydata$A, FUN=cumsum))
Here is the result:
A myorder
A 1
A 2
A 3
B 1
B 2
A 4
A 5
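As a base R aside (not part of the original question), ave() can also produce the running count by numbering each group's elements with seq_along instead of accumulating a vector of ones with cumsum. A minimal sketch, assuming the single-column mydata above; it should give the same 1, 2, 3, 1, 2, 4, 5 numbering, but as integers rather than doubles:

```r
# Single-column example data from the question
mydata <- data.frame(A = c("A", "A", "A", "B", "B", "A", "A"))

# ave() splits its first argument by mydata$A, applies FUN to each piece,
# and writes the results back in the original row order.
# seq_along() numbers the elements of each piece 1, 2, 3, ...
mydata$myorder <- ave(seq_along(mydata$A), mydata$A, FUN = seq_along)

print(mydata)
```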
Isn't there a single command that does this? Or a specialized package?
I want the result so that I can later use it with tidyr's spread() function.
My question is not the same as "Is there an aggregate FUN option to count occurrences?", because I don't want the total number of occurrences at the end, but the cumulative number of occurrences up to each element.
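To make the distinction concrete, here is a small base R sketch (an illustration, not from the original post) that computes both quantities for the same vector. It assumes a plain character vector x holding the letters from the first example; the last running count in each group should equal that group's total:

```r
x <- c("A", "A", "A", "B", "B", "A", "A")

# Total occurrences per element: one number per distinct value
totals <- table(x)

# Cumulative occurrences up to each element: one number per position
running <- ave(seq_along(x), x, FUN = seq_along)

# The largest running count within each group is that group's total
last_per_group <- tapply(running, x, max)

print(totals)
print(running)
print(last_per_group)
```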
OK, my problem is a little more complex:
mydata <- data.frame(group=c("x","x","x","x","y","y", "y"), letter=c("A","A","A","B","B","A", "A"))
I only know how to solve the first example above. But what if I also want to count by a second grouping variable? Something like occurrences(letter) by group:
group letter "occurrences within group"
x A 1
x A 2
x A 3
x B 1
y B 1
y A 1
y A 2
I've found this:
ave(rep(1,nrow(mydata)),list(mydata$group, mydata$letter), FUN=cumsum)
though there should be something simpler.
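A base R sketch of the same idea with two grouping variables (an alternative phrasing, not the asker's exact code): ave() accepts several grouping vectors directly and groups by their combination, so no list() wrapper is needed. It assumes the two-column mydata from the question and should reproduce the "occurrences within group" column shown above:

```r
mydata <- data.frame(group  = c("x", "x", "x", "x", "y", "y", "y"),
                     letter = c("A", "A", "A", "B", "B", "A", "A"))

# Several grouping vectors passed to ave() are combined with interaction()
# internally, so each (group, letter) pair is numbered separately.
mydata$myorder <- ave(seq_len(nrow(mydata)), mydata$group, mydata$letter,
                      FUN = seq_along)

print(mydata)
```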
Recommended answer
Using data.table:
library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]
The by argument makes the operation run separately within each group defined by the columns group and letter. The special symbol .N holds the number of rows in the current group (if the by argument were omitted, it would be the number of rows in the whole table), so within each sub-table, rows are numbered from 1 to the number of rows in that sub-table.
mydata
group letter myorder
1: x A 1
2: x A 2
3: x A 3
4: x B 1
5: y B 1
6: y A 1
7: y A 2
Or the almost identical dplyr solution:
library(dplyr)
mydata %>%
  group_by(group, letter) %>%
  mutate(myorder = 1:n())