R aggregate data by defining grouping
Problem description
I am having trouble grouping and summing the following data in R:
   category freq
1        C1    9
2        C2   39
3        C3    3
4        A1   38
5        A2    2
6        A3   29
7        B1  377
8        B2  214
9        B3  790
10       B4  724
11       D1  551
12       D2  985
13       E5   19
14       E4   28
to look like this:
  category freq
1        A   69
2        B 2105
3        C   51
4        D 1536
5        E   47
I usually use ddply to aggregate data by an attribute, but that only sums the rows that share exactly the same value in a given column. I need to be able to specify multiple attribute values that should be lumped into one category.
Thanks in advance for any help.
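Solution
Why not add a column to your data frame that holds the letter part of your "category" column? Then you can use ddply on it. Example:

    # freq needs five values to match the five ids; 3 is the value
    # consistent with the totals shown in the result
    df <- data.frame(id = c(1, 2, 3, 4, 5),
                     category = c("AB1", "AB2", "B1", "B2", "B3"),
                     freq = c(50, 51, 3, 2, 26))
    # Strip the digits so only the letter prefix remains
    df$new <- as.factor(gsub("\\d", "", df$category))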
You could then use ddply based on the new column, as follows:

    library(plyr)
    # Stored as `result` rather than `aggregate` to avoid shadowing base R's aggregate()
    result <- ddply(df, .(new), summarize, freq = sum(freq))
You get the following result:

    #  new freq
    #1  AB  101
    #2   B   31
This works only if you intend to group all the categories that share the same alphabetical substring under one umbrella category.
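For reference, here is a minimal sketch of the same prefix-based idea applied to the data from the question, using only base R instead of plyr. The column name `group` and the variable `totals` are illustrative choices, not part of the original answer.

```r
# Data from the question
df <- data.frame(
  category = c("C1", "C2", "C3", "A1", "A2", "A3", "B1",
               "B2", "B3", "B4", "D1", "D2", "E5", "E4"),
  freq     = c(9, 39, 3, 38, 2, 29, 377, 214, 790, 724, 551, 985, 19, 28)
)

# Drop the digits so that, for example, "B1".."B4" all become "B"
df$group <- gsub("[0-9]", "", df$category)

# Sum freq within each letter group (base R, no extra packages)
totals <- aggregate(freq ~ group, data = df, FUN = sum)
print(totals)
#   group freq
# 1     A   69
# 2     B 2105
# 3     C   51
# 4     D 1536
# 5     E   47
```

The totals match the desired output in the question, so for that particular data the letter prefix alone is enough to define the groups.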
If, however, you wish to group custom categories under one category (your example: KG, XM and L4 would be part of the same category), you could define new "super" categories and assign each sub-category to the appropriate "super" category. One way that I can think of is the switch function. Please see the example below:

    library(plyr)
    df <- data.frame(id = c(1, 2, 3, 4, 5),
                     category = c("A", "B", "KG", "XM", "L4"),
                     freq = c(50, 51, 3, 2, 26))
    # Map each sub-category to its "super" category
    fct <- function(cat) {
      switch(cat, "A" = "CAT1", "B" = "CAT2", "KG" = "CAT3", "XM" = "CAT3", "L4" = "CAT3")
    }
    # as.character() guards against category being stored as a factor,
    # in which case switch() would not match on the labels
    df$new <- as.factor(unlist(lapply(as.character(df$category), fct)))
    result <- ddply(df, .(new), summarize, freq = sum(freq))
This will give you:

    #   new freq
    #1 CAT1   50
    #2 CAT2   51
    #3 CAT3   31
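As an alternative to switch, the same custom mapping can be sketched with a named character vector used as a lookup table. This is vectorized, so no lapply is needed. The names `lookup` and `totals` are illustrative, and base R's aggregate stands in for ddply here.

```r
df <- data.frame(
  category = c("A", "B", "KG", "XM", "L4"),
  freq     = c(50, 51, 3, 2, 26),
  stringsAsFactors = FALSE
)

# Illustrative lookup table: names are sub-categories, values are super categories
lookup <- c(A = "CAT1", B = "CAT2", KG = "CAT3", XM = "CAT3", L4 = "CAT3")

# Index the lookup by category name; unname() drops the carried-over names
df$new <- unname(lookup[df$category])

# Sum freq within each super category
totals <- aggregate(freq ~ new, data = df, FUN = sum)
print(totals)
#    new freq
# 1 CAT1   50
# 2 CAT2   51
# 3 CAT3   31
```

One thing to watch for: a category that is missing from `lookup` maps to NA, and the formula interface of aggregate drops NA rows by default, so it is worth checking `anyNA(df$new)` before summing.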