Aggregating data in R by defining groupings


Problem description


I am having trouble grouping and summing the following data in R:

   category freq
1        C1    9
2        C2   39
3        C3    3
4        A1   38
5        A2    2
6        A3   29
7        B1  377
8        B2  214
9        B3  790
10       B4  724
11       D1  551
12       D2  985
13       E5   19
14       E4   28

to look like this:

  category freq
1        A   69
2        B 2105
3        C   51
4        D 1536
5        E   47

I usually use ddply to aggregate data by an attribute, but that only sums the rows that share exactly the same value in a given column. I need to be able to specify several values that should be lumped into one category.

Thanks in advance for any help.

Solution

Why not add a column to your data frame that holds the letter part of the category column? You could then use ddply on that column.

Example:

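 # Example data: five sub-categories with their frequencies
 df = data.frame(id = c(1,2,3,4,5), category = c("AB1", "AB2", "B1", "B2", "B3"), freq = c(50,51,3,2,26))
 # Strip the digits to keep only the letter part of each category
 df$new = as.factor(gsub("\\d", "", df$category))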

You could then use ddply based on the new column, as follows:

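 library(plyr)
 # Sum freq within each level of the new column
 # (stored as `result` so the name does not mask base R's aggregate())
 result <- ddply(df, .(new), summarize, freq = sum(freq))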

You get the following result:

#  new freq
#1  AB  101
#2   B   31

This works only if every category that shares the same letter part should fall under the same umbrella category.
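Applied to the data from the question, the same digit-stripping idea can be sketched in base R without plyr. This is only an illustration: the column name `group` and the variable `totals` are made up for the example, and `aggregate()` stands in for `ddply`.

```r
# Data from the question
df <- data.frame(
  category = c("C1", "C2", "C3", "A1", "A2", "A3", "B1", "B2", "B3", "B4",
               "D1", "D2", "E5", "E4"),
  freq = c(9, 39, 3, 38, 2, 29, 377, 214, 790, 724, 551, 985, 19, 28),
  stringsAsFactors = FALSE
)

# Strip the digits so that "A1", "A2" and "A3" all become "A"
# (`group` is a hypothetical column name chosen for this sketch)
df$group <- gsub("\\d", "", df$category)

# Sum freq within each group using base R's aggregate()
totals <- aggregate(freq ~ group, data = df, FUN = sum)
print(totals)
```

The result has one row per letter, with the sums requested in the question (69, 2105, 51, 1536 and 47 for A to E).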

If, however, you wish to group custom categories under one category (in your example, KG, XM and L4 would be part of the same category), you could define new "super" categories and assign each sub-category to the appropriate one. One way I can think of is the switch function. See the example below:

 df = data.frame(id = c(1,2,3,4,5), category = c("A", "B", "KG", "XM", "L4"), freq = c(50,51,3,2,26))

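 # Map each sub-category to its "super" category
 fct <- function(cat) {switch(cat, "A" = "CAT1", "B" = "CAT2", "KG" = "CAT3", "XM" = "CAT3", "L4"="CAT3")}
 # as.character() guards against category being a factor (the default before R 4.0)
 df$new = as.factor(unlist(lapply(as.character(df$category), fct)))

 # Sum freq within each "super" category
 result <- ddply(df, .(new), summarize, freq = sum(freq))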

This will give you:

 #   new freq
 #1 CAT1   50
 #2 CAT2   51
 #3 CAT3   31
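As an alternative to `switch`, the mapping could also be kept in a named character vector and applied by indexing. The sketch below is not part of the original answer; `lookup` and `totals` are hypothetical names, and base R's `aggregate()` is used instead of `ddply` so that it runs without plyr.

```r
df <- data.frame(
  category = c("A", "B", "KG", "XM", "L4"),
  freq = c(50, 51, 3, 2, 26),
  stringsAsFactors = FALSE
)

# Lookup table: names are the sub-categories, values the "super" categories
# (`lookup` is a hypothetical name chosen for this sketch)
lookup <- c(A = "CAT1", B = "CAT2", KG = "CAT3", XM = "CAT3", L4 = "CAT3")

# Index the lookup by category name; unname() drops the carried-over names
df$new <- unname(lookup[df$category])

# A category missing from the lookup would appear as NA, so fail early
stopifnot(!anyNA(df$new))

# Sum freq within each "super" category
totals <- aggregate(freq ~ new, data = df, FUN = sum)
print(totals)
```

One possible advantage of this form is that the mapping is plain data, so it could be read from a file or extended without touching the function body.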
