R: How to group and aggregate list elements using regex?

Problem description

I want to aggregate (sum up) the following product list by groups (see below):

prods <- list("101.2000"=data.frame(1,2,3),
              "102.2000"=data.frame(4,5,6),
              "103.2000"=data.frame(7,8,9),
              "104.2000"=data.frame(1,2,3),
              "105.2000"=data.frame(4,5,6),
              "106.2000"=data.frame(7,8,9),
              "101.2001"=data.frame(1,2,3),
              "102.2001"=data.frame(4,5,6),
              "103.2001"=data.frame(7,8,9),
              "104.2001"=data.frame(1,2,3),
              "105.2001"=data.frame(4,5,6),
              "106.2001"=data.frame(7,8,9))
test= list("100.2000"=data.frame(2,3,5),
           "100.2001"=data.frame(4,5,6))
col_names <- c("A", "B", "C")  # named col_names to avoid masking base::names()
prods <- lapply(prods, function(x) { colnames(x) <- col_names; x })

Each element of the product list (prods) has a name that combines the product number and the year (e.g. in 101.2000, 101 is the product number and 2000 is the year). The groups contain only the product numbers to aggregate.

group1 <- c(101, 106)
group2 <- c(102, 104)
group3 <- c(105, 103)

My expected result shows the product groups aggregated by year:

$group1.2000
  A  B  C
1 8 10 12

$group2.2000
  A B C
1 5 7 9

$group3.2000
   A  B  C
1 11 13 15

$group1.2001
  A  B  C
1 8 10 12

$group2.2001
  A B C
1 5 7 9

$group3.2001
   A  B  C
1 11 13 15

So far I have tried the following. First, I extracted the product numbers from the names of prods:

prodnames <- names(prods)
prodnames_sub <- sub("\\..*", "", prodnames)  # drop everything from the first dot onward

Then I tried to aggregate using lapply:

lapply(prods, function(x) aggregate( ...  , FUN = sum))

However, I could not work out how to use the extracted product numbers in the aggregate call. Any ideas? Thanks.

Recommended answer

Here are two approaches.

1) Using lists. Create a two-column data.frame S from the groups, whose columns are the products (values column) and the associated groups (ind column). Then create By, the list to split by. In the code that produces By, sub("\\..*", "", names(prods)) extracts the product numbers, and match then finds the associated group; sub(".*\\.", "", names(prods)) extracts the year. Next, perform the split and lapply over it to run the summations. The two components of By (group and year) can be reversed to change the order of the output, if desired.

S <- stack(list(group1 = group1, group2 = group2, group3 = group3))
By <- list(group = S$ind[match(sub("\\..*", "", names(prods)), S$values)],
           year = sub(".*\\.", "", names(prods)))
lapply(split(prods, By), function(x) colSums(do.call(rbind, x)))
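
The list approach above returns named numeric vectors, whereas the question shows one-row data frames. The following is a minimal self-contained sketch of a variant that converts each sum back to a data frame. It rebuilds the question's data in compact form; the names vals, ids, groups, prod_nr, year and res are introduced here for illustration only and do not come from the original answer.

```r
# Rebuild the question's data so the sketch runs on its own
vals <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
ids  <- paste(rep(101:106, times = 2), rep(2000:2001, each = 6), sep = ".")
prods <- lapply(rep(vals, length.out = 12), function(v) {
  data.frame(A = v[1], B = v[2], C = v[3])
})
names(prods) <- ids

groups <- list(group1 = c(101, 106), group2 = c(102, 104), group3 = c(105, 103))

# Lookup table: column 'values' holds product numbers, 'ind' the group label
S <- stack(groups)

prod_nr <- sub("\\..*", "", names(prods))  # text before the first dot
year    <- sub(".*\\.", "", names(prods))  # text after the last dot
By <- list(group = S$ind[match(prod_nr, S$values)], year = year)

# Sum each group/year bucket and turn the sums into a one-row data frame
res <- lapply(split(prods, By), function(x) {
  as.data.frame(t(colSums(do.call(rbind, x))))
})

res[["group1.2000"]]
```

Transposing the result of colSums gives a one-row matrix whose column names are A, B and C, which as.data.frame then converts to the shape shown in the question.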

2) Using data.frames. Convert the groups and prods each to a data frame, merge them, perform an aggregate and split the result back into a list. The output is the same as requested except for the order. (Reversing the two right-hand variables in the aggregate formula gives the order shown in the question, but it also reverses the two parts of each component name in the output list.)

S <- stack(list(group1 = group1, group2 = group2, group3 = group3))

DF0 <- do.call(rbind, prods)
DF <- cbind(do.call(rbind, strsplit(rownames(DF0), ".", fixed = TRUE)), DF0)

M <- merge(DF, S, all.x = TRUE, by = 1)
Ag <- aggregate(cbind(A, B, C) ~ ind + `2`, M, sum)
lapply(split(Ag, paste(Ag[[1]], Ag[[2]], sep = ".")), "[", 3:5)

giving:

$group1.2000
  A  B  C
1 8 10 12

$group1.2001
  A  B  C
4 8 10 12

$group2.2000
  A B C
2 5 7 9

$group2.2001
  A B C
5 5 7 9

$group3.2000
   A  B  C
3 11 13 15

$group3.2001
   A  B  C
6 11 13 15
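
If the year-major order from the question is wanted without changing the component names, one option is to reorder the finished list by the year and group parts of its names. This is a sketch only: res below is a hypothetical stand-in with placeholder values, used just to show the reordering, and yr, grp and res_by_year are names introduced here.

```r
# Hypothetical stand-in for the result list, in the alphabetical
# order produced by splitting on the pasted group.year labels
res <- list(group1.2000 = 1, group1.2001 = 2, group2.2000 = 3,
            group2.2001 = 4, group3.2000 = 5, group3.2001 = 6)

yr  <- sub(".*\\.", "", names(res))  # year part of each name
grp <- sub("\\..*", "", names(res))  # group part of each name

# Sort by year first, then by group within each year
res_by_year <- res[order(yr, grp)]
names(res_by_year)
```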
