Progressive concatenation of a column by a group

Question

Suppose I have the following input:

             ID     date_1      date_2     str
1            1    2010-07-04  2008-01-20   A
2            2    2015-07-01  2011-08-31   C
3            3    2015-03-06  2013-01-18   D
4            4    2013-01-10  2011-08-30   D
5            5    2014-06-04  2011-09-18   B
6            5    2014-06-04  2011-09-18   B
7            6    2012-11-22  2011-09-28   C
8            7    2014-06-17  2013-08-04   A
10           7    2014-06-17  2013-08-04   B
11           7    2014-06-17  2013-08-04   B

I would like to progressively concatenate the values of the str column within each group defined by ID, as shown in the following output:

             ID     date_1      date_2     str
1            1    2010-07-04  2008-01-20   A
2            2    2015-07-01  2011-08-31   C
3            3    2015-03-06  2013-01-18   D
4            4    2013-01-10  2011-08-30   D
5            5    2014-06-04  2011-09-18   B
6            5    2014-06-04  2011-09-18   B,B
7            6    2012-11-22  2011-09-28   C
8            7    2014-06-17  2013-08-04   A
10           7    2014-06-17  2013-08-04   A,B
11           7    2014-06-17  2013-08-04   A,B,B

I tried to use the ave() function with this code:

within(table, {
  Emp_list <- ave(str, ID, FUN = function(x) paste(x, collapse = ","))
})

but it gives the following output, which is not exactly what I want:

         ID      date_1     date_2      str
1         1    2010-07-04 2008-01-20     A
2         2    2015-07-01 2011-08-31     C
3         3    2015-03-06 2013-01-18     D
4         4    2013-01-10 2011-08-30     D
5         5    2014-06-04 2011-09-18     B,B
6         5    2014-06-04 2011-09-18     B,B
7         6    2012-11-22 2011-09-28     C
8         7    2014-06-17 2013-08-04     A,B,B
10        7    2014-06-17 2013-08-04     A,B,B
11        7    2014-06-17 2013-08-04     A,B,B

Of course, I'd like to avoid loops, as I work on a large database.

Recommended answer

How about ave() with Reduce()? The Reduce() function lets us accumulate results as they are calculated, so if we run it with paste() and accumulate = TRUE, we get the cumulatively pasted strings.

f <- function(x) {
    Reduce(function(...) paste(..., sep = ", "), x, accumulate = TRUE)
}

df$str <- with(df, ave(as.character(str), ID, FUN = f))

which gives the updated data frame df:

   ID     date_1     date_2     str
1   1 2010-07-04 2008-01-20       A
2   2 2015-07-01 2011-08-31       C
3   3 2015-03-06 2013-01-18       D
4   4 2013-01-10 2011-08-30       D
5   5 2014-06-04 2011-09-18       B
6   5 2014-06-04 2011-09-18    B, B
7   6 2012-11-22 2011-09-28       C
8   7 2014-06-17 2013-08-04       A
10  7 2014-06-17 2013-08-04    A, B
11  7 2014-06-17 2013-08-04 A, B, B

Note: function(...) paste(..., sep = ", ") could also be written as function(x, y) paste(x, y, sep = ", "). (Thanks, Pierre Lafortune.)
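The accumulation step and the equivalence mentioned in the note can be checked on a small example. The following is only a sketch in base R: the vectors ID and str are a cut-down stand-in for the question's columns, and the helper name cumpaste is hypothetical (it plays the role of f above).

```r
# Cut-down stand-in for the question's ID and str columns (illustrative data).
ID  <- c(1, 5, 5, 7, 7, 7)
str <- c("A", "B", "B", "A", "B", "B")

# With accumulate = TRUE, Reduce() keeps every intermediate result
# instead of returning only the final one.
steps <- Reduce(function(x, y) paste(x, y, sep = ", "),
                c("A", "B", "B"), accumulate = TRUE)

# Hypothetical helper: cumulative paste over one group's values.
cumpaste <- function(x) {
    Reduce(function(x, y) paste(x, y, sep = ", "), x, accumulate = TRUE)
}

# Same helper written with the variadic form used in the answer.
cumpaste_dots <- function(x) {
    Reduce(function(...) paste(..., sep = ", "), x, accumulate = TRUE)
}

# ave() applies the helper within each ID group and puts the results
# back in the original row order.
res      <- ave(str, ID, FUN = cumpaste)
res_dots <- ave(str, ID, FUN = cumpaste_dots)
```

Because Reduce() always calls the function with exactly two arguments (the running result and the next element), the two-argument form and the variadic form behave the same here. Note that this uses ", " as the separator, as in the answer; use sep = "," to match the question's desired output exactly.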
