Aggregate multiple values in a single column in R


Question

I have the following data set:

data <- cbind(c(1,1,1,2,3,3,3,4,4,5,5,5,5),
c(1112,1164,1339,395,1297,1440,1944,217,625,561,765,1022,1252))

I would like the result to look like this, ideally in two columns rather than one column per value:

        [,1] [,2]
 [1,]    1 1112,1164,1339
 [2,]    2  395
 [3,]    3 1297,1440,1944
 [4,]    4  217,625
 [5,]    5  561,765,1022,1252


Recommended answer

Since the second column of your desired result isn't a number, you can't hold it in a numeric matrix. The numbers will have to be converted to strings and the whole thing stored in a data frame.

> d=data.frame(data)
> d
   X1   X2
1   1 1112
2   1 1164
3   1 1339
4   2  395
5   3 1297
6   3 1440
[etc]

Now we just use dplyr and paste together the X2 values in each X1 category:

> require(dplyr)
> d %>% group_by(X1) %>% summarise(X2=paste(X2,collapse=","))
Source: local data frame [5 x 2]

  X1                X2
1  1    1112,1164,1339
2  2               395
3  3    1297,1440,1944
4  4           217,625
5  5 561,765,1022,1252
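
If dplyr is not available, the same grouping and pasting can probably be done in base R. The following is a minimal sketch, not part of the original answer: it rebuilds the question's data and uses aggregate() with the formula interface, passing collapse through to paste(). The name agg is only illustrative.

```r
# Rebuild the example data from the question as a data frame.
d <- data.frame(
  X1 = c(1, 1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5),
  X2 = c(1112, 1164, 1339, 395, 1297, 1440, 1944, 217, 625, 561, 765, 1022, 1252)
)

# For each X1 group, paste the X2 values into one comma-separated string.
# Extra arguments (here collapse) are forwarded to FUN.
agg <- aggregate(X2 ~ X1, data = d, FUN = paste, collapse = ",")
agg
```

As with the dplyr version, the X2 column of the result holds strings, not numbers.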

Note that what is held in X2 is a string such as "1112,1164,1339", so to get the numeric values back out you need to split the string on the commas and convert the pieces to numeric.
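
Getting the numbers back out of one of these strings might look like the sketch below. The variable names s and vals are illustrative and do not appear in the original answer.

```r
# One collapsed value, as produced by paste(..., collapse = ",").
s <- "1112,1164,1339"

# strsplit() returns a list with one character vector per input string,
# so take the first element and convert it to numeric.
vals <- as.numeric(strsplit(s, ",", fixed = TRUE)[[1]])
vals
```

For a whole column, the same idea can be applied with lapply() over the result of strsplit(), which gives a list of numeric vectors.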

I would only use this conversion for displaying the data; it's not a useful format for further processing.

You can store multiple values in the elements of a column (a list column), but I've always found it breaks some functions' expectations of what can be in a data frame...

So starting with d you can do:

> dwide = data.frame(X1=unique(d$X1), X2=tapply(d$X2, factor(d$X1),c))
> dwide
  X1                   X2
1  1     1112, 1164, 1339
2  2                  395
3  3     1297, 1440, 1944
4  4             217, 625
5  5 561, 765, 1022, 1252

And then you can access the numerical elements directly, but make sure you get the number of square brackets right:

> dwide$X2[[3]][2]
[1] 1440
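
A more explicit way to build the same kind of list column is sketched below, assuming the question's data. It uses split() to group the values and I() to keep data.frame() from expanding the list into separate columns. The names groups and dwide_list are hypothetical and not from the original answer.

```r
# Rebuild the example data from the question as a data frame.
d <- data.frame(
  X1 = c(1, 1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5),
  X2 = c(1112, 1164, 1339, 395, 1297, 1440, 1944, 217, 625, 561, 765, 1022, 1252)
)

# split() returns a named list with one numeric vector per X1 value.
groups <- split(d$X2, d$X1)

# Take X1 from the list names so keys and values stay aligned;
# I() stores the list as a single column.
dwide_list <- data.frame(X1 = as.numeric(names(groups)), X2 = I(groups))

dwide_list$X2[[3]]      # all values for the third group (X1 == 3)
dwide_list$X2[[3]][2]   # second value in that group
lengths(dwide_list$X2)  # number of values in each group
```

The double brackets select one group's numeric vector from the list column, and the single brackets then index into that vector.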
