R: Aggregate character strings with c
Question
I have a data frame with two columns: one of strings, the other of integers.
> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> df
x rnames
1 5 item.1
2 3 item.2
3 5 item.3
4 3 item.4
5 1 item.5
6 3 item.6
7 4 item.7
8 5 item.8
9 4 item.9
10 5 item.10
11 5 item.11
12 2 item.12
13 2 item.13
14 1 item.14
15 3 item.15
16 4 item.16
17 5 item.17
18 4 item.18
19 1 item.19
20 1 item.20
I'm trying to aggregate the strings into lists or vectors of strings (characters) with the 'c' or the 'list' function, but I'm getting weird results:
> aggregate(rnames ~ x, df, c)
x rnames
1 1 16, 6, 11, 13
2 2 4, 5
3 3 12, 15, 17, 7
4 4 18, 20, 8, 10
5 5 1, 14, 19, 2, 3, 9
When I use 'paste' instead of 'c', I can see that the aggregation works correctly, but the result is not what I'm looking for.
> aggregate(rnames ~ x, df, paste)
x rnames
1 1 item.5, item.14, item.19, item.20
2 2 item.12, item.13
3 3 item.2, item.4, item.6, item.15
4 4 item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17
What I'm looking for is for every aggregated group to be presented as a vector or a list (hence the use of c) rather than as the single string I'm getting with 'paste'. Something along the lines of the following (which in reality doesn't work):
> aggregate(rnames ~ x, df, c)
x rnames
1 1 item.5, item.14, item.19, item.20
2 2 item.12, item.13
3 3 item.2, item.4, item.6, item.15
4 4 item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17
Any help would be appreciated.
Answer
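You fell into the usual data.frame trap: your character column is not a character column, it is a factor column! Hence the numbers instead of the characters in your result: c() returned the factor's internal integer codes, i.e. each item's position among the alphabetically sorted levels ("item.1", "item.10", ..., "item.19", "item.2", "item.20", ...), which is why item.5 shows up as 16. (Before R 4.0.0, data.frame() converted strings to factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE.)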
> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> str(df)
'data.frame': 20 obs. of 2 variables:
$ x : int 2 5 5 5 5 4 3 3 2 4 ...
$ rnames: Factor w/ 20 levels "item.1","item.10",..: 1 12 14 15 16 17 18 19 20 2 ...
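To check where numbers such as 16, 6, 11, 13 come from, the minimal sketch below rebuilds only the rnames vector from the question, converts it to a factor, and inspects its levels and codes (the variable name f is illustrative):

```r
# Same labels as in the question: "item.1", ..., "item.20"
rnames <- paste("item", 1:20, sep = ".")
f <- factor(rnames)

# Levels are sorted as strings, so "item.10" comes before "item.2"
head(levels(f), 4)

# A factor stores integer codes that index into its levels;
# "item.5" is the 16th level, hence the 16 in the aggregate() output
as.integer(f)[5]

# as.character() maps the codes back to their labels
as.character(f)[5]
```

This is why paste() looked fine in the question: it converts factors with as.character() before joining, whereas the codes leaked through with c().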
To prevent the conversion to factors, use the argument stringsAsFactors = FALSE in your call to data.frame:
> df <- data.frame(x, rnames,stringsAsFactors=FALSE)
> str(df)
'data.frame': 20 obs. of 2 variables:
$ x : int 5 5 3 5 5 3 2 5 1 5 ...
$ rnames: chr "item.1" "item.2" "item.3" "item.4" ...
> aggregate(rnames ~ x, df, c)
x rnames
1 1 item.9, item.13, item.17
2 2 item.7
3 3 item.3, item.6, item.19
4 4 item.12, item.15, item.16
5 5 item.1, item.2, item.4, item.5, item.8, item.10, item.11, item.14, item.18, item.20
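The rnames column in that result is a list column holding one character vector per group, not a single pasted string. A minimal sketch, assuming small hypothetical data (fixed x values instead of sample(), so the output is reproducible), shows how to pull out one group. Passing simplify = FALSE (an argument of aggregate() since R 3.3.0) keeps a list even when every group happens to have the same size; otherwise aggregate() may simplify the column to a matrix or a plain vector.

```r
# Hypothetical, fixed data instead of sample() so the output is reproducible
x <- c(1, 2, 1, 3, 2, 1)
rnames <- paste("item", 1:6, sep = ".")
df <- data.frame(x, rnames, stringsAsFactors = FALSE)

# simplify = FALSE keeps one list element (a character vector) per group
res <- aggregate(rnames ~ x, df, c, simplify = FALSE)

is.list(res$rnames)   # TRUE
res$rnames[[1]]       # "item.1" "item.3" "item.6" (the x == 1 group)
```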
Another way to avoid the conversion to factor is the function I():
> df <- data.frame(x, I(rnames))
> str(df)
'data.frame': 20 obs. of 2 variables:
$ x : int 3 5 4 5 4 5 3 3 1 1 ...
$ rnames:Class 'AsIs' chr [1:20] "item.1" "item.2" "item.3" "item.4" ...
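A minimal sketch of the I() route, assuming the same kind of hypothetical fixed data rather than sample(). Here stringsAsFactors = TRUE is passed on purpose, to show that the I()-wrapped column is left alone even when conversion is requested, so aggregate() returns the item labels rather than codes:

```r
x <- c(1, 2, 1, 3, 2, 1)
rnames <- paste("item", 1:6, sep = ".")

# Ask for factor conversion explicitly; I() shields rnames from it
df <- data.frame(x, rnames = I(rnames), stringsAsFactors = TRUE)
is.factor(df$rnames)   # FALSE

res <- aggregate(rnames ~ x, df, c, simplify = FALSE)
# as.character() drops the "AsIs" class in case it is still attached
as.character(res$rnames[[1]])
```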
From the help page ?I:
In function data.frame. Protecting an object by enclosing it in I() in a call to data.frame inhibits the conversion of character vectors to factors and the dropping of names, and ensures that matrices are inserted as single columns. I can also be used to protect objects which are to be added to a data frame, or converted to a data frame via as.data.frame.
It achieves this by prepending the class "AsIs" to the object's classes. Class "AsIs" has a few of its own methods, including for [, as.data.frame, print and format.
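The class-prepending behaviour quoted above can be checked directly; a short sketch (the variable name chr is illustrative):

```r
chr <- paste("item", 1:3, sep = ".")

# A plain character vector has no class attribute, so I() sets it to "AsIs"
class(I(chr))

# For an object that already has a class, "AsIs" is put in front of it
class(I(factor(chr)))

# The underlying values are unchanged; unclass() strips the added class
identical(unclass(I(chr)), chr)
```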