R: Aggregate character strings with c


Question

I have a data frame with two columns: one holds strings, the other integers.

> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> df
   x  rnames
1  5  item.1
2  3  item.2
3  5  item.3
4  3  item.4
5  1  item.5
6  3  item.6
7  4  item.7
8  5  item.8
9  4  item.9
10 5 item.10
11 5 item.11
12 2 item.12
13 2 item.13
14 1 item.14
15 3 item.15
16 4 item.16
17 5 item.17
18 4 item.18
19 1 item.19
20 1 item.20

I'm trying to aggregate the strings into lists or vectors of strings (characters) with the 'c' or the 'list' function, but I'm getting weird results:

> aggregate(rnames ~ x, df, c)
  x             rnames
1 1      16, 6, 11, 13
2 2               4, 5
3 3      12, 15, 17, 7
4 4      18, 20, 8, 10
5 5 1, 14, 19, 2, 3, 9

When I use 'paste' instead of 'c', I can see that the aggregation is grouping the rows correctly, but the result is not what I'm looking for.

> aggregate(rnames ~ x, df, paste)
  x                                            rnames
1 1                 item.5, item.14, item.19, item.20
2 2                                  item.12, item.13
3 3                   item.2, item.4, item.6, item.15
4 4                  item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17

What I'm looking for is that every aggregated group is presented as a vector or a list (hence the use of c), as opposed to the single string I'm getting with 'paste'. Something along the lines of the following (which in reality doesn't work):

> aggregate(rnames ~ x, df, c)
  x                                            rnames
1 1                 item.5, item.14, item.19, item.20
2 2                                  item.12, item.13
3 3                   item.2, item.4, item.6, item.15
4 4                  item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17

Any help would be appreciated.

Recommended answer

You fell into the usual data.frame trap: your character column is not a character column, it is a factor column! Hence the numbers instead of the strings in your result:

> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> str(df)
'data.frame':   20 obs. of  2 variables:
 $ x     : int  2 5 5 5 5 4 3 3 2 4 ...
 $ rnames: Factor w/ 20 levels "item.1","item.10",..: 1 12 14 15 16 17 18 19 20 2 ...
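To see why a factor column can turn into numbers, it may help to look at how a factor is stored. The sketch below uses a small hypothetical vector (`labels`, not taken from the question) and only base R. It shows that a factor is an integer vector of codes plus a `levels` attribute, and that `as.character()` (which `paste` applies to its arguments) maps the codes back to labels. In the R version used in the question, `c()` evidently returned the bare codes; newer R releases handle `c()` on factors differently, so the exact output of `aggregate(..., c)` on a factor column may vary by version.

```r
# Hypothetical three-item vector, stored as a factor on purpose
labels <- c("item.1", "item.2", "item.10")
f <- factor(labels)

# A factor is an integer vector of codes plus a "levels" attribute
codes <- as.integer(f)
typeof(f)    # "integer"
levels(f)    # the distinct labels, in sorted order

# Indexing the levels by the codes recovers the original strings;
# as.character() performs the same mapping
recovered <- levels(f)[codes]
identical(recovered, labels)        # TRUE
identical(as.character(f), labels)  # TRUE
```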

To prevent the conversion to factors, use the argument stringsAsFactors=FALSE in your call to data.frame:

> df <- data.frame(x, rnames,stringsAsFactors=FALSE)
> str(df)
'data.frame':   20 obs. of  2 variables:
 $ x     : int  5 5 3 5 5 3 2 5 1 5 ...
 $ rnames: chr  "item.1" "item.2" "item.3" "item.4" ...
> aggregate(rnames ~ x, df, c)
  x                                                                              rnames
1 1                                                            item.9, item.13, item.17
2 2                                                                              item.7
3 3                                                             item.3, item.6, item.19
4 4                                                           item.12, item.15, item.16
5 5 item.1, item.2, item.4, item.5, item.8, item.10, item.11, item.14, item.18, item.20
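The printed result looks much like the 'paste' output, so it can be worth checking that each group really is a separate character vector. The following is a minimal sketch on a small fixed data set (the values of `x` and `rnames` here are made up so the result is reproducible). It assumes the groups have unequal sizes, in which case `aggregate` returns the aggregated column as a list; when all groups have the same size, `aggregate` simplifies the result to a matrix instead.

```r
# Small deterministic stand-in for the question's data (hypothetical values)
x <- c(1, 2, 1, 3, 2, 1)
rnames <- paste("item", 1:6, sep = ".")
df <- data.frame(x, rnames, stringsAsFactors = FALSE)

res <- aggregate(rnames ~ x, df, c)

# Groups have sizes 3, 2 and 1, so the aggregated column is a list
is.list(res$rnames)     # TRUE

# Each element is an ordinary character vector, not one pasted string
group1 <- res$rnames[[1]]
group1                  # "item.1" "item.3" "item.6"
length(group1)          # 3
```

If the data frame already exists with a factor column, converting it with `df$rnames <- as.character(df$rnames)` before calling `aggregate` should have the same effect.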

Another way to avoid the conversion to factor is the function I:

> df <- data.frame(x, I(rnames))
> str(df)
'data.frame':   20 obs. of  2 variables:
 $ x     : int  3 5 4 5 4 5 3 3 1 1 ...
 $ rnames:Class 'AsIs'  chr [1:20] "item.1" "item.2" "item.3" "item.4" ...

Excerpt from ?I:

In function data.frame. Protecting an object by enclosing it in I() in a call to data.frame inhibits the conversion of character vectors to factors and the dropping of names, and ensures that matrices are inserted as single columns. I can also be used to protect objects which are to be added to a data frame, or converted to a data frame via as.data.frame.

It achieves this by prepending the class "AsIs" to the object's classes. Class "AsIs" has a few of its own methods, including for [, as.data.frame, print and format.
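As a small illustration of the "AsIs" behaviour described in the excerpt, the sketch below builds a three-row data frame (hypothetical values) with `stringsAsFactors = TRUE` set explicitly, so the protection offered by `I()` is visible regardless of the default in your R version. The column names are given explicitly here, which is a choice made for this example rather than something the answer requires.

```r
# Hypothetical data; stringsAsFactors = TRUE is forced to show that I() wins
x <- c(1, 2, 1)
rnames <- c("item.1", "item.2", "item.3")
df <- data.frame(x = x, rnames = I(rnames), stringsAsFactors = TRUE)

inherits(df$rnames, "AsIs")   # TRUE: the column carries the "AsIs" class
is.factor(df$rnames)          # FALSE: no conversion to factor took place
is.character(df$rnames)       # TRUE: the underlying data is still character

# as.character() drops the "AsIs" class and returns a plain character vector
plain <- as.character(df$rnames)
identical(plain, rnames)      # TRUE
```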
