Concatenate rows of a data frame

Problem description

I would like to take a data frame with characters and numbers, and concatenate all of the elements of each row into a single string, which would be stored as a single element in a vector. As an example, I make a data frame of letters and numbers, and then I would like to concatenate the first row via the paste function, and hopefully return the value "A1".

df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)
df

##   letters numbers
## 1       A       1
## 2       B       2
## 3       C       3
## 4       D       4
## 5       E       5

paste(df[1,], sep =".")
## [1] "1" "1"

So paste is converting each element of the row into an integer that corresponds to the index of the corresponding level, as if it were a factor, and it keeps it as a vector of length two. (I know/believe that factors that are coerced to characters behave in this way, but as R is not storing df[1,] as a factor at all (tested by is.factor()), I can't verify that it is actually an index for a level.)

is.factor(df[1,])
## [1] FALSE
is.vector(df[1,])
## [1] FALSE

So if it is not a vector then it makes sense that it is behaving oddly, but I can't coerce it into a vector

is.vector(as.vector(df[1,]))
## [1] FALSE

Using as.character did not seem to help in my attempts

Can anyone explain this behavior?

Recommended answer

While others have focused on why your code isn't working and how to improve it, I'm going to try and focus more on getting the result you want. From your description, it seems you can readily achieve what you want using paste:

df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors=FALSE)
paste(df$letters, df$numbers, sep="")

## [1] "A1" "B2" "C3" "D4" "E5"

You can change df$letters to character using df$letters <- as.character(df$letters) if you don't want to use the stringsAsFactors argument.
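To illustrate why the column type matters, here is a minimal sketch using a hypothetical factor `f` that is not part of the original question. It shows the integer codes stored underneath a factor and how as.character recovers the labels instead.

```r
# Hypothetical factor, used only for illustration
f <- factor(c("B", "A", "C"))

codes  <- as.integer(f)    # underlying level indices (levels sort as A, B, C)
labels <- as.character(f)  # the labels themselves

print(codes)
print(labels)
```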

But let's assume that's not what you want. Let's assume you have hundreds of columns and you want to paste them all together. We can do that with your minimal example too:

df_args <- c(df, sep="")
do.call(paste, df_args)

## [1] "A1" "B2" "C3" "D4" "E5"



Alternative method and explanation:

I realised the problem you're having is a combination of the fact that you're using a factor and that you're using the sep argument instead of collapse (as @adibender picked up). The difference is that sep gives the separator between two separate vectors and collapse gives separators within a vector. When you use df[1,], you supply a single vector to paste and hence you must use the collapse argument. Using your idea of getting every row and concatenating them, the following line of code will do exactly what you want:

apply(df, 1, paste, collapse="")
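The difference between sep and collapse can be sketched with a plain character vector. The names `row_values`, `with_sep`, `with_collapse` and `first_row` are introduced here only for illustration, and the data frame is rebuilt with stringsAsFactors=FALSE as an assumption so that the first column holds characters.

```r
row_values <- c("A", "1")

# sep is placed between *separate arguments*; a single vector keeps its length
with_sep <- paste(row_values, sep = ".")

# collapse joins the elements of the result into one string
with_collapse <- paste(row_values, collapse = ".")

# A one-row data frame is still a list, so flatten it to a vector first
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = FALSE)
first_row <- paste(unlist(df[1, ]), collapse = "")

print(with_sep)
print(with_collapse)
print(first_row)
```

One thing worth checking on real data: apply() first converts the data frame with as.matrix(), which may format numeric columns to a common width, so numbers of differing width can pick up padding spaces.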

OK, now for the explanation:

Why won't as.list work?

as.list converts an object to a list. So it does work. It will convert your dataframe to a list and subsequently ignore the sep="" argument. c combines objects together. Technically, a dataframe is just a list where every column is an element and all elements have to have the same length. So when I combine it with sep="", it just becomes a regular list with the columns of the dataframe as elements.
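The claim that a data frame is a list of columns, and that combining it with sep="" yields a regular list, can be checked directly. This is a small sketch that rebuilds the example data frame so that it runs on its own.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = FALSE)

# c() treats the data frame as a list and appends the extra element
df_args <- c(df, sep = "")

print(is.list(df))             # a data frame is a list of columns
print(is.data.frame(df_args))  # the combined object is no longer a data frame
print(names(df_args))          # column names plus the added "sep"
```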

Why use do.call?

do.call allows you to call a function using a named list as its arguments. You can't just throw the list straight into paste, because it doesn't like dataframes. It's designed for concatenating vectors. So remember that df_args is a list containing a vector of letters, a vector of numbers and sep, which is a length 1 vector containing only "". When I use do.call, the resulting paste call is essentially paste(letters, numbers, sep).
But what if my original dataframe had columns "letters", "numbers", "squigs", "blargs" after which I added the separator like I did before? Then the paste function through do.call would look like:

paste(letters, numbers, squigs, blargs, sep)

So you see it works for any number of columns.
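As a rough sketch of the four-column case, the data frame `df4` below is made up for illustration. The column names follow the answer, but the values in squigs and blargs are assumptions.

```r
# Hypothetical four-column data frame; the values are invented for this example
df4 <- data.frame(
  letters = LETTERS[1:3],
  numbers = 1:3,
  squigs  = c("~", "~", "~"),
  blargs  = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Every column becomes one argument to paste, followed by sep = ""
result <- do.call(paste, c(df4, sep = ""))
print(result)
```

Because the columns are passed as named arguments, a column that happens to be called sep or collapse would be taken as that paste argument rather than as data, so such columns would need renaming first.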
