Automatically stack every nth column of a dataframe

Published: 2021/8/28 18:37:17. Tags: r, dataframe, stack, subset

Question

I have a data frame called DF with, say, three variables that repeat cyclically:

      A      B      C      A      B      C 
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8

I want to stack the first A column on the second A column (and on the third, and fourth and so on, if they exist), and do the same with the other variables, and then save the result as new objects (as vectors, for example). So what I want to obtain is

V_A <- c(a1,a2,a3,a4,a5,a6,a7,a8)
V_B <- c(b1,b2,b3,b4,b5,b6,b7,b8)
V_C <- c(c1,c2,c3,c4,c5,c6,c7,c8)

While it's very easy to do it manually, like this

V_A <- DF[, seq(1, ncol(DF), 3)]
V_A <- stack(V_A)$values
V_B <- DF[, seq(2, ncol(DF), 3)]
V_B <- stack(V_B)$values
V_C <- DF[, seq(3, ncol(DF), 3)]
V_C <- stack(V_C)$values

what I'm looking for is code that does this automatically, so that it works for data frames with any number of variables without my having to write ad hoc code every time. To sum up, the code should: 1) select every nth column of the data frame; 2) stack these columns; 3) save the results in automatically created new objects.

I feel there must be a way to do this but I haven't succeeded so far. Thanks very much in advance.

EDIT: Let's say I am in a slightly different situation, in which the columns repeat but do not have exactly the same names, and I still want to do the same thing. So I have:

     A1      B1      C1      A2      B2      C2 
1    a11     b11     c11     a25     b25     c25
2    a12     b12     c12     a26     b26     c26
3    a13     b13     c13     a27     b27     c27
4    a14     b14     c14     a28     b28     c28

and I want:

V_A <- c(a11,a12,a13,a14,a25,a26,a27,a28)
V_B <- c(b11,b12,b13,b14,b25,b26,b27,b28)
V_C <- c(c11,c12,c13,c14,c25,c26,c27,c28)

How can I do this?

Recommended answer

Here are some alternatives. No packages are used.

1) aperm: Create a 3d array a, permute its dimensions, reshape the result into a matrix m, and then convert that to a data frame. This only works if all values are of the same type, since as.matrix coerces the data frame to a single type. (2) and (3) do not have this limitation.

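k <- 3                          # number of distinct variables
nr <- nrow(DF)
nc <- ncol(DF)
unames <- unique(names(DF))

# a[r, j, g] is row r of variable j in repeat g
a <- array(as.matrix(DF), c(nr, k, nc/k))
# put the repeats ahead of the variables, then read off one column per variable
m <- matrix(aperm(a, c(1, 3, 2)), ncol = k, dimnames = list(NULL, unames))
as.data.frame(m, stringsAsFactors = FALSE)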

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
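As a sketch of how (1) could be packaged for reuse, here is a helper called stack_every_nth(). The name is hypothetical and does not come from the answer. It assumes that all columns share one type and that ncol(DF) is a multiple of k. It takes the target names from the first k columns with any trailing digits stripped, so it also covers the EDIT's A1, B1, ... names.

```r
# Hypothetical helper (not from the answer). Assumes one common column type
# and that ncol(DF) is a multiple of k, the number of distinct variables.
stack_every_nth <- function(DF, k) {
  nr <- nrow(DF)
  nc <- ncol(DF)
  stopifnot(nc %% k == 0)
  # Output names: first k column names with trailing digits removed
  unames <- sub("\\d*$", "", names(DF)[seq_len(k)])
  # a[r, j, g] is row r of variable j in repeat g
  a <- array(as.matrix(DF), c(nr, k, nc / k))
  # After aperm, each variable's repeats follow one another down a column
  m <- matrix(aperm(a, c(1, 3, 2)), ncol = k, dimnames = list(NULL, unames))
  as.data.frame(m, stringsAsFactors = FALSE)
}

# Small example data with two variables repeated twice
toy <- data.frame(A = c("a1", "a2"), B = c("b1", "b2"),
                  A = c("a3", "a4"), B = c("b3", "b4"),
                  check.names = FALSE, stringsAsFactors = FALSE)
res <- stack_every_nth(toy, 2)
print(res)
```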

If we are in the situation described in the question's EDIT, replace unames with the following, where DF2 is DF with the revised names as given in the Note at the end:

unames <- unique(sub("\\d*$", "", names(DF2)))
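To see what the regular expression does: "\\d*$" matches a run of digits at the end of a name, so sub() strips the repeat counter and leaves the base variable name. One assumption to keep in mind is that the base names do not themselves end in digits; if they did, those digits would be stripped too.

```r
nm <- c("A1", "B1", "C1", "A2", "B2", "C2")  # names(DF2) from the Note
base_names <- sub("\\d*$", "", nm)           # drop trailing digits
print(base_names)           # "A" "B" "C" "A" "B" "C"
print(unique(base_names))   # "A" "B" "C"
```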

2) lapply: This generalizes the code in the question by splitting the columns into groups by name and unlisting each group:

L <- lapply(split(as.list(DF), names(DF)), unlist)
as.data.frame(L, stringsAsFactors = FALSE)

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
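The question also asks for separate objects such as V_A, V_B and V_C. A minimal sketch, assuming the global environment is where they should go: build the same list as in (2), with use.names = FALSE so the vectors are unnamed, then let list2env() create one variable per list element. The V_ prefix is taken from the question. In practice, keeping the list L is often easier to work with than many loose variables.

```r
# Reproducible input, as in the Note at the end
Lines <- "      A      B      C      A      B      C
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)

# Group the columns by name and flatten each group into one unnamed vector
L <- lapply(split(as.list(DF), names(DF)), unlist, use.names = FALSE)

# Create V_A, V_B, V_C (one object per group) in the global environment
invisible(list2env(setNames(L, paste0("V_", names(L))), envir = .GlobalEnv))
print(V_A)
```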

With the input shown in the question's EDIT, it could be done like this, where DF2 is given reproducibly in the Note at the end:

names0 <- sub("\\d*$", "", names(DF2))   # names without the trailing digits
L <- lapply(split(as.list(DF2), names0), unlist)
as.data.frame(L, stringsAsFactors = FALSE)
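Note that split() turns names0 into a factor with sorted levels, so the output columns appear in alphabetical order rather than in input order. That makes no difference for A, B, C, but if the order matters, a sketch that fixes the levels to first-appearance order follows. The data frame DF3 is a made-up example. Passing use.names = FALSE to unlist() also keeps row names like A11, A12 out of the result.

```r
# Hypothetical input whose variables are not in alphabetical order
DF3 <- data.frame(C1 = c("c1", "c2"), A1 = c("a1", "a2"),
                  C2 = c("c3", "c4"), A2 = c("a3", "a4"),
                  stringsAsFactors = FALSE)
names0 <- sub("\\d*$", "", names(DF3))          # "C" "A" "C" "A"
grp <- factor(names0, levels = unique(names0))  # levels in first-seen order
L <- lapply(split(as.list(DF3), grp), unlist, use.names = FALSE)
out <- as.data.frame(L, stringsAsFactors = FALSE)
print(names(out))  # "C" "A"
```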

3) reshape: nc and unames are from above. varying is a list with k components such that the ith component contains the index vector c(i, i+k, ...). reshape does not seem to accept duplicated names, so we give it setNames(DF, 1:nc) as the input. This solution has the advantage of also generating the index vectors time and id, which relate the output to the input data.

varying <- split(1:nc, names(DF))
reshape(setNames(DF, 1:nc), dir = "long", varying = varying, v.names = unames)

giving:

    time  A  B  C id
1.1    1 a1 b1 c1  1
2.1    1 a2 b2 c2  2
3.1    1 a3 b3 c3  3
4.1    1 a4 b4 c4  4
1.2    2 a5 b5 c5  1
2.2    2 a6 b6 c6  2
3.2    2 a7 b7 c7  3
4.2    2 a8 b8 c8  4
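The varying list can also be built directly from the index arithmetic c(i, i+k, ...) rather than from the names. A sketch, assuming the k variables appear in the same order in every repeat. It also pulls a plain vector out of the long result, which gives the V_A the question asked for.

```r
# Reproducible input, as in the Note at the end
Lines <- "      A      B      C      A      B      C
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)

k <- 3            # number of distinct variables
nc <- ncol(DF)
# ith component is c(i, i + k, ...); same indices as split(1:nc, names(DF)) here
varying <- lapply(seq_len(k), function(i) seq(i, nc, by = k))
long <- reshape(setNames(DF, 1:nc), direction = "long",
                varying = varying, v.names = unique(names(DF)))
V_A <- long$A     # plain character vector a1, ..., a8
print(V_A)
```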

With the input shown in the question's EDIT, it actually simplifies. We no longer need setNames(DF, 1:nc) and can use the data frame as is. Also, we can use varying = TRUE (see also @thelatemail's comment) instead of computing a complex varying argument. The input DF2 is as shown in the Note at the end, and names0 is as in (2) above.

reshape(DF2, dir = "long", varying = TRUE, v.names = unique(names0))

Note:

Lines <- "      A      B      C      A      B      C 
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)

DF2 <- setNames(DF, c("A1", "B1", "C1", "A2", "B2", "C2")) # test input

Update: A number of simplifications. Also added DF2 to the Note at the end, and each alternative now discusses how to modify the code to handle it. (A general method might be simply to reduce DF2 to DF, as I discussed in the comments below.)
