Automatically stack every nth column of a data frame
Question
I have a data frame called DF with, say, three variables that repeat cyclically:
A B C A B C
1 a1 b1 c1 a5 b5 c5
2 a2 b2 c2 a6 b6 c6
3 a3 b3 c3 a7 b7 c7
4 a4 b4 c4 a8 b8 c8
I want to stack the first A column on the second A column (and on the third, fourth and so on, if they exist), do the same with the other variables, and then save the results as new objects (as vectors, for example). So what I want to obtain is
V_A <- c(a1,a2,a3,a4,a5,a6,a7,a8)
V_B <- c(b1,b2,b3,b4,b5,b6,b7,b8)
V_C <- c(c1,c2,c3,c4,c5,c6,c7,c8)
While it's very easy to do it manually, like this
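V_A <- DF[, seq(1, ncol(DF), 3)]  # every 3rd column, starting at column 1
V_A <- stack(V_A)
V_B <- DF[, seq(2, ncol(DF), 3)]  # every 3rd column, starting at column 2
V_B <- stack(V_B)
V_C <- DF[, seq(3, ncol(DF), 3)]  # every 3rd column, starting at column 3
V_C <- stack(V_C)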
what I'm looking for is code that does this automatically, so that it works for data frames with any number of variables without having to write ad hoc code every time. To sum up, the code should: 1) select every nth column in the data frame, 2) stack these columns, and 3) save the result in new, automatically created objects.
I feel there must be a way to do this, but I haven't succeeded so far. Thanks very much in advance.
EDIT Let's say I am in a slightly different situation, in which the columns repeat but not with exactly the same names, and I still want to do the same thing. So I have:
A1 B1 C1 A2 B2 C2
1 a11 b11 c11 a25 b25 c25
2 a12 b12 c12 a26 b26 c26
3 a13 b13 c13 a27 b27 c27
4 a14 b14 c14 a28 b28 c28
and I want:
V_A <- c(a11,a12,a13,a14,a25,a26,a27,a28)
V_B <- c(b11,b12,b13,b14,b25,b26,b27,b28)
V_C <- c(c11,c12,c13,c14,c25,c26,c27,c28)
How can I do that?
Recommended answer
Here are some alternatives. No packages are used.
1) aperm Create a 3d array a, permute the dimensions and reshape into a matrix m, and then convert that to a data frame. Here k is the number of variables in each repeating group. This one only works if all values are of the same type. (2) and (3) do not have this limitation.
k <- 3
nr <- nrow(DF)
nc <- ncol(DF)
unames <- unique(names(DF))
a <- array(as.matrix(DF), c(nr, k, nc/k))
m <- matrix(aperm(a, c(1, 3, 2)),, k, dimnames = list(NULL, unames))
as.data.frame(m, stringsAsFactors = FALSE)
giving:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
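The aperm steps above can be wrapped in a small helper so that k does not have to be hard-coded. This is only a sketch: the function name stack_every_k is hypothetical, and it assumes that the number of distinct column names equals the cycle length and that all columns share one type.

```r
# Test input with duplicated column names, built directly
DF <- data.frame(paste0("a", 1:4), paste0("b", 1:4), paste0("c", 1:4),
                 paste0("a", 5:8), paste0("b", 5:8), paste0("c", 5:8),
                 stringsAsFactors = FALSE)
names(DF) <- c("A", "B", "C", "A", "B", "C")

# Hypothetical helper: k defaults to the number of distinct column names
stack_every_k <- function(df, k = length(unique(names(df)))) {
  nr <- nrow(df)
  nc <- ncol(df)
  stopifnot(nc %% k == 0)                       # columns must form whole cycles
  a <- array(as.matrix(df), c(nr, k, nc / k))   # rows x variable x repetition
  m <- matrix(aperm(a, c(1, 3, 2)), ncol = k,   # rows x repetition x variable
              dimnames = list(NULL, names(df)[seq_len(k)]))
  as.data.frame(m, stringsAsFactors = FALSE)
}

res <- stack_every_k(DF)
print(res)
```

Because as.matrix coerces mixed columns to a common type, the same-type caveat from (1) still applies to this helper.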
If we are in the situation given in the question's EDIT then replace unames with the following, where DF2 is DF with the revised names as per the Note at the end:
unames <- unique(sub("\\d*$", "", names(DF2)))
2) lapply This generalizes the code in the question. The columns are split into groups by name and each group is unlisted into a single vector:
L <- lapply(split(as.list(DF), names(DF)), unlist)
as.data.frame(L, stringsAsFactors = FALSE)
giving:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
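The question also asks for separate objects such as V_A, V_B and V_C. One possible way to get them from the list L is list2env from base R. This is a sketch, not part of the original answer: the "V_" prefix follows the question, and creating the objects in the global environment is an assumption about where they are wanted.

```r
# Test input with duplicated column names, built directly
DF <- data.frame(paste0("a", 1:4), paste0("b", 1:4), paste0("c", 1:4),
                 paste0("a", 5:8), paste0("b", 5:8), paste0("c", 5:8),
                 stringsAsFactors = FALSE)
names(DF) <- c("A", "B", "C", "A", "B", "C")

# One stacked vector per distinct column name; use.names = FALSE drops
# the element names that unlist would otherwise generate
L <- lapply(split(as.list(DF), names(DF)), unlist, use.names = FALSE)

# Rename the components to V_A, V_B, V_C and create them as objects
names(L) <- paste0("V_", names(L))
list2env(L, envir = globalenv())

print(V_A)
```

Keeping the vectors together in the list L is often easier to work with than separate objects, so the list2env step is optional.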
With the input shown in the question's EDIT it could be done like this, where DF2 is given reproducibly in the Note at the end.
names0 <- sub("\\d*$", "", names(DF2)) # names without the trailing digits
L <- lapply(split(as.list(DF2), names0), unlist)
as.data.frame(L, stringsAsFactors = FALSE)
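The remark under (1) that (2) is not restricted to a single type can be checked on a small mixed-type input. The data frame DFm below is made up purely for illustration and does not come from the question.

```r
# Hypothetical mixed-type input: A is integer, B is character
DFm <- data.frame(1:2, c("x", "y"), 3:4, c("z", "w"),
                  stringsAsFactors = FALSE)
names(DFm) <- c("A", "B", "A", "B")

# Approach (2): each group of same-named columns is unlisted separately,
# so each result column keeps its own type
L <- lapply(split(as.list(DFm), names(DFm)), unlist, use.names = FALSE)
res <- as.data.frame(L, stringsAsFactors = FALSE)
str(res)

# Approach (1) starts from as.matrix, which coerces every column
# to a common type (character here)
print(class(as.matrix(DFm)[, 1]))
```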
3) reshape nc and unames are from above. varying is a list with k components such that the ith component contains the index vector c(i, i+k, ...). It seems that reshape does not like duplicated names, so we give it setNames(DF, 1:nc) as the input. This solution has the advantage of also generating the index vectors time and id, which relate the output to the input data.
varying <- split(1:nc, names(DF))
reshape(setNames(DF, 1:nc), dir = "long", varying = varying, v.names = unames)
giving:
time A B C id
1.1 1 a1 b1 c1 1
2.1 1 a2 b2 c2 2
3.1 1 a3 b3 c3 3
4.1 1 a4 b4 c4 4
1.2 2 a5 b5 c5 1
2.2 2 a6 b6 c6 2
3.2 2 a7 b7 c7 3
4.2 2 a8 b8 c8 4
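To see what varying holds and how time and id point back at the input, the reshape call can be run step by step. This is a sketch with one deliberate change: it passes v.names = names(varying) rather than unames, on the assumption that this keeps the labels aligned with the groups even when the column names are not in alphabetical order (split orders its groups by sorted name).

```r
# Test input with duplicated column names, built directly
DF <- data.frame(paste0("a", 1:4), paste0("b", 1:4), paste0("c", 1:4),
                 paste0("a", 5:8), paste0("b", 5:8), paste0("c", 5:8),
                 stringsAsFactors = FALSE)
names(DF) <- c("A", "B", "C", "A", "B", "C")
nc <- ncol(DF)

# Column positions grouped by name: A = 1, 4; B = 2, 5; C = 3, 6
varying <- split(1:nc, names(DF))
str(varying)

long <- reshape(setNames(DF, 1:nc), direction = "long",
                varying = varying, v.names = names(varying))

# time is the repetition number and id is the original row number,
# so the row with time == 2 and id == 3 holds a7, b7, c7
print(long[long$time == 2 & long$id == 3, ])
```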
With the input shown in the question's EDIT it actually simplifies. We no longer need setNames(DF, 1:nc) and can use the data frame as is. Also, we can use varying=TRUE (also see @thelatemail's comment) instead of calculating a complex argument for varying. The input DF2 is as shown in the Note at the end and names0 is as in (2) above.
reshape(DF2, dir = "long", varying = TRUE, v.names = unique(names0))
Note:
Lines <- " A B C A B C
1 a1 b1 c1 a5 b5 c5
2 a2 b2 c2 a6 b6 c6
3 a3 b3 c3 a7 b7 c7
4 a4 b4 c4 a8 b8 c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)
DF2 <- setNames(DF, c("A1", "B1", "C1", "A2", "B2", "C2")) # test input
Update: A number of simplifications. Also added DF2 to the Note at the end and discussed in each alternative how to modify the code to deal with it. (A general method might be just to reduce DF2 to DF, as I discussed in the comments below.)
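The comments referred to are not reproduced here, so the following is only one reading of "reduce DF2 to DF": strip the trailing digits from the names of DF2, after which any of the alternatives written for DF applies unchanged. The name DF_reduced is hypothetical.

```r
# Input as in the question's EDIT
DF2 <- data.frame(A1 = paste0("a1", 1:4), B1 = paste0("b1", 1:4),
                  C1 = paste0("c1", 1:4), A2 = paste0("a2", 5:8),
                  B2 = paste0("b2", 5:8), C2 = paste0("c2", 5:8),
                  stringsAsFactors = FALSE)

# Drop the trailing digits so the names repeat exactly: A B C A B C
DF_reduced <- setNames(DF2, sub("\\d*$", "", names(DF2)))
print(names(DF_reduced))

# Alternative (2) now works as written for DF
L <- lapply(split(as.list(DF_reduced), names(DF_reduced)),
            unlist, use.names = FALSE)
res <- as.data.frame(L, stringsAsFactors = FALSE)
print(res)
```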