取消列出数据帧中的所有列表元素 [英] Unlist all list elements in a dataframe
问题描述
我有一个数据框,每列有以下变量类型:
I have a data frame with the following classes of variables for each column:
"date" "numeric" "numeric" "list" "list" "numeric"
每行中的数据如下所示: p>
The data in each row looks like this:
1978-01-01, 12.5, 6.3, c(0,0,0.25,0.45,0.3), c(0,0,0,0.1,0.9), 72
我想将其转换为矩阵或数据框架每列有一个值,因此结果应如下所示:
I would like to transform it into a matrix or a data frame with one value per column, so the result should look like this:
1978-01-01, 12.5, 6.3, 0, 0, 0.25, 0.45, 0.3, 0, 0, 0, 0.1, 0.9, 72
I尝试使用:
j<-unlist(input)
output<-matrix(j,nrow=nrow(input),ncol=length(j)/nrow(input))
但它弄乱了订单
任何想法?
其他信息:
上面的例子略微简化, dput(h ead(input))
返回以下示例:
The above example is slightly simplified and dput(head(input))
returns the following sample:
structure(list(DATE = structure(c(2924, 2925, 2926, 2927, 2928,
2929), class = "Date"), TEMP_MEAN_M0 = c(-7.625, -7.375, -6,
-5.5, -7.625, -9.625), SLP_MEAN_M0 = c(1012.125, 991.975, 989.825,
986.675, 988.95, 993.075), WIND_DIR_RF_M0 = structure(list(`2.counts` = c(0,
0.625, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0, 0, 0.125), `3.counts` = c(0.75,
0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.375,
0.125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.125, 0, 0, 0), `5.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125,
0, 0, 0.125, 0.375, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `6.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125,
0, 0.25, 0.125, 0.25, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `7.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0.125, 0.5, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("2.counts",
"3.counts", "4.counts", "5.counts", "6.counts", "7.counts")),
CEIL_HGT_RF_M0 = structure(list(`2.counts` = c(0.625, 0,
0, 0, 0, 0, 0, 0, 0, 0.375), `3.counts` = c(0.75, 0.125,
0, 0.125, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.25, 0.125,
0, 0.125, 0, 0, 0, 0, 0.25, 0.25), `5.counts` = c(0, 0, 0,
0, 0, 0, 0, 0, 0.125, 0.875), `6.counts` = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 1), `7.counts` = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 1)), .Names = c("2.counts", "3.counts", "4.counts", "5.counts",
"6.counts", "7.counts")), WIND_SPD_MEAN_M0 = c(12.8125, 18.7375,
6.175, 8.175, 10.5375, 16.5375)), .Names = c("DATE", "TEMP_MEAN_M0",
"SLP_MEAN_M0", "WIND_DIR_RF_M0", "CEIL_HGT_RF_M0", "WIND_SPD_MEAN_M0"
), row.names = c(NA, 6L), class = "data.frame")
推荐答案
这有点凌乱,可能效率很低,但应该帮助你开始:
This is somewhat messy and probably pretty inefficient, but should help get you started:
以下是一些示例数据:
mydf <- data.frame(Date = as.Date(c("1978-01-01", "1978-01-02")),
V1 = c(10, 10),
V2 = c(11, 11))
mydf$V3 <- list(c(1:10),
c(11:20))
mydf$V4 <- list(c(21:25),
c(26:30))
mydf
# Date V1 V2 V3 V4
# 1 1978-01-01 10 11 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 21, 22, 23, 24, 25
# 2 1978-01-02 10 11 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 26, 27, 28, 29, 30
另外,一个小功能检查哪些列是列表,对于那些列, rbind
他们在一起,最终 cbind
他们与不列的列。
And, a little function that checks to see which columns are lists, and for those columns, rbind
s them together, and ultimately cbind
s them with the columns that are not lists.
myFun <- function(data) {
temp1 <- sapply(data, is.list)
temp2 <- do.call(
cbind, lapply(data[temp1], function(x)
data.frame(do.call(rbind, x), check.names=FALSE)))
cbind(data[!temp1], temp2)
}
myFun(mydf)
# Date V1 V2 V3.1 V3.2 V3.3 V3.4 V3.5 V3.6 V3.7 V3.8 V3.9 V3.10 V4.1
# 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21
# 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26
# V4.2 V4.3 V4.4 V4.5
# 1 22 23 24 25
# 2 27 28 29 30
只有每个列列表都包含相同长度的向量(否则为R $ rbind
将不起作用。
This will only work if each "column" list contain vectors of the same length (otherwise base R's rbind
will not work).
回顾这个问题一半一天之后,我看到另外一个用户( @ user1981275 )发布了一个更多的解决方案直截了当,但后来删除了他们的答案。也许他们删除,因为他们的方法将日期转换为整数,因为DWin指出,矩阵中的项必须是相同的模式。
Revisiting this question half a day later, I see that another user (@user1981275) posted a solution that is more straightforward, but then deleted their answer. Perhaps they deleted because their method converted the dates to integers since, as DWin pointed out, items in matrices must be all the same mode.
这是他们的解决方案: p>
Here was their solution:
t(apply(mydf, 1, unlist))
# Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# [1,] 2922 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
# [2,] 2923 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
以下是如何轻松修改以获得所需的输出。这肯定会比以前的方法更快:
Here's how it can easily be modified to get the desired output. This will definitely be faster than the earlier approach:
cbind(mydf[!sapply(mydf, is.list)],
(t(apply(mydf[sapply(mydf, is.list)], 1, unlist))))
# Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
# 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
或者,作为用户功能:
myFun <- function(data) {
ListCols <- sapply(data, is.list)
cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}
myFun(mydf)
更新2
我还写了一个更有效的函数叫code> col_flatten 这是我的SOfun软件包的一部分。
Update 2
I've also written a more efficient function called col_flatten
that's part of my "SOfun" package.
使用以下方式安装软件包:
Install the package using:
source("http://news.mrdwab.com/install_github.R")
install_github("mrdwab/SOfun")
然后,你可以做:
library(SOfun)
col_flatten(mydf, names(which(sapply(mydf, is.list))), drop = TRUE)
## Date V1 V2 V3_1 V3_2 V3_3 V3_4 V3_5 V3_6 V3_7 V3_8 V3_9 V3_10 V4_1 V4_2 V4_3 V4_4 V4_5
## 1: 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
## 2: 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
它基于data.table中的转置
函数,所以请确保你还有data.table。
It's based on the transpose
function in "data.table", so be sure you have "data.table" as well.
这篇关于取消列出数据帧中的所有列表元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!