Unlist all list elements in a dataframe

Question

I have a data frame with the following classes of variables for each column:

"date" "numeric" "numeric" "list" "list" "numeric"

The data in each row looks like this:

    1978-01-01, 12.5, 6.3, c(0,0,0.25,0.45,0.3), c(0,0,0,0.1,0.9), 72

I would like to transform it into a matrix or a data frame with one value per column, so the result should look like this:

1978-01-01, 12.5, 6.3, 0, 0, 0.25, 0.45, 0.3, 0, 0, 0, 0.1, 0.9, 72

I tried using:

j<-unlist(input)
output<-matrix(j,nrow=nrow(input),ncol=length(j)/nrow(input))

but it messes up the order.

Any ideas?
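To see why the order gets mixed up, here is a small sketch on a made-up two-row data frame (the names `input`, `a` and `b` are hypothetical, not the asker's data). unlist() on a data frame walks column by column, and matrix() also fills column by column, so values that belong to one row end up spread across several rows:

```r
# Hypothetical two-row data frame with one list column
input <- data.frame(a = c(1, 2))
input$b <- list(c(10, 11), c(20, 21))

# unlist() concatenates column by column: a[1], a[2], b[[1]], b[[2]]
j <- unname(unlist(input))
print(j)

# matrix() fills column-wise too, so row 1 becomes 1, 10, 20
# instead of the intended 1, 10, 11
bad <- matrix(j, nrow = nrow(input))
print(bad)
```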

Additional information:

The above example is slightly simplified and dput(head(input)) returns the following sample:

structure(list(DATE = structure(c(2924, 2925, 2926, 2927, 2928, 
2929), class = "Date"), TEMP_MEAN_M0 = c(-7.625, -7.375, -6, 
-5.5, -7.625, -9.625), SLP_MEAN_M0 = c(1012.125, 991.975, 989.825, 
986.675, 988.95, 993.075), WIND_DIR_RF_M0 = structure(list(`2.counts` = c(0, 
0.625, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0, 0, 0.125), `3.counts` = c(0.75, 
0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.375, 
0.125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.125, 0, 0, 0), `5.counts` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 
0, 0, 0.125, 0.375, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `6.counts` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 
0, 0.25, 0.125, 0.25, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `7.counts` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0.125, 0.5, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("2.counts", 
"3.counts", "4.counts", "5.counts", "6.counts", "7.counts")), 
    CEIL_HGT_RF_M0 = structure(list(`2.counts` = c(0.625, 0, 
    0, 0, 0, 0, 0, 0, 0, 0.375), `3.counts` = c(0.75, 0.125, 
    0, 0.125, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.25, 0.125, 
    0, 0.125, 0, 0, 0, 0, 0.25, 0.25), `5.counts` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0.125, 0.875), `6.counts` = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 1), `7.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 
    0, 1)), .Names = c("2.counts", "3.counts", "4.counts", "5.counts", 
    "6.counts", "7.counts")), WIND_SPD_MEAN_M0 = c(12.8125, 18.7375, 
    6.175, 8.175, 10.5375, 16.5375)), .Names = c("DATE", "TEMP_MEAN_M0", 
"SLP_MEAN_M0", "WIND_DIR_RF_M0", "CEIL_HGT_RF_M0", "WIND_SPD_MEAN_M0"
), row.names = c(NA, 6L), class = "data.frame")


Recommended answer

This is somewhat messy and probably pretty inefficient, but should help get you started:

Here is some sample data:

mydf <- data.frame(Date = as.Date(c("1978-01-01", "1978-01-02")),
                   V1 = c(10, 10),
                   V2 = c(11, 11))
mydf$V3 <- list(c(1:10),
                c(11:20))
mydf$V4 <- list(c(21:25),
                c(26:30))
mydf
#         Date V1 V2                                     V3                 V4
# 1 1978-01-01 10 11          1, 2, 3, 4, 5, 6, 7, 8, 9, 10 21, 22, 23, 24, 25
# 2 1978-01-02 10 11 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 26, 27, 28, 29, 30

And here is a little function that checks which columns are lists, rbinds the vectors within each list column together, and finally cbinds the result with the columns that are not lists.

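myFun <- function(data) {
  temp1 <- sapply(data, is.list)   # which columns are list columns?
  # for each list column, rbind its vectors into one row per observation
  temp2 <- do.call(
    cbind, lapply(data[temp1], function(x) 
      data.frame(do.call(rbind, x), check.names=FALSE)))
  # put the non-list columns first, followed by the expanded list columns
  cbind(data[!temp1], temp2)
}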

myFun(mydf)
#         Date V1 V2 V3.1 V3.2 V3.3 V3.4 V3.5 V3.6 V3.7 V3.8 V3.9 V3.10 V4.1
# 1 1978-01-01 10 11    1    2    3    4    5    6    7    8    9    10   21
# 2 1978-01-02 10 11   11   12   13   14   15   16   17   18   19    20   26
#   V4.2 V4.3 V4.4 V4.5
# 1   22   23   24   25
# 2   27   28   29   30
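The core step inside myFun is do.call(rbind, x), which calls rbind(x[[1]], x[[2]], ...) on the vectors stored in one list column, giving one matrix row per data-frame row. A minimal sketch on the V4 column from the sample data:

```r
# The V4 list column from the sample data above
V4 <- list(c(21:25), c(26:30))

# do.call(rbind, V4) is the same as rbind(V4[[1]], V4[[2]])
m <- do.call(rbind, V4)
print(dim(m))  # 2 rows (one per observation), 5 columns
print(m[2, ])  # the second row holds the second vector: 26 to 30
```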

This will only work if each "column" list contains vectors of the same length (otherwise base R's rbind will recycle the shorter vectors, with a warning, and the values will no longer line up).
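If the vectors do differ in length, one option (not part of the original answer) is to pad each vector with NA up to the longest length before calling rbind. pad_rbind below is a hypothetical helper, sketched under that assumption:

```r
# Hypothetical helper: pad every vector in a list column with NA
# to a common length, then stack them row-wise
pad_rbind <- function(x) {
  n <- max(lengths(x))
  do.call(rbind, lapply(x, function(v) c(v, rep(NA, n - length(v)))))
}

ragged <- list(1:3, 4:5)
pad_rbind(ragged)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5   NA
```

In myFun, it could stand in for do.call(rbind, x) when the list columns are ragged.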

Revisiting this question half a day later, I see that another user (@user1981275) posted a more straightforward solution but then deleted their answer. Perhaps they deleted it because their method converted the dates to integers: as DWin pointed out, all items in a matrix must have the same mode.
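The date-to-number conversion is easy to reproduce: a Date is stored as the number of days since 1970-01-01, and once it is combined with plain numbers in an atomic vector the "Date" class is dropped. A short sketch, which also shows how to turn the number back into a date:

```r
d <- as.Date("1978-01-01")
as.numeric(d)  # 2922 days since 1970-01-01

# Combining the Date with a number drops the "Date" class
x <- unlist(list(d, 10))
print(x)         # 2922 and 10
print(class(x))  # "numeric"

# Convert the first element back to a Date
as.Date(x[1], origin = "1970-01-01")
```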

Here was their solution:

t(apply(mydf, 1, unlist))
#      Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# [1,] 2922 10 11   1   2   3   4   5   6   7   8   9   10  21  22  23  24  25
# [2,] 2923 10 11  11  12  13  14  15  16  17  18  19   20  26  27  28  29  30
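The t() in that solution is needed because apply(X, 1, FUN) returns each row's result as a column of its output, so the result has to be transposed to get one row per observation again. A tiny illustration with a made-up numeric matrix:

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 input
r <- apply(m, 1, function(row) row * 10)
dim(r)     # 3 2: one column per input row
dim(t(r))  # 2 3: back to one row per input row
t(r)[1, ]  # 10 30 50, the scaled first row of m
```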

Here's how it can easily be modified to get the desired output. This will definitely be faster than the earlier approach:

cbind(mydf[!sapply(mydf, is.list)], 
      (t(apply(mydf[sapply(mydf, is.list)], 1, unlist))))
#         Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# 1 1978-01-01 10 11   1   2   3   4   5   6   7   8   9   10  21  22  23  24  25
# 2 1978-01-02 10 11  11  12  13  14  15  16  17  18  19   20  26  27  28  29  30

Or, as a user-defined function:

myFun <- function(data) {
  ListCols <- sapply(data, is.list)
  cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}
myFun(mydf)
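As a quick check, assuming a smaller version of the sample data, this version keeps the Date column as a real date because only the list columns go through apply():

```r
mydf <- data.frame(Date = as.Date(c("1978-01-01", "1978-01-02")),
                   V1 = c(10, 10))
mydf$V3 <- list(1:3, 4:6)

myFun <- function(data) {
  ListCols <- sapply(data, is.list)
  cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}

res <- myFun(mydf)
class(res$Date)  # "Date"
dim(res)         # 2 5
```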






Update 2

I've also written a more efficient function called col_flatten that's part of my "SOfun" package.

Install the package using:

source("http://news.mrdwab.com/install_github.R")
install_github("mrdwab/SOfun")

Then, you can do:

library(SOfun)
col_flatten(mydf, names(which(sapply(mydf, is.list))), drop = TRUE)
##          Date V1 V2 V3_1 V3_2 V3_3 V3_4 V3_5 V3_6 V3_7 V3_8 V3_9 V3_10 V4_1 V4_2 V4_3 V4_4 V4_5
## 1: 1978-01-01 10 11    1    2    3    4    5    6    7    8    9    10   21   22   23   24   25
## 2: 1978-01-02 10 11   11   12   13   14   15   16   17   18   19    20   26   27   28   29   30

It's based on the transpose function in "data.table", so be sure you have "data.table" as well.
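The underlying idea can be sketched directly with data.table::transpose(), which turns a list of per-row vectors into a list of per-position vectors (padding shorter vectors with its fill argument, NA by default). This is only an illustration, assuming "data.table" is installed; it is not the actual source of col_flatten, and the V3_ column names are chosen here just to mirror the output above:

```r
library(data.table)

dt <- data.table(Date = as.Date(c("1978-01-01", "1978-01-02")))
V3 <- list(1:3, 4:6)  # a list column: one vector per row

# transpose() returns one vector per position: c(1, 4), c(2, 5), c(3, 6)
cols <- transpose(V3)

# Add each transposed vector as its own column, by reference
dt[, paste0("V3_", seq_along(cols)) := cols]
print(dt)
```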
