展平具有复杂嵌套结构的列表 [英] Flatten a list with complex nested structure

查看:46
本文介绍了展平具有复杂嵌套结构的列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个具有以下示例结构的列表:

I have a list with the following example structure:

> dput(test)
structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(
    var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", 
"var3")), section2 = structure(list(row = structure(list(var1 = 1, 
    var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), 
    row = structure(list(var1 = 4, var2 = 5, var3 = 6), .Names = c("var1", 
    "var2", "var3")), row = structure(list(var1 = 7, var2 = 8, 
        var3 = 9), .Names = c("var1", "var2", "var3"))), .Names = c("row", 
"row", "row"))), .Names = c("id", "var1", "var3", "section1", 
"section2"))


> str(test)
List of 5
 $ id      : num 1
 $ var1    : num 2
 $ var3    : num 4
 $ section1:List of 3
  ..$ var1: num 1
  ..$ var2: num 2
  ..$ var3: num 3
 $ section2:List of 3
  ..$ row:List of 3
  .. ..$ var1: num 1
  .. ..$ var2: num 2
  .. ..$ var3: num 3
  ..$ row:List of 3
  .. ..$ var1: num 4
  .. ..$ var2: num 5
  .. ..$ var3: num 6
  ..$ row:List of 3
  .. ..$ var1: num 7
  .. ..$ var2: num 8
  .. ..$ var3: num 9

请注意, section2 列表包含名为 rows 的元素.这些代表多个记录.我所拥有的是一个嵌套列表,其中某些元素处于根级别,而其他元素是同一观察的多个嵌套记录.我想要 data.frame 格式的以下输出:

Notice that the section2 list contains elements named rows. These represent multiple records. What I have is a nested list where some elements are at the root level and others are multiple nested records for the same observation. I would like the following output in a data.frame format:

> desired
  id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
1  1    2    4             1             2               3             1             4             7
2 NA   NA   NA            NA            NA              NA             2             5             8
3 NA   NA   NA            NA            NA              NA             3             6             9

根级元素应填充第一行,而 row 元素应具有自己的行.更为复杂的是, row 条目中的变量数量可能会有所不同.

Root-level elements should populate the first row, while row elements should have their own rows. As an added complication, the number of variables in the row entries can vary.

推荐答案

这是一种通用方法.它不假设您只有三行;它将与您拥有的任何行一起工作.而且,如果嵌套结构中缺少值(例如,第2节中的某些子列表不存在var1),则代码会正确返回该单元格的NA.

Here's a general approach. It doesn't assume that you'll have only three row; it will work with however many rows you have. And if a value is missing in the nested structure (e.g. var1 doesn't exist for some sub-lists in section2), the code correctly returns an NA for that cell.

例如如果我们使用以下数据:

E.g. if we use the following data:

test <- structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), section2 = structure(list(row = structure(list(var1 = 1, var2 = 2), .Names = c("var1", "var2")), row = structure(list(var1 = 4, var2 = 5), .Names = c("var1", "var2")), row = structure(list( var2 = 8, var3 = 9), .Names = c("var2", "var3"))), .Names = c("row", "row", "row"))), .Names = c("id", "var1", "var3", "section1", "section2"))

通常的方法是使用melt创建一个包含有关嵌套结构信息的数据框,然后进行广播以将其成型为所需的格式.

The general approach is to use melt to create a dataframe that includes information about the nested structure, and then dcast to mold it into the format you desire.

library("reshape2")

flat <- unlist(test, recursive=FALSE)
names(flat)[grep("row", names(flat))] <- gsub("row", "var", paste0(names(flat)[grep("row", names(flat))], seq_len(length(names(flat)[grep("row", names(flat))]))))  ## keeps track of rows by adding an ID
ul <- melt(unlist(flat))
split <- strsplit(rownames(ul), split=".", fixed=TRUE) ## splits the names into component parts
max <- max(unlist(lapply(split, FUN=length)))
pad <- function(a) {
  c(a, rep(NA, max-length(a)))
}
levels <- matrix(unlist(lapply(split, FUN=pad)), ncol=max, byrow=TRUE)

## Get the nesting structure
nested <- data.frame(levels, ul)
nested$X3[is.na(nested$X3)] <- levels(as.factor(nested$X3))[[1]]
desired <- dcast(nested, X3~X1 + X2)
names(desired) <- gsub("_", "\\.", gsub("_NA", "", names(desired)))
desired <- desired[,names(flat)]

> desired
  ## id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
## 1  1    2    4             1             2             3             1             4             7
## 2 NA   NA   NA            NA            NA            NA             2             5             8
## 3 NA   NA   NA            NA            NA            NA             3             6             9

这篇关于展平具有复杂嵌套结构的列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆