Flatten nested list of lists with variable numbers of elements to a data frame

Views: 70 | Published: 2021/6/18 19:48:22 | Tags: r, list, plyr, geosphere


Problem description

I've got a nested list of lists that I'd like to flatten into a data frame with id variables, so that I know which list element (and sub-list element) each row came from.

> str(gc_all)
List of 3
$ 1: num [1:102, 1:2] -74 -73.5 -73 -72.5 -71.9 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "lon" "lat"
$ 2: num [1:102, 1:2] -74 -73.3 -72.5 -71.8 -71 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "lon" "lat"
$ 3:List of 2
..$ : num [1:37, 1:2] -74 -74.4 -74.8 -75.3 -75.8 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "lon" "lat"
..$ : num [1:65, 1:2] 180 169 163 158 154 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "lon" "lat"

I've used plyr::ldply(mylist, rbind) for flattening lists before, but I seem to be running into trouble because the list lengths vary: some list elements contain a single matrix, whilst others contain a list of two matrices.
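The irregular shape can be reproduced without geosphere. The sketch below builds a hypothetical toy list (`toy` and `make_path` are illustrative names, not part of the original data) with the same structure as `gc_all` in the `str()` output: two bare matrices and one list of two matrices. It assumes the leaves are matrices, since `is.list()` would also return TRUE for a data frame.

```r
# Hypothetical toy data mimicking the shape of gc_all (values are made up)
make_path <- function(n) {
  matrix(seq_len(2 * n), ncol = 2, dimnames = list(NULL, c("lon", "lat")))
}

toy <- list(
  `1` = make_path(3),                      # a single matrix
  `2` = make_path(3),                      # a single matrix
  `3` = list(make_path(2), make_path(4))   # a list of two matrices
)

# TRUE only for the element that was split into sections
is_split <- vapply(toy, is.list, logical(1))
print(is_split)

# Sections per element: 1 for a bare matrix, otherwise the list length
n_sections <- ifelse(is_split, lengths(toy), 1L)
print(n_sections)
```

Because the depth differs between elements, a single `rbind` per element cannot treat all of them alike; each element first has to be checked (or wrapped) so that it has a uniform depth.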

I've found a clunky solution using two lapply calls and an if/else, like so:

# sample latitude-longitude data
df <- data.frame(source_lat = rep(40.7128, 3),
                 source_lon = rep(-74.0059, 3),
                 dest_lat = c(55.7982, 41.0082, -7.2575),
                 dest_lon = c(37.968, 28.9784, 112.7521),
                 id = 1:3)

# split into list
gc_list <- split(df, df$id)

# get great circles between lat-lon for each id; multiple list elements are outputted when the great circle crosses the dateline
gc_all <- lapply(gc_list, function(x) {
  geosphere::gcIntermediate(x[, c("source_lon", "source_lat")],
                 x[, c("dest_lon", "dest_lat")],
                 n = 100, addStartEnd=TRUE, breakAtDateLine=TRUE)
})

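# %>% comes from magrittr (also re-exported by dplyr)
library(magrittr)

# add id and section columns to each matrix, then stack everything
gc_fortified <- lapply(seq_along(gc_all), function(i) {
  # is.list() avoids comparing against class(), which has length 2 for a matrix in R >= 4.0
  if (is.list(gc_all[[i]])) {
    lapply(seq_along(gc_all[[i]]), function(j) {
      data.frame(gc_all[[i]][[j]], id = i, section = j)
    }) %>%
      plyr::rbind.fill()
  } else {
    data.frame(gc_all[[i]], id = i, section = 1)
  }
}) %>%
  plyr::rbind.fill()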

But I feel like there must be a more elegant solution that works as a one-liner, e.g. dput, data.table?

This is my expected output:

> gc_fortified %>% 
    group_by(id, section) %>%
    slice(1)

       lon      lat    id section
     <dbl>    <dbl> <int>   <dbl>
1 -74.0059 40.71280     1       1
2 -74.0059 40.71280     2       1
3 -74.0059 40.71280     3       1
4 180.0000 79.70115     3       2
