Turn a list of lists with unnamed entries into a data frame or a tibble

Question

I'm using the reticulate R package from RStudio to run some python code to bring data from ROOT (http://root.cern.ch) into R. My problem is that the python code returns a list of row-wise lists. For example, in python,

    [[0L, 0L, 'mu+', 1, 0, 0, 1, 3231.6421853545253, -17.361063509909364, 6322.884067996471, -2751.857298366544, 1.2318766603937736, 1407.9560948453036, 3092.931322317615],
     [0L, 0L, 'nu_e', 3, 1, 0, 0, 3231.6421853545253, -17.361063509909364, 6322.884067996471, -743.6755000649275, 9.950229845741603, 342.4203222294634, 818.781981693865],
     [0L, 0L, 'anti_nu_mu', 2, 1, 0, 0, 3231.6421853545253, -17.361063509909364, 6322.884067996471, -808.1114666690765, 21.680955968349267, 445.2784282520303, 922.9231198102832],
     ...]

These data come through reticulate as:

List of 136972
$ :List of 14
..$ : int 0
..$ : int 0
..$ : chr "mu+"
..$ : int 1
..$ : int 0
..$ : int 0
..$ : int 0
..$ : num 7162
..$ : num -0.0108
..$ : num -627
..$ : num 264
..$ : num -3.24
..$ : num 3080
..$ : num 3093
$ :List of 14
..$ : int 0
..$ : int 0
..$ : chr "mu+"
..$ : int 1
.... (you get the idea)

I've searched everywhere I can think of, and I cannot find a way to turn these data into a data frame (I really want a tibble). One problem seems to be that the list entries are not named. There's a lot of data, and so I don't want to do something inefficient. I can have the python code return a dictionary of columns and that will work. But the python code to make a row is so much simpler.

If there was an easy way to turn these row-wise lists into a data frame, that would be ideal. Any ideas?

Recommended answer

Here are a couple of approaches that came to mind:

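  • Option 1. We know how many items are in each sublist (that is, how many columns to expect). Loop over the column positions and, for each one, pull the matching element out of every sublist to build a new list of columns. Wrap that in as.data.frame and you're done.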

    myFun_1 <- function(inlist, expectedCols = 14) {
      as.data.frame(
        # Build one column per position by pulling element x out of every row.
        lapply(sequence(expectedCols),
               function(x) {
                 sapply(inlist, function(y) y[[x]])
               }),
        col.names = paste0("V", sequence(expectedCols)))
    }

  • Option 2. Use do.call(rbind, .) and then unlist each column to make a regular data.frame with no list columns.

    myFun_2 <- function(inlist) {
      x <- as.data.frame(do.call(rbind, inlist))
      x[] <- lapply(x, unlist)
      x
    }
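
To see why the unlist step in Option 2 is needed, here is a minimal sketch on two made-up rows (the `rows` object and its values are illustrative and not taken from the question). It shows that rbind on lists produces a matrix of mode "list", so the data frame starts out with list columns, and that unlist then collapses each column to an ordinary atomic vector.

```r
# Two made-up rows with mixed types, standing in for the reticulate output.
rows <- list(
  list(1L, "a", 2.5),
  list(2L, "b", 3.5)
)

# rbind() on lists yields a matrix whose cells are list elements.
m <- do.call(rbind, rows)
print(is.list(m))
print(dim(m))

# Converting that matrix gives a data frame made of list columns.
x <- as.data.frame(m)
col_classes_before <- vapply(x, function(col) class(col)[1], character(1))

# unlist() collapses each list column into an atomic vector.
x[] <- lapply(x, unlist)
col_classes_after <- vapply(x, function(col) class(col)[1], character(1))

print(col_classes_before)
print(col_classes_after)
```

One caveat, stated as an assumption about the input rather than something the question shows: unlist drops NULL entries, so if any cell were NULL the affected column would come out shorter than the others and the assignment would no longer line up.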
    

    Let's test these out with some sample data. Here's a list that should create a rectangular 3 row x 14 column dataset:

    LL <- list(
      list(0L, 0L, 'mu+', 1, 0, 0, 1, 3231.6421853545253, -17.361063509909364,
           6322.884067996471, -2751.857298366544, 1.2318766603937736, 
           1407.9560948453036, 3092.931322317615),
      list(0L, 0L, 'nu_e', 3, 1, 0, 0, 3231.6421853545253, -17.361063509909364,
           6322.884067996471, -743.6755000649275, 9.950229845741603, 
           342.4203222294634, 818.781981693865),
      list(0L, 0L, 'anti_nu_mu', 2, 1, 0, 0, 3231.6421853545253, 
           -17.361063509909364, 6322.884067996471, -808.1114666690765, 
           21.680955968349267, 445.2784282520303, 922.9231198102832))
    

    Here's a bigger version of this, which would create a 150000 row by 14 column dataset.

    Big_LL <- unlist(replicate(50000, LL, FALSE), FALSE)
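
The way Big_LL is built may be easier to follow at a small scale. The sketch below uses a hypothetical three-row list (`LL_demo`) and two copies instead of 50000: replicate with simplify = FALSE returns a list of copies, and unlist with recursive = FALSE removes only that outer layer, leaving one flat list of rows.

```r
# Hypothetical three-row input, standing in for LL.
LL_demo <- list(list(1L, "a"), list(2L, "b"), list(3L, "c"))

# A list holding 2 copies of LL_demo (each copy is itself a list of 3 rows).
nested <- replicate(2, LL_demo, simplify = FALSE)
print(length(nested))

# Flatten one level only, so each row stays a list.
flat <- unlist(nested, recursive = FALSE)
print(length(flat))
```

With recursive left at its default of TRUE, unlist would instead flatten every row into a single atomic vector, which is not what the benchmark needs.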
    

    Outcomes of each function on the small dataset:

    myFun_1(LL)
    ##   V1 V2         V3 V4 V5 V6 V7       V8        V9      V10        V11       V12
    ## 1  0  0        mu+  1  0  0  1 3231.642 -17.36106 6322.884 -2751.8573  1.231877
    ## 2  0  0       nu_e  3  1  0  0 3231.642 -17.36106 6322.884  -743.6755  9.950230
    ## 3  0  0 anti_nu_mu  2  1  0  0 3231.642 -17.36106 6322.884  -808.1115 21.680956
    ##         V13       V14
    ## 1 1407.9561 3092.9313
    ## 2  342.4203  818.7820
    ## 3  445.2784  922.9231
    
    myFun_2(LL)
    ##   V1 V2         V3 V4 V5 V6 V7       V8        V9      V10        V11       V12
    ## 1  0  0        mu+  1  0  0  1 3231.642 -17.36106 6322.884 -2751.8573  1.231877
    ## 2  0  0       nu_e  3  1  0  0 3231.642 -17.36106 6322.884  -743.6755  9.950230
    ## 3  0  0 anti_nu_mu  2  1  0  0 3231.642 -17.36106 6322.884  -808.1115 21.680956
    ##         V13       V14
    ## 1 1407.9561 3092.9313
    ## 2  342.4203  818.7820
    ## 3  445.2784  922.9231
    

    All looking good. Now, how about performance?

    system.time(myFun_1(Big_LL))
    ##    user  system elapsed 
    ##    2.65    0.05    2.75 
    
    system.time(myFun_2(Big_LL))
    ##    user  system elapsed 
    ##    0.41    0.00    0.40 
    

    So, go with the second approach ;-)
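
Since the question asks for a tibble and named columns, here is a hedged sketch of one way to finish the job after Option 2. The column names and the two sample rows are hypothetical, chosen only for illustration, and the conversion assumes the tibble package is installed.

```r
# Option 2 from the answer, repeated so the sketch is self-contained.
myFun_2 <- function(inlist) {
  x <- as.data.frame(do.call(rbind, inlist))
  x[] <- lapply(x, unlist)
  x
}

# Made-up rows with three fields each.
rows <- list(
  list(0L, "mu+", 3092.93),
  list(0L, "nu_e", 818.78)
)

df <- myFun_2(rows)

# Hypothetical column names; replace with the real field names.
names(df) <- c("event", "particle", "energy")

# as_tibble() comes from the tibble package; skip the step if it is missing.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(tb)
}
```

Naming the columns on the data frame first keeps the tibble step to a single call.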
