Applying functions to nested dataframes with map

Question

I am having an issue with nesting and mapping that I am not sure how to get around. I have a tibble with nested dataframes, as follows:

> x
# A tibble: 18 × 3
   event.no               data dr.dur
      <dbl>             <list>  <int>
1         1   <tibble [7 × 4]>      7
2         4 <tibble [123 × 4]>    123
3         5   <tibble [9 × 4]>      9
4         7  <tibble [14 × 4]>     14
5        10  <tibble [19 × 4]>     19
6        11 <tibble [220 × 4]>    220
7        12 <tibble [253 × 4]>    253
8        14 <tibble [153 × 4]>    153
9        15  <tibble [28 × 4]>     28
10       17 <tibble [169 × 4]>    169
11       18   <tibble [7 × 4]>      7
12       19 <tibble [115 × 4]>    115
13       21 <tibble [109 × 4]>    109
14       25  <tibble [13 × 4]>     13
15       26 <tibble [249 × 4]>    249
16       28   <tibble [7 × 4]>      7
17       30  <tibble [26 × 4]>     26
18       31  <tibble [12 × 4]>     12
>
> x$data[[1]]
# A tibble: 7 × 4
  discharge threshold def.increase event.orig
      <dbl>     <dbl>        <dbl>      <dbl>
1     0.348     0.373       2160.0          1
2     0.348     0.373       2160.0          1
3     0.379     0.373       -518.4          0
4     0.379     0.373       -518.4          0
5     0.379     0.373       -518.4          0
6     0.379     0.373       -518.4          0
7     0.348     0.373       2160.0          2
> 

I need to find the sum of the def.increase column in each of the nested dataframes. I'm not sure what the best method is; this is what I've been trying:

> x %>%
+   mutate(dr.def = map(data, colSums)) %>%
+   unnest(dr.def)
# A tibble: 72 × 3
   event.no dr.dur    dr.def
      <dbl>  <int>     <dbl>
1         1      7     2.560
2         1      7     2.611
3         1      7  4406.400
4         1      7     4.000
5         4    123    45.739
6         4    123    45.879
7         4    123 12096.000
8         4    123   530.000
9         5      9     3.269
10        5      9     3.357
# ... with 62 more rows

Obviously the issue with this is that I end up with the sum of every column, so each event produces four rows instead of one. This would be okay, but it gets quite messy afterwards to select only the rows that I want. Is there a better way of finding the column sum for each of my def.increase columns? Thanks for your help :)

Not sure if I can copy/paste an object like my x so here is a link to the rds on wetransfer (if that's allowed): https://wetransfer.com/downloads/9697fff593f51c02136bc704adccbcc220170112161115/5be1fc

Recommended answer

You just need to select the def.increase column first:

library(tidyverse)

x %>% 
  mutate(dr.def = map(data, "def.increase") %>% map_dbl(sum))

Or with just a single map call:

x %>% 
  mutate(dr.def = map_dbl(data, ~ sum(.x[["def.increase"]])))
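
Since the original x is only available through a download link, here is a minimal reproducible sketch of both approaches. The object x_demo and its values are made up for illustration (two events, loosely modelled on the printed output above); only the column names come from the question.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for x: a tibble with a list column of nested tibbles.
# The numbers are invented for the example, not taken from the real data.
x_demo <- tibble(
  event.no = c(1, 4),
  data = list(
    tibble(discharge    = c(0.348, 0.379, 0.348),
           def.increase = c(2160, -518.4, 2160)),
    tibble(discharge    = c(0.350, 0.360),
           def.increase = c(100, 200))
  )
)

# Approach 1: extract the column by name, then sum each extracted vector
two_step <- x_demo %>%
  mutate(dr.def = map(data, "def.increase") %>% map_dbl(sum))

# Approach 2: a single map_dbl() call with a formula-style function
one_step <- x_demo %>%
  mutate(dr.def = map_dbl(data, ~ sum(.x[["def.increase"]])))

# Both give one numeric value per event, so no unnest() is needed
one_step$dr.def
```

With this made-up data, dr.def comes out as roughly 3801.6 for event 1 and 300 for event 4. Because map_dbl() returns a plain numeric vector rather than a list, dr.def is an ordinary double column with one row per event.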
