Use filter() (and other dplyr functions) inside nested data frames with map()

Question

I'm trying to use map() from the purrr package to apply filter() to data stored in a nested data frame.

"Why not filter first and then nest?" you might ask. That works (and I use that approach below to show my desired outcome), but I'm looking for a way to do it with purrr. I want a single data frame with two list-columns, both holding nested data frames: one full and one filtered.

I can achieve this now by calling nest() twice: once on all the data, and once on the filtered data:

library(tidyverse)

df <- tibble(
  a = sample(x = rep(c('x','y'),5), size = 10),
  b = sample(c(1:10)),
  c = sample(c(91:100))
)

df_full_nested <- df %>% 
  group_by(a) %>% 
  nest(.key = 'full')

df_filter_nested <- df %>%
  filter(c >= 95) %>%  ##this is the key step
  group_by(a) %>% 
  nest(.key = 'filtered')

## Desired outcome - one data frame with 2 nested list-columns: one full and one filtered.
## How to achieve this without breaking it out into 2 separate data frames?
df_nested <- df_full_nested %>% 
  left_join(df_filter_nested, by = 'a')

The objects look like this:

> df
# A tibble: 10 x 3
       a     b     c
   <chr> <int> <int>
 1     y     8    93
 2     x     9    94
 3     y    10    99
 4     x     5    97
 5     y     2   100
 6     y     3    95
 7     x     7    96
 8     y     6    92
 9     x     4    91
10     x     1    98

> df_full_nested
# A tibble: 2 x 2
      a             full
  <chr>           <list>
1     y <tibble [5 x 2]>
2     x <tibble [5 x 2]>

> df_filter_nested
# A tibble: 2 x 2
      a         filtered
  <chr>           <list>
1     y <tibble [3 x 2]>
2     x <tibble [3 x 2]>

> df_nested
# A tibble: 2 x 3
      a             full         filtered
  <chr>           <list>           <list>
1     y <tibble [5 x 2]> <tibble [3 x 2]>
2     x <tibble [5 x 2]> <tibble [3 x 2]>

So, this works, but it is not clean. In real life I group by several columns, which means I also have to join on several columns, and that gets hairy fast.

I'm wondering whether there is a way to apply filter() to the nested column directly. That way I would operate within the same object, and the code would be cleaner and easier to understand.

I think it would look something like this:

df_full_nested %>% mutate(filtered = map(full, ...))

But I am not sure how to map filter() properly.

Thanks!

Recommended answer

You can use map(full, ~ filter(., c >= 95)), where . stands for each individual nested tibble, so filter() is applied to it directly:

df_nested_2 <- df_full_nested %>% mutate(filtered = map(full, ~ filter(., c >= 95)))

identical(df_nested, df_nested_2)
# [1] TRUE
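
The question also mentions grouping by several columns. As a rough sketch of how the same pattern might extend to that case, the example below nests by two keys and adds the filtered list-column without any join. The data frame df2, its columns g1 and g2, and the n_kept column are made up for illustration and do not come from the original post; nest(full = c(b, c)) assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical data with two grouping columns (g1, g2); values are made up
df2 <- tibble(
  g1 = rep(c("x", "y"), each = 4),
  g2 = rep(c("p", "q"), times = 4),
  b  = 1:8,
  c  = 93:100
)

out <- df2 %>%
  # Pack b and c into a list-column named "full", one row per g1/g2 pair
  nest(full = c(b, c)) %>%
  mutate(
    # Apply filter() to each nested tibble; .x is the current tibble
    filtered = map(full, ~ filter(.x, c >= 95)),
    # Any function that takes a data frame can be mapped the same way
    n_kept   = map_int(filtered, nrow)
  )

out
```

Because both list-columns live in the same rows, the grouping keys never need to be matched up again, which is what made the left_join() version awkward with several keys.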
