Use filter() (and other dplyr functions) inside nested data frames with map()
Question
I'm trying to use map() from the purrr package to apply the filter() function to data stored in a nested data frame.
"Why wouldn't you filter first and then nest?" you might ask. That will work (and I'll show my desired outcome using such a process), but I'm looking for a way to do it with purrr.
I want to have just one data frame with two list-columns, both being nested data frames: one full and one filtered.
I can achieve it now by calling nest() twice: once on all the data, and once on the filtered data:
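library(tidyverse)

# Toy data: grouping column a plus two value columns.
# sample() shuffles, so the exact values differ on every run.
df <- tibble(
  a = sample(x = rep(c('x','y'),5), size = 10),
  b = sample(1:10),
  c = sample(91:100)
)

# Nest all rows of each group into a list-column named 'full'.
# (nest(.key = ) is the pre-1.0 tidyr syntax; .key is deprecated since tidyr 1.0.0.)
df_full_nested <- df %>%
  group_by(a) %>%
  nest(.key = 'full')

# Filter first, then nest only the surviving rows into 'filtered'.
df_filter_nested <- df %>%
  filter(c >= 95) %>% ## this is the key step
  group_by(a) %>%
  nest(.key = 'filtered')

## Desired outcome: one data frame with 2 nested list-columns, one full and one filtered.
## How to achieve this without breaking it out into 2 separate data frames?
df_nested <- df_full_nested %>%
  left_join(df_filter_nested, by = 'a')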
The objects look like this:
> df
# A tibble: 10 x 3
a b c
<chr> <int> <int>
1 y 8 93
2 x 9 94
3 y 10 99
4 x 5 97
5 y 2 100
6 y 3 95
7 x 7 96
8 y 6 92
9 x 4 91
10 x 1 98
> df_full_nested
# A tibble: 2 x 2
a full
<chr> <list>
1 y <tibble [5 x 2]>
2 x <tibble [5 x 2]>
> df_filter_nested
# A tibble: 2 x 2
a filtered
<chr> <list>
1 y <tibble [3 x 2]>
2 x <tibble [3 x 2]>
> df_nested
# A tibble: 2 x 3
a full filtered
<chr> <list> <list>
1 y <tibble [5 x 2]> <tibble [3 x 2]>
2 x <tibble [5 x 2]> <tibble [3 x 2]>
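On tidyr 1.0.0 or later, where nest(.key = ) is deprecated, the same nest-twice-and-join workaround can be written by naming the list-column inside nest() itself. This is a minimal sketch assuming tidyr >= 1.0.0 and loading dplyr and tidyr individually instead of the tidyverse meta-package; it reproduces the question's approach, not a new one.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question; sample() shuffles, so values differ per run
df <- tibble(
  a = sample(rep(c("x", "y"), 5)),
  b = sample(1:10),
  c = sample(91:100)
)

# tidyr >= 1.0.0: name the list-column inside nest(); -a keeps a as the key column
df_full_nested <- df %>%
  nest(full = -a)

# Filter first, then nest only the remaining rows
df_filter_nested <- df %>%
  filter(c >= 95) %>%
  nest(filtered = -a)

# The join still has to list every grouping column in `by`
df_nested <- df_full_nested %>%
  left_join(df_filter_nested, by = "a")
```

With several grouping columns, both nest() calls and the `by` argument would have to name all of them, which is the bookkeeping the question is trying to avoid.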
So this works, but it is not clean. In real life I group by several columns, which means I also have to join on several columns... It gets hairy fast.
I'm wondering if there is a way to apply filter() to the nested column directly. That way I'd operate within the same object, which makes for cleaner and more understandable code.
I think it would look something like
df_full_nested %>% mutate(filtered = map(full, ...))
But I am not sure how to map filter() properly.
Thanks!
Recommended answer
You can use map(full, ~ filter(., c >= 95)), where . stands for each individual nested tibble, to which you can apply filter() directly:
df_nested_2 <- df_full_nested %>% mutate(filtered = map(full, ~ filter(., c >= 95)))
identical(df_nested, df_nested_2)
# [1] TRUE
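The title also mentions other dplyr functions. The same pattern should carry over to any verb that takes a data frame and returns one: wrap it in a lambda and map() it over the list-column inside mutate(). The sketch below is illustrative only: the data and the second grouping column g are hypothetical, it assumes tidyr >= 1.0.0 for the nest(full = b:c) syntax, and it also shows that grouping by several columns then needs a single nest() call and no join.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical example data with two grouping columns (a and g)
df <- tibble(
  a = rep(c("x", "y"), each = 4),
  g = rep(c("p", "q"), times = 4),
  b = 1:8,
  c = c(91, 95, 97, 92, 99, 94, 96, 100)
)

# One nesting step, no join: b and c go into the list-column 'full',
# while a and g stay as ordinary key columns
nested <- df %>%
  nest(full = b:c)

result <- nested %>%
  mutate(
    # filter(): the approach from the answer, written with .x instead of .
    # (on R >= 4.1, \(d) filter(d, c >= 95) is an equivalent lambda)
    filtered = map(full, ~ filter(.x, c >= 95)),
    # arrange(): sort each nested tibble by c, largest first
    sorted = map(full, ~ arrange(.x, desc(c))),
    # summarise(): one-row summary tibble per group
    stats = map(full, ~ summarise(.x, n = n(), mean_c = mean(c))),
    # map_int() returns an ordinary integer column rather than a list-column
    n_kept = map_int(filtered, nrow)
  )

# Flatten the per-group summaries back into regular columns
summary_table <- result %>%
  select(a, g, stats) %>%
  unnest(stats)
```

Because everything lives in one object, adding or removing a grouping column only changes the nest() call; there is no join key to keep in sync.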