Make column of input items with purrr::map_df using .id without duplicating inputs for named vector


Problem description

I often want to map over a vector of column names in a data frame, and keep track of the output using the .id argument. But to write the column names related to each map iteration into that .id column seems to require doubling up their name in the input vector - in other words, by naming each column name with its own name. If I don't name the column with its own name, then .id just stores the index of the iteration.

This is expected behavior, per the purrr::map docs:

.id
Either a string or NULL. If a string, the output will contain a variable with that name, storing either the name (if .x is named) or the index (if .x is unnamed) of the input.

But my approach feels a little clunky, so I imagine I'm missing something. Is there a better way to get a list of the columns I'm iterating over, that doesn't require writing each column name twice in the input vector? Any suggestions would be much appreciated!

Here's an example to work with:

library(rlang)
library(tidyverse)

tb <- tibble(foo = rnorm(10), bar = rnorm(10))

cols_once <- c("foo", "bar")
cols_once %>% map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
# A tibble: 2 x 2
  var       avg   <-- var stores only the iteration index
  <chr>   <dbl>
1 1     -0.0519
2 2      0.204 

cols_twice <- c("foo" = "foo", "bar" = "bar")
cols_twice %>% map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
# A tibble: 2 x 2
  var       avg   <-- var stores the column names
  <chr>   <dbl>
1 foo   -0.0519
2 bar    0.204 

Solution

You could create your input vector easily with:

setNames(names(tb), names(tb))

So your code would be:

setNames(names(tb), names(tb)) %>%
  map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
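
A possibly shorter variant, offered as a sketch and not as part of the original answer: purrr::set_names() uses the vector itself as the names when no names are supplied, so each column name only has to be written once. The seed and the object names cols and res below are assumptions added for illustration.

```r
library(rlang)
library(purrr)
library(dplyr)
library(tibble)

set.seed(1)  # assumption: fixed seed only so the illustration is reproducible
tb <- tibble(foo = rnorm(10), bar = rnorm(10))

# With no second argument, set_names() names each element after itself,
# which is what .id needs in order to record the column names.
cols <- set_names(c("foo", "bar"))

res <- cols %>%
  map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id = "var")

res
```

Here res$var holds "foo" and "bar" instead of the iteration index, because the input vector is now named.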


Edit following your comment:

Still not the solution you are hoping for, but when you don't use all the column names, you could still use setNames() and subset the ones you want (or subset out the ones you don't).

tb <- tibble(foo = rnorm(10), bar = rnorm(10), taz = rnorm(10))

setNames(names(tb), names(tb))[-3]
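
As a rough sketch of how the subsetting idea could fit into the full pipeline, the unwanted column can also be dropped by name instead of by position, which keeps working if the column order changes. The names wanted and res are hypothetical and used only for this example.

```r
library(rlang)
library(purrr)
library(dplyr)
library(tibble)

tb <- tibble(foo = rnorm(10), bar = rnorm(10), taz = rnorm(10))

# Drop "taz" by name; setdiff() keeps the remaining names in their original order.
wanted <- setdiff(names(tb), "taz")

# Name the vector with itself so that .id stores the column names.
res <- setNames(wanted, wanted) %>%
  map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id = "var")

res
```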
