Referring to column names inside dplyr's across()

Question

Is it possible to refer to the current column's name in a lambda function inside across()?

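library(dplyr)

df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# COLNAME is a placeholder for the name of the column being processed;
# it is the piece I don't know how to obtain.
df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[COLNAME]])))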

I just came across this question, in which the OP asks about validating columns in a data frame against a list of allowed values.

dplyr just gained across(), which seems like a natural choice, but we need the column names to look up the allowed values.

The best I could come up with was a call to imap_dfr, but that is more cumbersome to integrate into an analysis pipeline, because the results need to be recombined with the original data frame.

Recommended answer

I think you may be asking too much of across() at this point (though this may spur additional development, so maybe someday it will work the way you suggest).

I think the imap functions from the purrr package may give you what you want for now:

> library(dplyr)
> library(purrr)
> df <- tibble(age = c(12, 45), sex = c('f', 'f'))
> allowed_values <- list(age = 18:100, sex = c("f", "m"))
> 
> df %>% imap( ~ .x %in% allowed_values[[.y]])
$age
[1] FALSE  TRUE

$sex
[1] TRUE TRUE

> df %>% imap_dfc( ~ .x %in% allowed_values[[.y]])
# A tibble: 2 x 2
  age   sex  
  <lgl> <lgl>
1 FALSE TRUE 
2 TRUE  TRUE 
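The question mentions that recombining the imap result with the original data frame is the awkward part. One possible way to do that is sketched below; it is only an illustration, and the `_valid` suffix and the names `flags` and `result` are my own choices rather than anything from the question. It uses imap() followed by as_tibble() instead of imap_dfc(), then bind_cols() to attach the flags.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# One logical column per input column; .y is the column name supplied by imap().
flags <- df %>%
  imap(~ .x %in% allowed_values[[.y]]) %>%
  as_tibble() %>%
  # Suffix the names so they do not clash with the original columns
  # (the "_valid" suffix is an arbitrary choice for this sketch).
  rename_with(~ paste0(.x, "_valid"))

# Attach the validity flags to the original data.
result <- bind_cols(df, flags)
result
```

The per-column flags then sit next to the data they describe, so they can be used in later pipeline steps.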

If you want a single column with the combined validity, you can pass the imap result through reduce:

> df %>% imap( ~ .x %in% allowed_values[[.y]]) %>%
+   reduce(`&`)
[1] FALSE  TRUE

This could then be added as a new column to the original data, or just used for subsetting the data. I am not yet expert enough with the tidyverse to know whether this could be combined with mutate to add the columns directly.
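On combining this with mutate, here is a hedged sketch of two options. The first simply stores the reduce result and attaches it with mutate. The second assumes dplyr 1.0.0 or later, where cur_column() is documented as returning the name of the column currently being processed inside across(); if that holds for your installed version, it does what the question asked for. The names `row_ok`, `with_flag`, `checked` and `all_valid` are illustrative only.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# Option 1: combined validity via imap() + reduce(), attached with mutate().
row_ok <- df %>%
  imap(~ .x %in% allowed_values[[.y]]) %>%
  reduce(`&`)
with_flag <- df %>% mutate(all_valid = row_ok)

# Option 2 (assumes dplyr >= 1.0.0): cur_column() gives the current column
# name inside across(), so it can index allowed_values directly.
checked <- df %>%
  mutate(across(c(age, sex),
                list(valid = ~ .x %in% allowed_values[[cur_column()]])))

with_flag
checked
```

With a named list of functions, across() names the new columns by joining the column name and the function name, so the second option adds age_valid and sex_valid alongside the original columns.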
