Add column to df that's the output of a function that uses different column values combined to be a vector input


Question

This is a very simplified version of my actual problem.

My real df has many columns, and I need to do this by selecting the columns from a character vector of column names.

library(tidyverse)


df <- data.frame(a1 = c(1:5), 
             b1 = c(3,1,3,4,6), 
             c1 = c(10:14), 
             a2 = c(9:13), 
             b2 = c(3:7), 
             c2 = c(15:19))
df
  a1 b1 c1 a2 b2 c2
1  1  3 10  9  3 15
2  2  1 11 10  4 16
3  3  3 12 11  5 17
4  4  4 13 12  6 18
5  5  6 14 13  7 19

Let's say I want the cor for each row across selected columns, using mutate. I tried:

df %>% 
  mutate(my_cor = cor(x = c(a1,b1,c1), y = c(a2,b2,c2)))

But this doesn't work, because each column name refers to the full column of data rather than to a single row's value.
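The behaviour described above can be checked directly. The following is a minimal sketch, assuming dplyr is installed; the helper column n_x and the object names attempt and expected_row1 are made up here for illustration and are not part of the original question.

```r
library(dplyr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

attempt <- df %>%
  mutate(
    # Inside mutate(), a1, b1 and c1 are whole columns, so this
    # concatenation has 15 elements rather than 3
    n_x = length(c(a1, b1, c1)),
    # cor() therefore returns a single number, recycled to every row
    my_cor = cor(c(a1, b1, c1), c(a2, b2, c2))
  )

# The value the first row was actually meant to hold
expected_row1 <- cor(c(1, 3, 10), c(9, 3, 15))

attempt$n_x             # same length reported on every row
unique(attempt$my_cor)  # only one distinct value
expected_row1           # differs from the recycled value
```

Seeing one distinct value in my_cor is a quick sign that the function was applied once to whole columns instead of once per row.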

The first row of the my_cor column in the output df should be the result of:

cor(x = c(1,3,10), y = c(9,3,15))

The next row should be:

cor(x = c(2,1,11), y = c(10,4,16))

And so on. The actual function I'm using is more complex, but it takes two vector inputs just as cor does, so I figured this would be a good proxy.

I have a feeling I should be using purrr for this (similar to this post), but I haven't gotten it to work.

Bonus: The function in my actual problem uses many different columns, so I'd like to be able to select them from a character vector like my_list_of_cols <- c("a1", "b1", "c1") (my true list is much longer).

I suspect I'd be using pmap_dbl, as in the post I linked to, but I can't get it to work. I tried something like:

mutate(my_col = pmap_dbl(select(., var = my_list_of_cols), somefunction))

(Note that somefunction above takes two vector inputs, but one of them is static and predefined. You can assume the vector c(a2, b2, c2) is the static, predefined one, as in:

somefunction <- function(a1, b1, c1) {
    # Static, predefined second vector
    a2 <- 1
    b2 <- 4
    c2 <- 5
    my_vec <- c(a2, b2, c2)
    # Combine this row's values into the first vector
    cor(x = c(a1, b1, c1), y = my_vec)
}

)
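For reference, here is one way the attempted pmap_dbl call could be made to run. This is a sketch under assumptions rather than a confirmed fix: it assumes dplyr and purrr are installed, it reuses the static values 1, 4, 5 from the question, and it swaps in all_of() to select columns by a character vector. The names my_col and result are illustrative.

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

my_list_of_cols <- c("a1", "b1", "c1")

# Stand-in for the real function: one row's values against a static vector
somefunction <- function(a1, b1, c1) {
  my_vec <- c(1, 4, 5)  # static values taken from the question
  cor(x = c(a1, b1, c1), y = my_vec)
}

result <- df %>%
  mutate(
    # pmap_dbl() walks the selected columns row by row and passes each
    # row's values to somefunction() as arguments matched by column name
    my_col = pmap_dbl(select(., all_of(my_list_of_cols)), somefunction)
  )

result
```

Because pmap_dbl matches arguments by name, this only works when the function's parameter names equal the selected column names. For a long column list, a function that takes `...` and combines it with c(...) avoids spelling out every parameter.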

I'm still learning how to use purrr, so any help would be greatly appreciated!

Recommended answer

Here is one option: pass the objects holding the column names to select, then split each row into the two vectors by those names.

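library(tidyverse)

# Columns that form the x vector and the y vector for each row
my_list_of_cols <- c("a1", "b1", "c1")
another_list_cols <- c("a2", "b2", "c2")

df %>% 
  mutate(my_cor = pmap_dbl(
    # all_of() selects the columns named in a character vector
    select(., all_of(c(my_list_of_cols, another_list_cols))),
    # c(...) collects one row into a named vector, which is then split by name
    ~ c(...) %>% 
      {cor(.[my_list_of_cols], .[another_list_cols])}
    ))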
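As an alternative to the pmap_dbl approach, the same per-row result can probably be obtained with rowwise() and c_across(). This is a different technique from the one in the answer, sketched here on the assumption that dplyr 1.0.0 or later is installed; the object name result is illustrative.

```r
library(dplyr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

my_list_of_cols <- c("a1", "b1", "c1")
another_list_cols <- c("a2", "b2", "c2")

result <- df %>%
  rowwise() %>%  # evaluate the following mutate() one row at a time
  mutate(
    # c_across() gathers the named columns of the current row into a vector
    my_cor = cor(c_across(all_of(my_list_of_cols)),
                 c_across(all_of(another_list_cols)))
  ) %>%
  ungroup()      # drop the row-wise grouping afterwards

result
```

This keeps the column lists as plain character vectors, which suits the case where the real list is much longer. rowwise() can be slow on large data frames, so the pmap_dbl version may be preferable when speed matters.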
