How to compute multiple new columns in an R dataframe with dynamic names


Problem description


I'm trying to generate multiple new columns/variables in an R dataframe with dynamic new names taken from a vector. The new variables are computed from groups/levels of a single column. The dataframe contains measurements (counts) of different chemical elements (element) along depth (z). The new variables are computed by dividing the counts of each element at a certain depth by the respective counts of proxy elements (proxies) at the same depth.


There is already a solution using mutate that works if I only want to create one new column or name the columns explicitly (see code below). I'm looking for a generalised solution to use in a Shiny web app, where proxies is not a single string but a vector of strings that changes dynamically according to user input.

# Working code for just one new column at a time (here Ti_ratio)

library(tidyverse)
proxies <- "Ti"
df <- tibble(z = rep(1:10, 4), element = rep(c("Ag", "Fe", "Ca", "Ti"), each = 10), counts = rnorm(40))

df_Ti <- df %>%
  group_by(z) %>%
  mutate(Ti_ratio = counts/counts[element %in% proxies])



# Not working code for multiple columns at a time

proxies <- c("Ca", "Fe", "Ti")
varname <- paste(proxies, "ratio", sep = "_")

df_ratios <- df %>%
  group_by(z) %>%
  map(~ mutate(!!varname = .x$counts/.x$counts[element %in% proxies]))

Output of the working code:

> head(df_Ti)
# A tibble: 6 x 4
# Groups:   z [6]
      z element counts Ti_ratio
  <int> <chr>    <dbl>    <dbl>
1     1 Ag       2.41     4.10 
2     2 Ag      -1.06    -0.970
3     3 Ag      -0.312   -0.458
4     4 Ag      -0.186    0.570
5     5 Ag       1.12    -1.38 
6     6 Ag      -1.68    -2.84

Expected output of the non-working code:

> head(df_ratios)
# A tibble: 6 x 6
# Groups:   z [6]
      z element counts Ca_ratio Fe_ratio Ti_ratio
  <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
1     1 Ag       2.41     4.78   -10.1      4.10 
2     2 Ag      -1.06     3.19     0.506   -0.970
3     3 Ag      -0.312   -0.479   -0.621   -0.458
4     4 Ag      -0.186   -0.296   -0.145    0.570
5     5 Ag       1.12     0.353    3.19    -1.38 
6     6 Ag      -1.68    -2.81    -0.927   -2.84 






Edit:


I found a general solution to my problem with base R using two nested for-loops, similar to the answer posted by @fra (the difference being that here I loop both over the depth and the proxies):

library(tidyverse)
df <- tibble(z = rep(1:3, 4), element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3), counts = runif(12)) %>% arrange(z, element)
proxies <- c("Ca", "Fe", "Ti")

for (f in seq_along(proxies)) {
  proxy <- proxies[f]
  tmp2 <- NULL
  for (i in unique(df$z)) {
    tmp <- df[df$z == i,]
    tmp <- as.data.frame(tmp$counts/tmp$counts[tmp$element %in% proxy])
    names(tmp) <- paste(proxy, "ratio", sep = "_")
    tmp2 <- rbind(tmp2, tmp)
  }
  df[, 3 + f] <- tmp2
}

And the correct output:

> head(df)
# A tibble: 6 x 6
      z element counts Ca_ratio Fe_ratio Ti_ratio
  <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
1     1 Ag      0.690    0.864      9.21    1.13 
2     1 Ca      0.798    1         10.7     1.30 
3     1 Fe      0.0749   0.0938     1       0.122
4     1 Ti      0.612    0.767      8.17    1    
5     2 Ag      0.687    0.807      3.76    0.730
6     2 Ca      0.851    1          4.66    0.904


I made the dataframe contain less data so that it's clearly visible why this solution is correct (Ratios of elements with themselves = 1). I'm still interested in a more elegant solution that I could use with pipes.

Recommended answer


A tidyverse option could be to create a function similar to your original code, and then pass it through map_dfc to create the new columns.

library(tidyverse)

proxies <- c("Ca", "Fe", "Ti")

your_func <- function(x){

    df %>% 
       group_by(z) %>%
       mutate(!!paste(x, "ratio", sep = "_") := counts/counts[element %in% !!x]) %>% 
       ungroup() %>%
       select(!!paste(x, "ratio", sep = "_") )
}

df %>% 
   group_modify(~map_dfc(proxies, your_func)) %>% 
   bind_cols(df, .) %>%
   arrange(z, element)


#       z element  counts Ca_ratio Fe_ratio Ti_ratio
#   <int> <chr>     <dbl>    <dbl>    <dbl>    <dbl>
# 1     1 Ag      -0.112   -0.733    -0.197   -1.51 
# 2     1 Ca       0.153    1         0.269    2.06 
# 3     1 Fe       0.570    3.72      1        7.66 
# 4     1 Ti       0.0743   0.485     0.130    1    
# 5     2 Ag       0.881    0.406    -6.52    -1.49 
# 6     2 Ca       2.17     1       -16.1     -3.69 
# 7     2 Fe      -0.135   -0.0622    1        0.229
# 8     2 Ti      -0.590   -0.271     4.37     1    
# 9     3 Ag       0.398    0.837     0.166   -0.700
#10     3 Ca       0.476    1         0.198   -0.836
# ... with 30 more rows
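
As a possible variation on the same idea, the ratio columns can also be accumulated directly onto the data frame with purrr::reduce, which avoids the separate bind_cols step. This is only a sketch under some assumptions: add_ratio is a hypothetical helper name (not part of any package), the seed is fixed purely so the example is reproducible, and every depth z is assumed to contain exactly one row per proxy element.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(1)  # assumption: fixed seed only to make the sketch reproducible

# Small example data, same layout as in the question's edit
df <- tibble(
  z       = rep(1:3, 4),
  element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
  counts  = runif(12)
)
proxies <- c("Ca", "Fe", "Ti")

# Hypothetical helper: adds one "<proxy>_ratio" column to `data`.
# Assumes each depth z has exactly one row where element == proxy.
add_ratio <- function(data, proxy) {
  varname <- paste(proxy, "ratio", sep = "_")
  data %>%
    group_by(z) %>%
    mutate(!!varname := counts / counts[element == !!proxy]) %>%
    ungroup()
}

# Start from df and apply add_ratio once per proxy, carrying the result along
df_ratios <- reduce(proxies, add_ratio, .init = df) %>%
  arrange(z, element)

df_ratios
```

Because proxies is just a character vector, the same call should work unchanged when the vector comes from user input (for example a Shiny input), as long as each selected proxy occurs once per depth.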
