How can I manipulate dataframe columns with different values from an external vector (with dplyr)


Problem description


In R, I would like to manipulate (say multiply) data.frame columns with appropriately named values stored in a vector (or data.frame, if that's easier).

Let's say, I want to first summarise the variables disp, hp, and wt from the mtcars dataset.

library(dplyr)

vars <- c("disp", "hp", "wt")
# funs() is deprecated in dplyr >= 0.8.0; summarise_at(vars, sum) is equivalent
mtcars %>% 
  summarise_at(vars, funs(sum(.)))

(throw a group_by(cyl) into the mix, or use mutate_at if you'd like to have more rows)

Now I'd like to multiply each of the resulting columns with a particular value, given by

multiplier <- c("disp" = 2, "hp" = 3, "wt" = 4)

Is it possible to refer to these within the summarise_at function?

The result should look like this (and I don't want to have to refer to the variable names directly while getting there):

disp    hp    wt
14766.2 14082 411.808

UPDATE:

Maybe my MWE was too minimal. Let's say I want to do the same operation with a data.frame grouped by cyl

mtcars %>% 
  group_by(cyl) %>% 
  summarise_at(vars, sum) 

The result should thus be:

    cyl   disp   hp      wt
1     4 2313.0 2727 100.572
2     6 2566.4 2568  87.280
3     8 9886.8 8787 223.956

UPDATE 2:

Maybe I was not explicit enough here either, but the columns in the data.frame should be multiplied by the respective values in the vector (and only those columns mentioned in the vector), so e.g. disp should be multiplied by 2, hp by 3 and wt by 4, all other variables (e.g. cyl) should remain untouched by the multiplication.

Solution

We can do this with the map2_df function from purrr, which pairs each summarised column with the corresponding element of multiplier by position

library(purrr)
mtcars %>%
    summarise_at(vars, sum) %>%
    map2_df(multiplier, `*`)
#      disp    hp      wt
#     <dbl> <dbl>   <dbl>
# 1 14766.2 14082 411.808
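
Because map2_df pairs columns and multipliers by position, the result depends on multiplier being in the same order as vars. As a minimal base R sketch (not part of the original answer), indexing multiplier by name removes that dependency. The reordered multiplier below is only there to illustrate the point.

```r
# Same vars as in the question; multiplier is deliberately in a different order.
vars <- c("disp", "hp", "wt")
multiplier <- c(hp = 3, wt = 4, disp = 2)

# colSums() returns a named vector; multiplier[vars] reorders the factors
# by name so that each column total meets its own multiplier.
totals <- colSums(mtcars[vars]) * multiplier[vars]
totals
```

The result is a named numeric vector rather than a data frame, which may or may not suit the rest of a pipeline.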


For the updated question

d1 <- mtcars %>% 
         group_by(cyl) %>% 
         summarise_at(vars, sum) 
d1 %>% 
   select(one_of(vars)) %>% 
   map2_df(multiplier[vars], ~ .x * .y) %>%
   bind_cols(d1 %>% select(-one_of(vars)), .) 
#    cyl   disp    hp      wt
#  <dbl>  <dbl> <dbl>   <dbl>
#1     4 2313.0  2727 100.572
#2     6 2566.4  2568  87.280
#3     8 9886.8  8787 223.956
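
The question asks whether the multiplier can be referenced inside the summarising call itself. Assuming dplyr 1.0.0 or later (the original answer predates it), a sketch with across() and cur_column() looks up each factor by column name, so no column is named directly and the grouping column is left untouched.

```r
library(dplyr)

vars <- c("disp", "hp", "wt")
multiplier <- c(disp = 2, hp = 3, wt = 4)

# cur_column() gives the name of the column currently being processed
# by across(), which is used here to pick the matching multiplier.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(all_of(vars), ~ sum(.x) * multiplier[[cur_column()]]))
res
```

Replacing summarise with mutate (and dropping sum) would apply the same per-column factors row by row.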


Or we can use gather/spread

library(tidyr)
mtcars %>% 
    group_by(cyl) %>% 
    summarise_at(vars, sum) %>% 
    gather(var, val, -cyl) %>% 
    mutate(val = val*multiplier[match(var, names(multiplier))]) %>% 
    spread(var, val)
#     cyl   disp    hp      wt
#   <dbl>  <dbl> <dbl>   <dbl>
#1     4 2313.0  2727 100.572
#2     6 2566.4  2568  87.280
#3     8 9886.8  8787 223.956
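
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. The following is a sketch of the same reshape-multiply-reshape idea with the newer functions, assuming that tidyr version is available; it is not taken from the original answer.

```r
library(dplyr)
library(tidyr)

vars <- c("disp", "hp", "wt")
multiplier <- c(disp = 2, hp = 3, wt = 4)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise_at(vars, sum) %>%
  # Long format: one row per (cyl, variable) pair
  pivot_longer(-cyl, names_to = "var", values_to = "val") %>%
  # Look up the factor by variable name; unname() drops the vector names
  mutate(val = val * unname(multiplier[var])) %>%
  # Back to one column per variable
  pivot_wider(names_from = var, values_from = val)
res
```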
