Mutating multiple columns in a data frame using dplyr

Question

I have the following data frame df:

  v1 v2 v3 v4
1  1  5  7  4
2  2  6 10  3

I want to obtain the following data frame df2, which adds the products v1*v3 and v2*v4 as new columns:

  v1 v2 v3 v4 v1v3 v2v4
1  1  5  7  4    7   20
2  2  6 10  3   20   18

How can I do that using dplyr? Using mutate_each?

I need a solution that generalizes to a large number of variables, not only four (v1 to v4). This is the code to generate the example:

v1 <- c(1, 2)
v2 <- c(5,6)
v3 <- c(7, 10)
v4 <- c(4, 3)
df <- data.frame(v1, v2, v3, v4)
v1v3 <- c(v1 * v3)
v2v4 <- c(v2 * v4)
df2 <- cbind(df, v1v3, v2v4)


Recommended answer

You are really close.

library(dplyr)

df2 <- 
    df %>% 
    mutate(v1v3 = v1 * v3,
           v2v4 = v2 * v4)

Such a beautifully simple language, right?

For more great tricks please see here.
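The question also asks for something that scales beyond four columns. One possible sketch, assuming an even number of columns ordered so that the first half pairs with the second half (v1 with v3, v2 with v4). The names `half`, `left`, `right`, and `prods` are illustrative and not part of the original answer:

```r
library(dplyr)

df <- data.frame(v1 = c(1, 2), v2 = c(5, 6), v3 = c(7, 10), v4 = c(4, 3))

# Assumption: an even number of columns, first half paired with second half.
half  <- ncol(df) %/% 2L
left  <- names(df)[seq_len(half)]
right <- names(df)[half + seq_len(half)]

# Element-wise products of the paired columns, named like "v1v3".
prods <- df[left] * df[right]
names(prods) <- paste0(left, right)

# Append the product columns to the original data frame.
df2 <- bind_cols(df, prods)
```

The pairing rule lives entirely in `left` and `right`, so a different pairing only requires building those two name vectors differently.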

Edit:
Thanks to @Facottons for the pointer to this answer: https://stackoverflow.com/a/34377242/5088194. Here is a tidy approach to the problem. It saves you from writing a hard-coded line for each new column. While it is a bit more verbose than the base R approach, the logic is at least more immediately transparent and readable. Note that this approach only works when there are at least half as many rows as there are columns.

library(dplyr)
library(tidyr)  # provides gather() and spread()

# prep the product column names (also acting as row keys for the join below)
df <- 
    df %>%
    mutate(prod_grp = paste0("v", row_number(), "v", row_number() + 2)) 

# converting data to tidy format and pairing columns to be multiplied together.
tidy_df <- 
    df %>%
    gather(column, value, -prod_grp) %>% 
    mutate(column = as.numeric(sub("v", "", column)),
           pair = column - 2) %>% 
    mutate(pair = if_else(pair < 1, pair + 2, pair))

# summarize the products for each column
prod_df <- 
    tidy_df %>% 
    group_by(prod_grp, pair) %>% 
    summarize(val = prod(value)) %>% 
    spread(prod_grp, val) %>% 
    mutate(pair = paste0("v", pair, "v", pair + 2)) %>% 
    rename(prod_grp = pair)

# put the original frame and summary frames together
final_df <- 
    df %>% 
    left_join(prod_df, by = "prod_grp") %>% 
    select(-prod_grp)
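In the code above, prod_grp doubles as the row key and as the name of a product column, which is where the row-count restriction comes from. As far as I can tell, that coupling also means the products only line up with the right rows when the data happen to be symmetric, as in the example (2 * 10 equals 5 * 4). A variant keyed on an explicit row id avoids both issues. This is a sketch, not the original author's code: `row_id`, `idx`, `first`, `prod_name`, and `n_pairs` are hypothetical names, and it uses `pivot_longer()` and `pivot_wider()` (tidyr 1.0.0 or later), which supersede `gather()` and `spread()`:

```r
library(dplyr)
library(tidyr)

df <- data.frame(v1 = c(1, 2), v2 = c(5, 6), v3 = c(7, 10), v4 = c(4, 3))

# Assumption: columns are named v1..vN and v_i pairs with v_(i + n_pairs).
n_pairs <- ncol(df) %/% 2L

prod_df <- df %>%
  mutate(row_id = row_number()) %>%   # explicit row key, independent of column names
  pivot_longer(-row_id, names_to = "column", values_to = "value") %>%
  mutate(idx = as.integer(sub("v", "", column)),
         first = if_else(idx > n_pairs, idx - n_pairs, idx),
         prod_name = paste0("v", first, "v", first + n_pairs)) %>%
  group_by(row_id, prod_name) %>%
  summarize(val = prod(value), .groups = "drop") %>%   # one product per row and pair
  pivot_wider(names_from = prod_name, values_from = val)

# Join the products back onto the original rows, then drop the helper key.
final_df <- df %>%
  mutate(row_id = row_number()) %>%
  left_join(prod_df, by = "row_id") %>%
  select(-row_id)
```

Because each row carries its own `row_id`, the number of rows no longer has to relate to the number of columns.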
