Combine select and mutate
Question
Quite often, I find myself manually combining the select() and mutate() functions in dplyr. This is usually because I'm tidying up a data frame, want to create new columns based on the old ones, and only want to keep the new columns.
For example, if I had data about heights and widths but only wanted to use them to calculate and keep the area, I would use:
library(dplyr)
df <- data.frame(height = 1:3, width = 10:12)
df %>%
  mutate(area = height * width) %>%
  select(area)
When a lot of variables are created in the mutate() step, it can be difficult to make sure they are all listed in the select() step. Is there a more elegant way to keep only the variables defined in the mutate() step?
One workaround I've been using is the following:
df %>%
  mutate(id = row_number()) %>%
  group_by(id) %>%
  summarise(area = height * width) %>%
  ungroup() %>%
  select(-id)
This works, but it is pretty verbose, and grouping by row before calling summarise() comes with a performance hit:
library(microbenchmark)
microbenchmark(
  df %>%
    mutate(area = height * width) %>%
    select(area),
  df %>%
    mutate(id = row_number()) %>%
    group_by(id) %>%
    summarise(area = height * width) %>%
    ungroup() %>%
    select(-id)
)
Output:
min lq mean median uq max neval cld
868.822 954.053 1258.328 1147.050 1363.251 4369.544 100 a
1897.396 1958.754 2319.545 2247.022 2549.124 4025.050 100 b
I'm thinking there's another workaround where you compare the original data frame's names with the new data frame's names and keep only the names that are new, but maybe there's a better way?
I feel like I'm missing something really obvious in the dplyr documentation, so apologies if this is trivial!
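The name-comparison workaround mentioned above could look roughly like the sketch below. It assumes a dplyr version that exports all_of() (1.0.0 or later), and the perimeter column is only an illustrative addition, not part of the original data. Note that a column overwritten in mutate() keeps its old name, so this approach would drop it.

```r
library(dplyr)

df <- data.frame(height = 1:3, width = 10:12)

# Create the new columns first (perimeter is a hypothetical extra column).
mutated <- df %>%
  mutate(area = height * width,
         perimeter = 2 * (height + width))

# Names present after mutate() but absent from the original data frame.
new_cols <- setdiff(names(mutated), names(df))

# Keep only the newly created columns.
result <- mutated %>%
  select(all_of(new_cols))
```

This avoids retyping the column names in select(), at the cost of an intermediate object.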
Recommended answer
Just to give more visibility to @Nate's comment: transmute() is the way to go! From its description:
mutate() adds new variables and preserves existing; transmute() drops existing variables.
To give a working example,
df %>%
  transmute(area = height * width)
gives the same result as
df %>%
  mutate(area = height * width) %>%
  select(area)
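A quick way to check that the two pipelines agree is to store both results and compare them, as in the sketch below. The third variant is an addition not mentioned in the answer: it assumes dplyr 1.0.0 or later, where mutate() accepts a .keep argument and .keep = "none" keeps only the newly created columns (plus any grouping variables).

```r
library(dplyr)

df <- data.frame(height = 1:3, width = 10:12)

# transmute() keeps only the columns it creates.
via_transmute <- df %>%
  transmute(area = height * width)

# The longer mutate() + select() pipeline from the question.
via_mutate_select <- df %>%
  mutate(area = height * width) %>%
  select(area)

# Assumption: dplyr >= 1.0.0, where mutate() has a .keep argument.
via_keep_none <- df %>%
  mutate(area = height * width, .keep = "none")

# Compare the results; all.equal() returns TRUE when they match.
all.equal(via_transmute, via_mutate_select)
all.equal(via_transmute, via_keep_none)
```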