Combine select and mutate

Views: 60 | Published: 2021/5/2 20:44:14 | Tags: r, dplyr


Problem description

Quite often, I find myself manually combining the select() and mutate() functions in dplyr. This is usually because I'm tidying up a data frame: I want to create new columns based on the old columns and keep only the new ones.

For example, if I had data about heights and widths but only wanted to use them to calculate and keep the area, I would use:

library(dplyr)
df <- data.frame(height = 1:3, width = 10:12)

df %>% 
  mutate(area = height * width) %>% 
  select(area)

When a lot of variables are created in the mutate step, it can be difficult to make sure they are all listed in the select step. Is there a more elegant way to keep only the variables defined in the mutate step?

One workaround I've been using is the following:

df %>%
  mutate(id = row_number()) %>%
  group_by(id) %>%
  summarise(area = height * width) %>%
  ungroup() %>%
  select(-id)

This works, but it is pretty verbose, and the use of summarise() means there's a performance hit:

library(microbenchmark)

microbenchmark(

  df %>% 
    mutate(area = height * width) %>% 
    select(area),

  df %>%
    mutate(id = row_number()) %>%
    group_by(id) %>%
    summarise(area = height * width) %>%
    ungroup() %>%
    select(-id)
)

Output:

      min       lq     mean   median       uq      max neval cld
  868.822  954.053 1258.328 1147.050 1363.251 4369.544   100  a 
 1897.396 1958.754 2319.545 2247.022 2549.124 4025.050   100   b

I'm thinking there's another workaround where you compare the original data frame's column names with the new data frame's column names and keep only the columns that are new, but maybe there's a better way?
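The name-comparison idea could be sketched roughly as follows. This is only an illustration: the `perimeter` column and the `mutated`, `new_cols`, and `result` names are made up for the example. It also assumes every column created in the mutate step has a new name; a column that overwrites an existing one would be dropped.

```r
library(dplyr)

df <- data.frame(height = 1:3, width = 10:12)

# Run the mutate step as usual (perimeter is a hypothetical extra column)
mutated <- df %>%
  mutate(area = height * width,
         perimeter = 2 * (height + width))

# Columns present after mutate() that were not in the original data frame
new_cols <- setdiff(names(mutated), names(df))

# Keep only those columns
result <- mutated %>%
  select(all_of(new_cols))
```

Here `result` should contain only `area` and `perimeter`.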

I feel like I'm missing something really obvious in the dplyr documentation, so apologies if this is trivial!
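For reference, dplyr does provide a verb that appears to cover this case: transmute() works like mutate() but returns only the columns it creates. A minimal sketch, using the same example data (the `area_only` name is just for illustration):

```r
library(dplyr)

df <- data.frame(height = 1:3, width = 10:12)

# transmute() computes new columns like mutate() but drops the existing ones
area_only <- df %>%
  transmute(area = height * width)
```

In recent dplyr releases (1.0.0 and later, as far as I know), mutate(area = height * width, .keep = "none") should behave similarly, though it is worth checking the documentation for the installed version.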
