Custom function to mutate a new column for row means using starts_with()


Problem description

I have a data frame for which I want to create columns of row means. Each row-mean column should be computed over a group of related columns in the data. I can tell the groups of columns apart using dplyr's starts_with(). Since I have several groups of columns to calculate row means for, I'd like to build a function to do it. For some reason, I can't get it to work.

df <- data.frame("europe_paris" = 1:10, 
                 "europe_london" = 11:20, 
                 "europe_rome" = 21:30,
                 "asia_bangkok" = 31:40,
                 "asia_tokyo" = 41:50,
                 "asia_kathmandu" = 51:60)
set.seed(123)
df <- as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA),
                                                 prob = c(0.70, 0.30),
                                                 size = length(cc), 
                                                 replace = TRUE) ]))

df

   europe_paris europe_london europe_rome asia_bangkok asia_tokyo asia_kathmandu
1             1            NA          NA           NA         41             51
2            NA            12          22           NA         42             52
3             3            13          23           33         43             NA
4            NA            14          NA           NA         44             54
5            NA            15          25           35         45             55
6             6            NA          NA           36         46             56
7             7            17          27           NA         47             57
8            NA            18          28           38         48             NA
9             9            19          29           39         49             NA
10           10            NA          30           40         NA             60

I want to create a new column holding the row means for each continent, taken across its cities: one column for the Asian cities and one for the European ones. Each call to the function will be given the name of a continent, which determines which columns to pick.

This attempt is based on this answer:

continent_mean <- 
  function(continent)  {
  df %>%
  select(starts_with(as.character(continent))) %>%
  mutate(., (!!as.name(continent)) == rowMeans(., na.rm = TRUE))
}

However, running this code gives odd behavior: it seems to return the same dataset, restricted to the columns selected by starts_with(), but it doesn't generate a new column for the row means.

continent_mean("asia")

   asia_bangkok asia_tokyo asia_kathmandu
1            31         41             51
2            32         42             52
3            33         43             53
4            34         44             54
5            35         45             55
6            36         46             56
7            37         47             57
8            38         48             58
9            39         49             59
10           40         50             60

What am I missing here? I thought this could be due to the == rather than = in mutate(), but a single = throws an error, so that doesn't seem to be the solution either.

Thanks!

Recommended answer

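We can use quo_name() together with the := operator to assign the column name. R's syntax does not allow an unquoted expression such as !!name on the left-hand side of a plain =, which is why the single = threw an error; := exists to permit exactly that.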

library(dplyr)
library(rlang)

continent_mean <- function(df, continent)  {
    df %>%
      select(starts_with(continent)) %>%
      mutate(!!quo_name(continent) := rowMeans(., na.rm = TRUE))
}

continent_mean(df, "asia")


#   asia_bangkok asia_tokyo asia_kathmandu asia
#1            NA         41             51   46
#2            NA         42             52   47
#3            33         43             NA   38
#4            NA         44             54   49
#5            35         45             55   45
#6            36         46             56   46
#7            NA         47             57   52
#8            38         48             NA   43
#9            39         49             NA   44
#10           40         NA             60   50
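The function above returns only the selected columns. If the row mean should instead be appended to the full data frame, a minimal sketch could look like the following. The helper name add_continent_mean and the small toy data frame are hypothetical and introduced here purely for illustration; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical helper (not part of the original answer): appends one column,
# named after `continent`, holding the row means of every column whose name
# starts with that prefix. All original columns are kept.
add_continent_mean <- function(data, continent) {
  picked <- select(data, starts_with(continent))
  mutate(data, !!continent := rowMeans(picked, na.rm = TRUE))
}

# Toy data, made up for this example
toy <- data.frame(europe_paris = c(1, NA),
                  europe_rome  = c(3, 5),
                  asia_tokyo   = c(10, 20))

result <- add_continent_mean(toy, "europe")
result$europe   # row means over the two europe_* columns: 2 and 5
```

Note that a row in which every selected column is NA yields NaN, because rowMeans(..., na.rm = TRUE) then has nothing left to average.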

Using base R, we can do something similar:

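continent_mean <- function(df, continent)  {
     # keep only the columns whose names start with the continent prefix
     df1 <- df[startsWith(names(df), continent)]
     # add the row means as a new column named after the continent
     df1[continent] <- rowMeans(df1, na.rm = TRUE)
     df1
}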
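Since the question mentions several groups of columns, the same base R idea can be applied to a vector of prefixes in one go. The following is only a sketch under assumptions: add_row_means, its prefixes argument and the toy data are hypothetical names, not part of the original answer.

```r
# Hypothetical helper: for each prefix, append a column named after the prefix
# that holds the row means of the original columns starting with that prefix.
add_row_means <- function(data, prefixes) {
  original_names <- names(data)  # match against the original columns only
  for (prefix in prefixes) {
    cols <- original_names[startsWith(original_names, prefix)]
    data[[prefix]] <- rowMeans(data[cols], na.rm = TRUE)
  }
  data
}

# Toy data, made up for this example
toy <- data.frame(europe_paris = c(1, NA),
                  europe_rome  = c(3, 5),
                  asia_tokyo   = c(10, NA),
                  asia_bangkok = c(20, 40))

result <- add_row_means(toy, c("europe", "asia"))
result$europe   # 2 and 5
result$asia     # 15 and 40
```

Matching against the original column names keeps a newly added mean column from being picked up by a later prefix.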

If we want the rowMeans of all the continents together, we can use split.default:

sapply(split.default(df, sub("_.*", "", names(df))), rowMeans, na.rm = TRUE)

#      asia europe
# [1,]   46      1
# [2,]   47     17
# [3,]   38     13
# [4,]   49     14
# [5,]   45     20
# [6,]   46      6
# [7,]   52     17
# [8,]   43     23
# [9,]   44     19
#[10,]   50     20
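The sapply() call returns a matrix of means rather than new columns in the data frame. One possible way to attach them, sketched here on a small made-up data frame (toy and the intermediate names are hypothetical), is to cbind() the matrix onto the original data:

```r
# Toy data, made up for this example
toy <- data.frame(europe_paris = c(1, NA),
                  europe_rome  = c(3, 5),
                  asia_tokyo   = c(10, NA),
                  asia_bangkok = c(20, 40))

# Group the columns by the part of the name before the first underscore,
# then compute the row means within each group.
groups <- sub("_.*", "", names(toy))
means  <- sapply(split.default(toy, groups), rowMeans, na.rm = TRUE)

# Append the per-continent means as extra columns named "asia" and "europe"
toy_with_means <- cbind(toy, means)
```

This assumes that every column name carries a continent prefix followed by an underscore; a column without an underscore would form a group of its own.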
