mutate_each / summarise_each in dplyr: how do I select certain columns and give new names to mutated columns?

Question

I'm a bit confused about the dplyr verb mutate_each.

It's pretty straightforward to use the basic mutate to transform a column of data into, say, z-scores, and create a new column in your data.frame (here with the name z_score_data):

newDF <- DF %>%
  select(one_column) %>%
  mutate(z_score_data = (one_column - mean(one_column)) / sd(one_column))

However, since I have many columns of data I'd like to transform, it appears I should probably use the mutate_each verb.

newDF <- DF %>%
     mutate_each(funs(scale))
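
For reference, a z-score subtracts the column mean and then divides by the standard deviation, which is also what scale() does by default. A minimal base-R sketch, assuming a made-up numeric vector `x` (not part of the original question), that checks the manual formula against scale():

```r
# Made-up example data; any numeric vector with non-zero spread would do.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Manual z-score: centre on the mean, then divide by the standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so strip it down to a plain vector.
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)  # TRUE
```

Because scale() returns a matrix rather than a vector, wrapping it in as.numeric() (or as.vector()) may be convenient when the result is stored as a data frame column.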

So far so good. But as of yet I haven't been able to figure out:

  1. How can I give these new columns appropriate names, like I can in mutate?
  2. How can I select certain columns that I wish to mutate, like I did with select in the first case?

Thanks for your help.

Solution

Update for dplyr >= 0.4.3.9000

In the dplyr development version 0.4.3.9000 (at time of writing), naming inside mutate_each and summarise_each has been simplified as noted in the News:

The naming behaviour of summarise_each() and mutate_each() has been tweaked so that you can force inclusion of both the function and the variable name: summarise_each(mtcars, funs(mean = mean), everything())

This is mainly important if you want to apply only 1 function inside mutate_each / summarise_each and give the resulting columns new names.

To show the difference, here's the output from dplyr 0.4.3.9000 using the new naming functionality, in contrast to option a.2 below:

library(dplyr) # >= 0.4.3.9000
iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_mysum Sepal.Width_mysum
#1          5.1         3.5          1.4         0.2  setosa              876.5             458.6
#2          4.9         3.0          1.4         0.2  setosa              876.5             458.6
#3          4.7         3.2          1.3         0.2  setosa              876.5             458.6
#4          4.6         3.1          1.5         0.2  setosa              876.5             458.6
#5          5.0         3.6          1.4         0.2  setosa              876.5             458.6
#6          5.4         3.9          1.7         0.4  setosa              876.5             458.6
#  Petal.Length_mysum Petal.Width_mysum
#1              563.7             179.9
#2              563.7             179.9
#3              563.7             179.9
#4              563.7             179.9
#5              563.7             179.9
#6              563.7             179.9

If you don't supply new names and you only supply 1 function, dplyr will change the existing columns (as it did in previous versions):

iris %>% mutate_each(funs(sum), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1        876.5       458.6        563.7       179.9  setosa
#2        876.5       458.6        563.7       179.9  setosa
#3        876.5       458.6        563.7       179.9  setosa
#4        876.5       458.6        563.7       179.9  setosa
#5        876.5       458.6        563.7       179.9  setosa
#6        876.5       458.6        563.7       179.9  setosa

I assume that this new functionality will be available via CRAN in the next release version 0.4.4.


dplyr versions <= 0.4.3:

How can I give these new columns appropriate names, like I can in mutate?

a) 1 function applied in mutate_each/summarise_each

If you apply only 1 function inside mutate_each or summarise_each, the existing columns will be transformed and their names will be kept as they were, unless you supply a named vector to mutate_each_/summarise_each_ (see option a.4).

Here are some examples:

a.1 only 1 function -> will keep the existing names

iris %>% mutate_each(funs(sum), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          876         459          564         180  setosa
#2          876         459          564         180  setosa
#3          876         459          564         180  setosa
#4          876         459          564         180  setosa
#5          876         459          564         180  setosa
#6          876         459          564         180  setosa

a.2 the existing names are also kept if you specify a new column name extension:

iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          876         459          564         180  setosa
#2          876         459          564         180  setosa
#3          876         459          564         180  setosa
#4          876         459          564         180  setosa
#5          876         459          564         180  setosa
#6          876         459          564         180  setosa

a.3 Manually specify a new name per column (but only practical for few columns):

iris %>% mutate_each(funs(sum), SLsum = Sepal.Length, SWsum = Sepal.Width, -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species SLsum SWsum
#1          5.1         3.5          1.4         0.2  setosa   876   459
#2          4.9         3.0          1.4         0.2  setosa   876   459
#3          4.7         3.2          1.3         0.2  setosa   876   459
#4          4.6         3.1          1.5         0.2  setosa   876   459
#5          5.0         3.6          1.4         0.2  setosa   876   459
#6          5.4         3.9          1.7         0.4  setosa   876   459

a.4 Use a named vector to create additional columns with new names:

case 1: keep original columns

In contrast to options a.1 and a.2, dplyr will keep the existing columns unchanged and create new columns in this approach. The names of the new columns equal the names of the named vector you create in advance (vars in this case).

vars <- names(iris)[1:2]  # choose which columns should be mutated
vars <- setNames(vars, paste0(vars, "_sum")) # create new column names
iris %>% mutate_each_(funs(sum), vars) %>% head 
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1          5.1         3.5          1.4         0.2  setosa            876.5           458.6
#2          4.9         3.0          1.4         0.2  setosa            876.5           458.6
#3          4.7         3.2          1.3         0.2  setosa            876.5           458.6
#4          4.6         3.1          1.5         0.2  setosa            876.5           458.6
#5          5.0         3.6          1.4         0.2  setosa            876.5           458.6
#6          5.4         3.9          1.7         0.4  setosa            876.5           458.6
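
To make the role of the named vector more concrete, here is a small base-R sketch of what the output above amounts to. It does not call dplyr at all; the loop and the name `out` are illustrative assumptions, not how mutate_each_ is implemented.

```r
vars <- names(iris)[1:2]                      # columns to be mutated
vars <- setNames(vars, paste0(vars, "_sum"))  # names = new column names

names(vars)   # "Sepal.Length_sum" "Sepal.Width_sum"
unname(vars)  # "Sepal.Length"     "Sepal.Width"

# Base-R imitation: add one new column per entry, leave the originals alone.
out <- iris
for (new_name in names(vars)) {
  # vars[[new_name]] is the source column; the scalar sum is recycled
  # to every row of the data frame.
  out[[new_name]] <- sum(iris[[vars[[new_name]]]])
}
head(out)
```

The vector's values say which columns to read and its names say where to write, which is why the original columns survive unchanged.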

case 2: remove original columns

As you can see, this approach keeps the existing columns unchanged and adds new columns with specified names. In case you don't want to keep the original columns, but just the newly created columns (and the other columns) you can just add a select statement afterwards:

iris %>% mutate_each_(funs(sum), vars) %>% select(-one_of(vars)) %>% head
#  Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1          1.4         0.2  setosa            876.5           458.6
#2          1.4         0.2  setosa            876.5           458.6
#3          1.3         0.2  setosa            876.5           458.6
#4          1.5         0.2  setosa            876.5           458.6
#5          1.4         0.2  setosa            876.5           458.6
#6          1.7         0.4  setosa            876.5           458.6

b) more than 1 function applied in mutate_each/summarise_each

b.1 Let dplyr figure out new names

If you applied more than 1 function, you can let dplyr figure out names by itself (and it will keep the existing columns):

iris %>% mutate_each(funs(sum, mean), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum Petal.Length_sum
#1          5.1         3.5          1.4         0.2  setosa              876             459              564
#2          4.9         3.0          1.4         0.2  setosa              876             459              564
#3          4.7         3.2          1.3         0.2  setosa              876             459              564
#4          4.6         3.1          1.5         0.2  setosa              876             459              564
#5          5.0         3.6          1.4         0.2  setosa              876             459              564
#6          5.4         3.9          1.7         0.4  setosa              876             459              564
#  Petal.Width_sum Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean
#1             180              5.84             3.06              3.76              1.2
#2             180              5.84             3.06              3.76              1.2
#3             180              5.84             3.06              3.76              1.2
#4             180              5.84             3.06              3.76              1.2
#5             180              5.84             3.06              3.76              1.2
#6             180              5.84             3.06              3.76              1.2
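
The naming pattern in this output is the column name, an underscore, then the function name. A base-R sketch that reproduces the same column names and values, assuming nothing beyond the built-in iris data (the names `fns`, `num_cols` and `out` are chosen for this illustration and do not come from dplyr):

```r
num_cols <- setdiff(names(iris), "Species")   # every column except Species
fns <- list(sum = sum, mean = mean)           # function name -> function

out <- iris
for (fn_name in names(fns)) {
  for (col in num_cols) {
    # New column is named <column>_<function>; the scalar result is recycled.
    out[[paste(col, fn_name, sep = "_")]] <- fns[[fn_name]](iris[[col]])
  }
}
names(out)
```

Looping over functions in the outer loop gives all the _sum columns first and then all the _mean columns, matching the column order shown above.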

b.2 Manually specify new column names

Another option, when using more than 1 function, is to specify the column name extension on your own:

iris %>% mutate_each(funs(MySum = sum(.), MyMean = mean(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_MySum Sepal.Width_MySum Petal.Length_MySum
#1          5.1         3.5          1.4         0.2  setosa                876               459                564
#2          4.9         3.0          1.4         0.2  setosa                876               459                564
#3          4.7         3.2          1.3         0.2  setosa                876               459                564
#4          4.6         3.1          1.5         0.2  setosa                876               459                564
#5          5.0         3.6          1.4         0.2  setosa                876               459                564
#6          5.4         3.9          1.7         0.4  setosa                876               459                564
#  Petal.Width_MySum Sepal.Length_MyMean Sepal.Width_MyMean Petal.Length_MyMean Petal.Width_MyMean
#1               180                5.84               3.06                3.76                1.2
#2               180                5.84               3.06                3.76                1.2
#3               180                5.84               3.06                3.76                1.2
#4               180                5.84               3.06                3.76                1.2
#5               180                5.84               3.06                3.76                1.2
#6               180                5.84               3.06                3.76                1.2

How can I select certain columns that I wish to mutate, like I did with select in the first case?

You can do that by naming the columns to be mutated (or, with a leading minus, the columns to be left out), as here (mutate Sepal.Length, but not Species):

iris %>% mutate_each(funs(sum), Sepal.Length, -Species) %>% head()

In addition, you can use special helper functions to select the columns to be mutated, for example all columns that start with or contain a certain word:

iris %>% mutate_each(funs(sum), contains("Sepal"), -Species) %>% head()

For more information on those functions, see ?mutate_each and ?select.
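
As a rough illustration of which columns a helper such as contains("Sepal") would pick up, the matching can be imitated in base R. This only mimics the result; it is not how dplyr implements the helper, and `cols` and `matched` are names chosen for this sketch.

```r
cols <- names(iris)

# contains() matches a literal substring and ignores case by default;
# lower-casing the column names imitates that behaviour.
matched <- cols[grepl("sepal", tolower(cols), fixed = TRUE)]
matched  # "Sepal.Length" "Sepal.Width"
```

Checking the matched names this way before mutating can help confirm that a pattern does not catch more columns than intended.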

Edit 1 after comment:

If you want to use standard evaluation, dplyr supplies SE versions of most functions, ending with an additional "_". So in this case you would use:

x <- c("Sepal.Width", "Sepal.Length") # vector of column names 
iris %>% mutate_each_(funs(sum), x) %>% head()

Notice the mutate_each_ I used here.
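
The point of the SE version is that the columns are chosen by a character vector rather than by bare names. A base-R sketch of the same idea, overwriting the selected columns with their sums (the name `out` is an assumption for this illustration, and the code does not use dplyr):

```r
x <- c("Sepal.Width", "Sepal.Length")  # vector of column names

out <- iris
# Replace each selected column by its sum, repeated for every row.
out[x] <- lapply(iris[x], function(col) rep(sum(col), length(col)))
head(out)
```

Since `x` is an ordinary character vector, it can be built programmatically, for example from names(iris) or from user input.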


Edit 2: updated with option a.4
