Mutate multiple columns in a dataframe
Question
I have a data set that looks like this:
bankname bankid year totass cash bond loans
Bank A 1 1881 244789 7250 20218 29513
Bank B 2 1881 195755 10243 185151 2800
Bank C 3 1881 107736 13357 177612 NA
Bank D 4 1881 170600 35000 20000 5000
Bank E 5 1881 32000000 351266 314012 NA
I want to compute some ratios based on the bank balance sheets, and I want the dataset to look like this:
bankname bankid year totass cash bond loans CashtoAsset BondtoAsset LoanstoAsset
Bank A 1 1881 2447890 7250 202100 951300 0.002 0.082 0.388
Bank B 2 1881 195755 10243 185151 2800 0.052 0.945 0.014
Bank C 3 1881 107736 13357 177612 NA 0.123 1.648585431 NA
Bank D 4 1881 170600 35000 20000 5000 0.205 0.117 0.029
Bank E 5 1881 32000000 351266 314012 NA 0.0109 0.009 NA
Here is the code to replicate the data:
bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c( 1, 2, 3, 4, 5)
year<- c( 1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
cash<-c(7250,10243,13357,35000,351266)
bond<-c(20218,185151,177612,20000,314012)
loans<-c(29513,2800,NA,5000,NA)
bankdata<-data.frame(bankname, bankid,year,totass, cash, bond, loans)
First, I got rid of the NAs in the balance sheet columns.
cols <- c("totass", "cash", "bond", "loans")
bankdata[cols][is.na(bankdata[cols])] <- 0
Then I compute the ratios:
library(dplyr)
bankdata<-mutate(bankdata,CashtoAsset = cash/totass)
bankdata<-mutate(bankdata,BondtoAsset = bond/totass)
bankdata<-mutate(bankdata,loanstoAsset =loans/totass)
But instead of computing all these ratios line by line, I want to create a loop to do this all at once. In Stata, I would do:
foreach x of varlist cash bond loans {
by bankid: gen `x'toAsset = `x'/ totass
}
How can I do this?
Recommended answer
Update (as of the 18th of March, 2019)
There has been a change. We used to pass funs() to .funs, as in funs(name = f(.)). This changed in dplyr 0.8.0 and above: instead of funs(), we now use list(), as in list(name = ~f(.)). See the following new examples.
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)
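For readers on newer versions: in dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(). The following is a minimal sketch of the same computation under that assumption; the three-row toy data frame is a cut-down copy of the question's data, and the .names pattern is chosen here only to match the column names the question asks for.

```r
library(dplyr)  # across() assumes dplyr >= 1.0.0

# Toy subset of the question's data (first three banks only)
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C"),
  totass   = c(244789, 195755, 107736),
  cash     = c(7250, 10243, 13357),
  bond     = c(20218, 185151, 177612),
  loans    = c(29513, 2800, NA)
)

# Divide each of cash, bond, loans by totass and keep the originals;
# .names builds the new column names without an underscore
out <- bankdata %>%
  mutate(across(cash:loans, ~ .x / totass, .names = "{.col}toAsset"))

names(out)
```

Because .names controls the output names directly, no separate renaming step should be needed with this form.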
Update (as of the 2nd of December, 2017)
Since I answered this question, I have noticed that some SO users keep checking this answer. The dplyr package has changed since then, so I am leaving the following update. I hope this helps some R users learn how to use mutate_at().
mutate_each() is now deprecated; use mutate_at() instead. You specify the columns to which the function is applied in .vars. One way is to use vars(). Another is to pass a character vector containing the names of the columns to which the custom function given in .funs should be applied. The third is to specify the columns by position (e.g., 5:7 in this case). Note that if you use a column for group_by(), you need to adjust the column position numbers. Have a look at this question.
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)
# bankname bankid year totass cash bond loans cash_toAsset bond_toAsset loans_toAsset
#1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA
I purposely gave the name toAsset to the custom function in .funs, since this helps me arrange the new column names. Previously, I used rename(), but I think it is much easier to clean up column names with gsub() in the present approach. If the above result is saved as out, run the following code to remove the _ from the column names.
names(out) <- gsub(names(out), pattern = "_", replacement = "")
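If you would rather skip the renaming step, a plain base R loop that mirrors the Stata foreach from the question is another option. This is only a sketch on a two-row toy copy of the data; ratio_cols is a helper name introduced here for illustration.

```r
# Toy subset of the question's data (first two banks only)
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B"),
  totass   = c(244789, 195755),
  cash     = c(7250, 10243),
  bond     = c(20218, 185151),
  loans    = c(29513, 2800)
)

# Hypothetical helper: the columns to turn into ratios
ratio_cols <- c("cash", "bond", "loans")

# One new column per input column, named <col>toAsset
for (x in ratio_cols) {
  bankdata[[paste0(x, "toAsset")]] <- bankdata[[x]] / bankdata$totass
}

names(bankdata)
```

The new names are built with paste0(), so there is no underscore to strip afterwards.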
Original answer
I think you can save some typing this way with dplyr. The downside is that you overwrite cash, bond, and loans.
bankdata %>%
group_by(bankname) %>%
mutate_each(funs(whatever = ./totass), cash:loans)
# bankname bankid year totass cash bond loans
#1 Bank A 1 1881 244789 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 0.01097706 0.009812875 NA
If you prefer your expected outcome, I think some typing is necessary. The renaming part seems to be something you have to do.
bankdata %>%
group_by(bankname) %>%
summarise_each(funs(whatever = ./totass), cash:loans) %>%
rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana;
ana %>%
merge(bankdata,., by = "bankname")
# bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset
#1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA