Mutate multiple columns in a dataframe
Question
I have a dataset that looks like this.
bankname  bankid  year  totass    cash    bond    loans
Bank A    1       1881  244789    7250    20218   29513
Bank B    2       1881  195755    10243   185151  2800
Bank C    3       1881  107736    13357   177612  NA
Bank D    4       1881  170600    35000   20000   5000
Bank E    5       1881  32000000  351266  314012  NA
I want to compute some ratios from the bank balance sheets, so that the dataset looks like this:
bankname  bankid  year  totass    cash    bond    loans   CashtoAsset  BondtoAsset  LoanstoAsset
Bank A    1       1881  2447890   7250    202100  951300  0.002        0.082        0.388
Bank B    2       1881  195755    10243   185151  2800    0.052        0.945        0.014
Bank C    3       1881  107736    13357   177612  NA      0.123        1.648585431  NA
Bank D    4       1881  170600    35000   20000   5000    0.205        0.117        0.029
Bank E    5       1881  32000000  351266  314012  NA      0.0109       0.009        NA
Code to reproduce the data:
bankname <- c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E")
bankid <- c(1, 2, 3, 4, 5)
year <- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
cash <- c(7250, 10243, 13357, 35000, 351266)
bond <- c(20218, 185151, 177612, 20000, 314012)
loans <- c(29513, 2800, NA, 5000, NA)
bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)
First, I replaced the NAs in the balance-sheet columns with 0.
cols <- c("totass", "cash", "bond", "loans")
bankdata[cols][is.na(bankdata[cols])] <- 0
Then I computed the ratios:
library(dplyr)
bankdata<-mutate(bankdata,CashtoAsset = cash/totass)
bankdata<-mutate(bankdata,BondtoAsset = bond/totass)
bankdata<-mutate(bankdata,loanstoAsset =loans/totass)
But instead of computing all these ratios line by line, I want to write a loop that does this all at once. In Stata, I would do:
foreach x of varlist cash bond loans {
by bankid: gen `x'toAsset = `x'/ totass
}
How can I do this?
Recommended answer
Update (as of March 18, 2019)
There has been a change. We used to pass funs() to .funs, as in funs(name = f(.)). This changed in dplyr 0.8.0 and later: instead of funs(), we now use list(), as in list(name = ~f(.)). See the following new examples.
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)
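In more recent dplyr releases (1.0.0 and later), the scoped verbs such as mutate_at() are superseded by across(). The following is a minimal sketch of the same computation, assuming dplyr >= 1.0.0 is installed. The .names pattern is my own choice to match the column names asked for in the question, and the data frame is a trimmed copy of the question's data.

```r
library(dplyr)

# Trimmed copy of the question's data (only the columns needed here)
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Divide each of cash, bond and loans by totass.
# .names creates new columns (cashtoAsset, bondtoAsset, loanstoAsset)
# and leaves the original columns untouched.
out <- bankdata %>%
  mutate(across(cash:loans, ~ .x / totass, .names = "{.col}toAsset"))
```

Because .names controls the new column names directly, no renaming step should be needed afterwards.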
Update (as of December 2, 2017)
Since I answered this question, I have noticed that some SO users keep checking this answer. The dplyr package has changed since then, so I am leaving the following update. I hope it helps some R users learn how to use mutate_at().
mutate_each() is now deprecated; use mutate_at() instead. You specify the columns to which the function is applied in .vars, and the custom function itself in .funs. One way to select columns is vars(). Another is a character vector of column names. A third is to give the column positions as numbers (e.g., 5:7 in this case). Note that if you use a column in group_by(), the column positions change and you need to adjust the numbers. Have a look at this question.
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)
#   bankname  bankid  year  totass    cash    bond    loans  cash_toAsset  bond_toAsset  loans_toAsset
#1  Bank A    1       1881  244789    7250    20218   29513  0.02961734    0.082593581   0.12056506
#2  Bank B    2       1881  195755    10243   185151  2800   0.05232561    0.945830247   0.01430359
#3  Bank C    3       1881  107736    13357   177612  NA     0.12397899    1.648585431   NA
#4  Bank D    4       1881  170600    35000   20000   5000   0.20515826    0.117233294   0.02930832
#5  Bank E    5       1881  32000000  351266  314012  NA     0.01097706    0.009812875   NA
I purposely named the custom function toAsset in .funs, since this helps with arranging the new column names. Previously, I used rename(), but I think it is much easier to clean up the column names with gsub() in the present approach. If the above result is saved as out, run the following code to remove the _ from the column names.
names(out) <- gsub(names(out), pattern = "_", replacement = "")
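One caveat: a pattern of "_" strips every underscore, including any in unrelated column names. A more targeted sketch, assuming the new columns all end in _toAsset as in the output above, could look like this. The small out data frame and its bank_id column are made up here purely for illustration.

```r
# Hypothetical result with one unrelated column that contains an underscore
out <- data.frame(bank_id = 1:2, cash = c(10, 20), cash_toAsset = c(0.1, 0.2))

# Rewrite only the "_toAsset" suffix; other underscores are left alone
names(out) <- sub("_toAsset$", "toAsset", names(out))
```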
Original answer
I think you can save some typing this way with dplyr. The downside is that you overwrite cash, bond, and loans.
bankdata %>%
group_by(bankname) %>%
mutate_each(funs(whatever = ./totass), cash:loans)
#   bankname  bankid  year  totass    cash        bond         loans
#1  Bank A    1       1881  244789    0.02961734  0.082593581  0.12056506
#2  Bank B    2       1881  195755    0.05232561  0.945830247  0.01430359
#3  Bank C    3       1881  107736    0.12397899  1.648585431  NA
#4  Bank D    4       1881  170600    0.20515826  0.117233294  0.02930832
#5  Bank E    5       1881  32000000  0.01097706  0.009812875  NA
If you prefer your expected outcome, I think some typing is necessary. The renaming part seems to be something you have to do.
bankdata %>%
group_by(bankname) %>%
summarise_each(funs(whatever = ./totass), cash:loans) %>%
rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana;
ana %>%
merge(bankdata,., by = "bankname")
#   bankname  bankid  year  totass    cash    bond    loans  cashtoAsset  bondtoAsset  loanstoAsset
#1  Bank A    1       1881  244789    7250    20218   29513  0.02961734   0.082593581  0.12056506
#2  Bank B    2       1881  195755    10243   185151  2800   0.05232561   0.945830247  0.01430359
#3  Bank C    3       1881  107736    13357   177612  NA     0.12397899   1.648585431  NA
#4  Bank D    4       1881  170600    35000   20000   5000   0.20515826   0.117233294  0.02930832
#5  Bank E    5       1881  32000000  351266  314012  NA     0.01097706   0.009812875  NA
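For readers who want something closer to the Stata foreach loop in the question, a plain base R loop may also be enough. This is only a sketch and not part of the dplyr approach above; it uses a trimmed copy of the question's data and builds each new column name with paste0().

```r
# Trimmed copy of the question's data (only the columns needed here)
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Loop over the column names, as foreach does in Stata,
# adding one <name>toAsset column per balance-sheet item
for (x in c("cash", "bond", "loans")) {
  bankdata[[paste0(x, "toAsset")]] <- bankdata[[x]] / bankdata$totass
}
```

The original columns stay in place, and NA values in loans simply carry through to loanstoAsset.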