Mutate multiple columns in a dataframe

Question

I have a data set that looks like this.

bankname    bankid  year    totass  cash    bond    loans
Bank A      1       1881    244789  7250    20218   29513
Bank B      2       1881    195755  10243   185151  2800
Bank C      3       1881    107736  13357   177612  NA
Bank D      4       1881    170600  35000   20000   5000
Bank E      5       1881    32000000    351266  314012  NA

I want to compute some ratios based on bank balance sheets, and I want the dataset to look like this:

bankname    bankid  year    totass  cash    bond    loans   CashtoAsset BondtoAsset LoanstoAsset
Bank A      1       1881    2447890 7250    202100  951300  0.002   0.082   0.388
Bank B      2       1881    195755  10243   185151  2800    0.052   0.945   0.014
Bank C      3       1881    107736  13357   177612  NA  0.123   1.648585431 NA
Bank D      4       1881    170600  35000   20000   5000    0.205   0.117   0.029
Bank E      5       1881    32000000    351266  314012  NA  0.0109  0.009   NA

Here is the code to reproduce the data:

bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c( 1, 2,  3,  4,  5)
year<- c( 1881, 1881,   1881,   1881,   1881)
totass  <- c(244789,    195755, 107736, 170600, 32000000)
cash<-c(7250,10243,13357,35000,351266)
bond<-c(20218,185151,177612,20000,314012)
loans<-c(29513,2800,NA,5000,NA)
bankdata<-data.frame(bankname, bankid,year,totass, cash, bond, loans)

First, I got rid of the NAs in the balance sheets.

cols <- c("totass", "cash", "bond", "loans")
bankdata[cols][is.na(bankdata[cols])] <- 0

Then I computed the ratios:

library(dplyr)
bankdata<-mutate(bankdata,CashtoAsset = cash/totass)
bankdata<-mutate(bankdata,BondtoAsset = bond/totass)
bankdata<-mutate(bankdata,loanstoAsset =loans/totass)

But instead of computing all these ratios line by line, I want to create a loop to do this all at once. In Stata, I would do:

foreach x of varlist cash bond loans {
by bankid: gen `x'toAsset = `x'/ totass
}

How can I do this?

Recommended answer

Update (as of March 18, 2019)

There has been a change. We used to pass funs() to .funs (funs(name = f(.))). As of dplyr 0.8.0, funs() is deprecated; instead, we now pass a list of formulas (list(name = ~f(.))). See the following new examples.

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)
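For readers on a newer dplyr, the same idea can be sketched with across(), which supersedes the scoped verbs such as mutate_at(). This is a minimal sketch, assuming dplyr 1.0.0 or later is installed; it is not part of the original answer. The .names argument builds the new column names directly, so no underscore cleanup is needed afterwards.

```r
library(dplyr)

# Same data as in the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = 1:5,
  year     = 1881,
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Divide each of cash, bond, loans by totass and keep the originals.
# "{.col}toAsset" names the new columns cashtoAsset, bondtoAsset, loanstoAsset.
out <- bankdata %>%
  mutate(across(cash:loans, ~ .x / totass, .names = "{.col}toAsset"))

names(out)
```

The original cash, bond, and loans columns are left untouched, and NA values in loans simply propagate to loanstoAsset.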

Update (as of December 2, 2017)

Since I answered this question, I have noticed that some SO users keep checking this answer. The dplyr package has changed since then, so I am leaving the following update. I hope it helps some R users learn how to use mutate_at().

mutate_each() is now deprecated. Use mutate_at() instead. You can specify the columns to which you want to apply your function in .vars. One way is to use vars(). Another is to use a character vector containing the names of the columns to which you want to apply the custom function given in .funs. A third is to specify columns by position (e.g., 5:7 in this case). Note that if you use a column for group_by(), you need to adjust the column positions. Have a look at this question.

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)

#  bankname bankid year   totass   cash   bond loans cash_toAsset bond_toAsset loans_toAsset
#1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
#2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
#3   Bank C      3 1881   107736  13357 177612    NA   0.12397899  1.648585431            NA
#4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
#5   Bank E      5 1881 32000000 351266 314012    NA   0.01097706  0.009812875            NA

I purposely named the custom function toAsset in .funs, since this helps me arrange the new column names. Previously, I used rename(), but I think it is much easier to clean up the column names with gsub() in the present approach. If the above result is saved as out, run the following code to remove the _ from the column names.

names(out) <- gsub(names(out), pattern = "_", replacement = "")
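Note that the gsub() call above removes every underscore in every column name. If some other column legitimately contains an underscore, a narrower pattern may be safer. The following is a small sketch with hypothetical column names (bank_name is made up for illustration and is not in the question's data); it only strips the underscore in front of the toAsset suffix.

```r
# Hypothetical column names; "bank_name" has an underscore we want to keep
nms <- c("bank_name", "totass", "cash_toAsset", "bond_toAsset", "loans_toAsset")

# Replace only a trailing "_toAsset" with "toAsset"
cleaned <- sub("_toAsset$", "toAsset", nms)
cleaned
# [1] "bank_name"    "totass"       "cashtoAsset"  "bondtoAsset"  "loanstoAsset"
```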

Original answer

I think you can save some typing this way with dplyr. The downside is that you overwrite cash, bond, and loans.

bankdata %>%
    group_by(bankname) %>%
    mutate_each(funs(whatever = ./totass), cash:loans)

#  bankname bankid year   totass       cash        bond      loans
#1   Bank A      1 1881   244789 0.02961734 0.082593581 0.12056506
#2   Bank B      2 1881   195755 0.05232561 0.945830247 0.01430359
#3   Bank C      3 1881   107736 0.12397899 1.648585431         NA
#4   Bank D      4 1881   170600 0.20515826 0.117233294 0.02930832
#5   Bank E      5 1881 32000000 0.01097706 0.009812875         NA

If you prefer your expected outcome, I think some typing is necessary. The renaming part seems to be something you have to do.

ana <- bankdata %>%
    group_by(bankname) %>%
    summarise_each(funs(whatever = ./totass), cash:loans) %>%
    rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans)

merge(bankdata, ana, by = "bankname")

#  bankname bankid year   totass   cash   bond loans cashtoAsset bondtoAsset loanstoAsset
#1   Bank A      1 1881   244789   7250  20218 29513  0.02961734 0.082593581   0.12056506
#2   Bank B      2 1881   195755  10243 185151  2800  0.05232561 0.945830247   0.01430359
#3   Bank C      3 1881   107736  13357 177612    NA  0.12397899 1.648585431           NA
#4   Bank D      4 1881   170600  35000  20000  5000  0.20515826 0.117233294   0.02930832
#5   Bank E      5 1881 32000000 351266 314012    NA  0.01097706 0.009812875           NA
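For comparison with the Stata foreach loop in the question, the same computation can also be sketched as a plain base R loop, with no dplyr at all. This is an illustrative alternative rather than the answer's method. Because each ratio is computed row by row, the by bankid: grouping from the Stata code is not needed here.

```r
# Same data as in the question
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = 1:5,
  year     = 1881,
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Loop over the balance-sheet items and add one ratio column per item,
# e.g. cash -> cashtoAsset, mirroring `x'toAsset in the Stata loop
for (x in c("cash", "bond", "loans")) {
  bankdata[[paste0(x, "toAsset")]] <- bankdata[[x]] / bankdata$totass
}

names(bankdata)
```

The loop appends cashtoAsset, bondtoAsset, and loanstoAsset to the data frame and leaves the original columns in place.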
