Creating new columns based on selected columns that calculate the ratio by group


Problem description

My data looks as follows:

DF <- structure(list(No_Adjusted_Gross_Income = structure(c(1L, 1L, 
    2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    NoR_from_1_to_5000 = c(1035373, 4272260, 1124098, 1035373, 
    4272260, 1124098), NoR_from_5000_to_10000 = c(319540, 4826042, 
    1959866, 319540, 4826042, 1959866), AGI_from_1_to_5000 = c(2588950186.5, 
    10682786130, 2810807049, 2588950186.5, 10682786130, 2810807049
    ), AGI_from_5000_to_10000 = c(2396550000, 36195315000, 14698995000, 
    2396550000, 36195315000, 14698995000)), class = "data.frame", row.names = c(NA, 
    -6L))

vn <- c("sum1", "sum2", "sum3", "sum4")

  No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
1                        A            1035373                 319540         2588950187             2396550000
2                        A            4272260                4826042        10682786130            36195315000
3                        B            1124098                1959866         2810807049            14698995000
4                        B            1035373                 319540         2588950187             2396550000
5                        C            4272260                4826042        10682786130            36195315000
6                        C            1124098                1959866         2810807049            14698995000

For each of columns 2 to 5, I would like to create a new column whose value is the original value divided by the sum of that column's values within each No_Adjusted_Gross_Income group.

I first tried to compute the sums:

DF[, (vn) := as.data.table ( t( t( DF[, 2:5, by=c("No_Adjusted_Gross_Income")] )) ) ][]

But I get an error:

Error in `:=`((vn), as.data.table(t(t(DF[, 2:5, by = c("No_Adjusted_Gross_Income")])))) : 
  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

How should I do this properly? And can I divide the value of the original column by this sum directly?

Desired output (sums and ratios, computed here one column at a time):

DF <- setDT(DF)[, sum_1 := sum(NoR_from_1_to_5000),by=c("No_Adjusted_Gross_Income")]
DF <- setDT(DF)[, sum_2 := sum(NoR_from_5000_to_10000),by=c("No_Adjusted_Gross_Income")]
DF <- setDT(DF)[, sum_3 := sum(AGI_from_1_to_5000),by=c("No_Adjusted_Gross_Income")]
DF <- setDT(DF)[, sum_4 := sum(AGI_from_5000_to_10000),by=c("No_Adjusted_Gross_Income")]
DF <- setDT(DF)[, rat_1 := NoR_from_1_to_5000/sum_1 ,by=c("No_Adjusted_Gross_Income")]
DF <- setDT(DF)[, rat_2 := NoR_from_5000_to_10000/sum_2 ,by=c("No_Adjusted_Gross_Income")]
DF <- setDT(DF)[, rat_3 := AGI_from_1_to_5000/sum_3,by=c("No_Adjusted_Gross_Income")]
DF <- setDT(DF)[, rat_4 := AGI_from_5000_to_10000/sum_4,by=c("No_Adjusted_Gross_Income")]

   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000   sum_1   sum_2       sum_3       sum_4 rat_1 rat_2 rat_3
1:                        A            1035373                 319540         2588950187             2396550000 5307633 5145582 13271736317 38591865000  0.20 0.062  0.20
2:                        A            4272260                4826042        10682786130            36195315000 5307633 5145582 13271736317 38591865000  0.80 0.938  0.80
3:                        B            1124098                1959866         2810807049            14698995000 2159471 2279406  5399757236 17095545000  0.52 0.860  0.52
4:                        B            1035373                 319540         2588950187             2396550000 2159471 2279406  5399757236 17095545000  0.48 0.140  0.48
5:                        C            4272260                4826042        10682786130            36195315000 5396358 6785908 13493593179 50894310000  0.79 0.711  0.79
6:                        C            1124098                1959866         2810807049            14698995000 5396358 6785908 13493593179 50894310000  0.21 0.289  0.21
   rat_4
1: 0.062
2: 0.938
3: 0.860
4: 0.140
5: 0.711
6: 0.289


Recommended answer

Your code does work to calculate the sums once you convert your data.frame to a data.table with setDT(DF)[, ...]. The error message points at the same thing: := can only be used on a data.table. If you just want the ratios, this may be what you are after:

setDT(DF)[, paste0("rat_", 1:4) := lapply(.SD, function(x) round(x / sum(x), 3)),
          .SDcols = 2:5,
          by = .(No_Adjusted_Gross_Income)][]
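
If you also want to keep the group sums as columns, as in the desired output of the question, the same idea can be split into two steps. The following is only a minimal sketch: it uses a small made-up table rather than the data from the question, and the names grp, x, y, cols, sum_cols and rat_cols are hypothetical and chosen purely for illustration.

```r
library(data.table)

# Small made-up table (hypothetical values, not the data from the question).
DT <- data.table(
  grp = c("A", "A", "B", "B"),
  x   = c(1, 3, 2, 2),
  y   = c(10, 30, 5, 15)
)

cols     <- c("x", "y")             # columns to process
sum_cols <- paste0("sum_", cols)    # hypothetical names for the group sums
rat_cols <- paste0("rat_", cols)    # hypothetical names for the ratios

# Step 1: per-group sums; each scalar sum is recycled to every row of its group.
DT[, (sum_cols) := lapply(.SD, sum), by = grp, .SDcols = cols]

# Step 2: ratios, dividing each value column by its matching sum column.
# set() adds or updates a column by reference, one column per iteration.
for (i in seq_along(cols)) {
  set(DT, j = rat_cols[i], value = DT[[cols[i]]] / DT[[sum_cols[i]]])
}

print(DT)
```

Within each group, every ratio column should add up to 1, which is a quick way to check the result. Rounding is left out here so that this check stays exact up to floating-point error; wrap the division in round(..., 3) if you want the three-decimal output shown in the question.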
