R data.table: create new columns with standard names


Question

I want to create new columns in my data.table based on a ratio calculation. My variable names follow a standard pattern, so I think there must be an easy way to achieve this in data.table, but I cannot work out how. Below is my sample data and code:

library(data.table)

set.seed(1200)

# Survey-style sample data: 100 respondents across 10 regions
ID <- seq(1001, 1100)
region <- sample(1:10, 100, replace = TRUE)
Q21 <- sample(1:5, 100, replace = TRUE)
Q22 <- sample(1:15, 100, replace = TRUE)
Q24_LOC_1 <- sample(1:8, 100, replace = TRUE)
Q24_LOC_2 <- sample(1:8, 100, replace = TRUE)
Q24_LOC_3 <- sample(1:8, 100, replace = TRUE)
Q24_LOC_4 <- sample(1:8, 100, replace = TRUE)

# Matching "_PAN" columns used as the denominators of the ratios
Q21_PAN <- sample(1:5, 100, replace = TRUE)
Q22_PAN <- sample(1:15, 100, replace = TRUE)
Q24_LOC_1_PAN <- sample(1:8, 100, replace = TRUE)
Q24_LOC_2_PAN <- sample(1:8, 100, replace = TRUE)
Q24_LOC_3_PAN <- sample(1:8, 100, replace = TRUE)
Q24_LOC_4_PAN <- sample(1:8, 100, replace = TRUE)

df1 <- data.table(ID, region, Q21, Q22, Q24_LOC_1, Q24_LOC_2, Q24_LOC_3, Q24_LOC_4,
                  Q21_PAN, Q22_PAN, Q24_LOC_1_PAN, Q24_LOC_2_PAN, Q24_LOC_3_PAN, Q24_LOC_4_PAN)

col_needed <- c("Q21","Q22","Q24_LOC_1","Q24_LOC_2","Q24_LOC_3","Q24_LOC_4")

check1 <- df1[,Q21_R := mean(Q21,na.rm = T)/mean(Q21_PAN,na.rm = T),by=region]

check1 works for one variable. I am looking for a solution where I can pass all the needed variables and get the new variables calculated in a single line, for example by passing col_needed. I also tried the code below:

check2 <- df1[,`:=`(paste0(col_needed,"_R"),(mean(col_needed,na.rm = T)/mean(paste0(col_needed,"_PAN"),na.rm = T))),by=region][]

However, this gives me multiple warnings and the new columns contain only NAs. The warnings are: In mean(col_needed, na.rm = T) : argument is not numeric or logical: returning NA

Can this be done?

Recommended answer

If I understand correctly, you could do the following:

df1[, paste(col_needed, "R", sep = "_") := 
      Map(function(x,y) mean(get(x), na.rm = TRUE)/mean(get(y), na.rm=TRUE), 
           col_needed, 
           paste(col_needed, "PAN", sep = "_")),
    by=region]
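
The attempt in the question returns NA because col_needed is a character vector, so mean(col_needed) averages the column names themselves rather than the columns; get() in the answer above resolves each name to its column within the group. As a minimal sketch of the same idea on a small table, the version below looks the columns up through .SD instead of get(). The table dt, its values, and the names cols, pan_cols and new_cols are made up for illustration and are not part of the original post.

```r
library(data.table)

# Small illustrative table (hypothetical values, not the asker's data)
dt <- data.table(
  region  = c(1, 1, 2, 2),
  Q21     = c(2, 4, 1, 3),
  Q21_PAN = c(1, 1, 2, 2),
  Q22     = c(10, 20, 30, 50),
  Q22_PAN = c(5, 5, 10, 10)
)

cols     <- c("Q21", "Q22")
pan_cols <- paste0(cols, "_PAN")
new_cols <- paste0(cols, "_R")

# mean() of a column *name* is mean() of a string, which is NA with a warning
suppressWarnings(mean("Q21"))

# Pair each column with its _PAN counterpart and divide the per-region means;
# .SD[[x]] fetches the column whose name is stored in x for the current group
dt[, (new_cols) := Map(
       function(x, y) mean(.SD[[x]], na.rm = TRUE) / mean(.SD[[y]], na.rm = TRUE),
       cols, pan_cols),
   by = region,
   .SDcols = c(cols, pan_cols)]

print(dt)
```

Wrapping new_cols in parentheses on the left of := tells data.table to use the names stored in that variable, which is equivalent to passing the paste() call directly as in the answer above.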
