data.table: How do I pass a character vector to a function and get data.table to treat its contents as column names?


Problem description


Here is a data.table:

library(data.table)
DT <- data.table(airquality)

This example produces the output I want:

DT[, `:=`(New_Ozone= log(Ozone), New_Wind=log(Wind))]

How can I write a function log_those_columns such that the following code snippet outputs the same result?

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")
log_those_columns(DT, old_names, new_names)

Note that I need old_names and new_names to be flexible enough to contain any number of columns.

(I see from the similar StackOverflow questions on this topic that the answer probably involves some combination of .SD, with=F, parse(), eval(), and/or substitute(), but I can't seem to nail which of those to use and where).

Solution

Picking up MichaelChirico's comment, the function definition can be written as:

log_those_columns <- function(DT, cols_in, cols_new) {
  DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
}

which returns:

log_those_columns(DT, old_names, new_names)
DT

     Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
  1:    41     190  7.4   67     5   1  3.713572 2.001480
  2:    36     118  8.0   72     5   2  3.583519 2.079442
  3:    12     149 12.6   74     5   3  2.484907 2.533697
  4:    18     313 11.5   62     5   4  2.890372 2.442347
  5:    NA      NA 14.3   56     5   5        NA 2.660260
 ---                                                     
149:    30     193  6.9   70     9  26  3.401197 1.931521
150:    NA     145 13.2   77     9  27        NA 2.580217
151:    14     191 14.3   75     9  28  2.639057 2.660260
152:    18     131  8.0   76     9  29  2.890372 2.079442
153:    20     223 11.5   68     9  30  2.995732 2.442347

as expected.
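
As a quick check, the function's result can be compared with the hand-written `:=` call from the question. This is only a minimal sketch: the names DT_manual and DT_fun are introduced here for illustration and are not part of the original answer. The parentheses around cols_new matter: without them, data.table would treat cols_new as the literal name of a single new column rather than as a variable holding the names.

```r
library(data.table)

log_those_columns <- function(DT, cols_in, cols_new) {
  # (cols_new) tells data.table to use the contents of the variable as names
  DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
}

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")

# Reference result, written out by hand as in the question
DT_manual <- data.table(airquality)
DT_manual[, `:=`(New_Ozone = log(Ozone), New_Wind = log(Wind))]

# Result from the function; DT_fun is modified by reference
DT_fun <- data.table(airquality)
log_those_columns(DT_fun, old_names, new_names)

# Compare column names and the two new columns
same_names <- identical(names(DT_manual), names(DT_fun))
same_ozone <- identical(DT_manual$New_Ozone, DT_fun$New_Ozone)
same_wind  <- identical(DT_manual$New_Wind, DT_fun$New_Wind)
print(c(same_names, same_ozone, same_wind))
```

Because `:=` updates by reference, the function does not need to return anything for the caller's table to change.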

A more flexible approach

The function used to transform the data can be passed as a parameter as well:

fct_those_columns <- function(DT, cols_in, cols_new, fct) {
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}

The call:

fct_those_columns(DT, old_names, new_names, log)
head(DT)

works as expected:

   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  3.713572 2.001480
2:    36     118  8.0   72     5   2  3.583519 2.079442
3:    12     149 12.6   74     5   3  2.484907 2.533697
4:    18     313 11.5   62     5   4  2.890372 2.442347
5:    NA      NA 14.3   56     5   5        NA 2.660260
6:    28      NA 14.9   66     5   6  3.332205 2.701361

The function name can be passed as character:

fct_those_columns(DT, old_names, new_names, "sqrt")
head(DT)

   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  6.403124 2.720294
2:    36     118  8.0   72     5   2  6.000000 2.828427
3:    12     149 12.6   74     5   3  3.464102 3.549648
4:    18     313 11.5   62     5   4  4.242641 3.391165
5:    NA      NA 14.3   56     5   5        NA 3.781534
6:    28      NA 14.9   66     5   6  5.291503 3.860052

or as an anonymous function:

fct_those_columns(DT, old_names, new_names, function(x) x^(1/2))
head(DT)

   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  6.403124 2.720294
2:    36     118  8.0   72     5   2  6.000000 2.828427
3:    12     149 12.6   74     5   3  3.464102 3.549648
4:    18     313 11.5   62     5   4  4.242641 3.391165
5:    NA      NA 14.3   56     5   5        NA 3.781534
6:    28      NA 14.9   66     5   6  5.291503 3.860052
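
The three calling styles above (function object, character string, anonymous function) all work because lapply() resolves its function argument with match.fun(). The following base R sketch illustrates this on a plain list; the variable names are chosen here for illustration only.

```r
# match.fun() turns a function name given as a string into the function itself
f_from_string <- match.fun("sqrt")
f_from_object <- match.fun(sqrt)

x <- c(41, 36, 12)

# The same three styles as above, applied to a plain list instead of .SD
res_object <- lapply(list(a = x), sqrt)
res_string <- lapply(list(a = x), "sqrt")
res_lambda <- lapply(list(a = x), function(v) v^(1/2))

print(identical(res_object, res_string))
print(isTRUE(all.equal(res_object, res_lambda)))
```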

An even more flexible approach

The function below derives the names of the new columns by prepending the names of the input columns with the name of the function automatically:

fct_those_columns <- function(DT, cols_in, fct) {
  fct_name <- substitute(fct)
  cols_new <- paste(if (class(fct_name) == "name") fct_name else fct_name[3], cols_in, sep = "_")
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}

DT <- data.table(airquality)
fct_those_columns(DT, old_names, sqrt)
fct_those_columns(DT, old_names, data.table::as.IDate)
fct_those_columns(DT, old_names, function(x) x^(1/2))
DT

     Ozone Solar.R Wind Temp Month Day sqrt_Ozone sqrt_Wind as.IDate_Ozone as.IDate_Wind x^(1/2)_Ozone x^(1/2)_Wind
  1:    41     190  7.4   67     5   1   6.403124  2.720294     1970-02-11    1970-01-08      6.403124     2.720294
  2:    36     118  8.0   72     5   2   6.000000  2.828427     1970-02-06    1970-01-09      6.000000     2.828427
  3:    12     149 12.6   74     5   3   3.464102  3.549648     1970-01-13    1970-01-13      3.464102     3.549648
  4:    18     313 11.5   62     5   4   4.242641  3.391165     1970-01-19    1970-01-12      4.242641     3.391165
  5:    NA      NA 14.3   56     5   5         NA  3.781534           <NA>    1970-01-15            NA     3.781534
 ---                                                                                                               
149:    30     193  6.9   70     9  26   5.477226  2.626785     1970-01-31    1970-01-07      5.477226     2.626785
150:    NA     145 13.2   77     9  27         NA  3.633180           <NA>    1970-01-14            NA     3.633180
151:    14     191 14.3   75     9  28   3.741657  3.781534     1970-01-15    1970-01-15      3.741657     3.781534
152:    18     131  8.0   76     9  29   4.242641  2.828427     1970-01-19    1970-01-09      4.242641     2.828427
153:    20     223 11.5   68     9  30   4.472136  3.391165     1970-01-21    1970-01-12      4.472136     3.391165
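
The column prefixes in this output come from substitute(), which captures the unevaluated expression passed as fct. The sketch below isolates that naming logic in a hypothetical helper, describe_fct, which is not part of the original answer. For a bare name such as sqrt the expression is a symbol; for data.table::as.IDate and for an anonymous function it is a call whose third element holds the function name or the function body, respectively.

```r
# Hypothetical helper mirroring the naming logic of fct_those_columns
describe_fct <- function(fct) {
  fct_name <- substitute(fct)  # unevaluated expression supplied by the caller
  if (is.name(fct_name)) {
    as.character(fct_name)      # plain symbol, e.g. sqrt
  } else {
    as.character(fct_name[3])   # third element of a `::` or `function` call
  }
}

prefix_symbol <- describe_fct(sqrt)
prefix_pkg    <- describe_fct(data.table::as.IDate)
prefix_lambda <- describe_fct(function(x) x^(1/2))
print(c(prefix_symbol, prefix_pkg, prefix_lambda))
```

Note that this logic only covers the three shapes of argument shown here; other expressions (for example a function stored in a list element) would yield different prefixes.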

Note that x^(1/2)_Ozone is not a syntactically valid name in R and needs to be put in backquotes:

DT$`x^(1/2)_Ozone`
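
Backquotes are one option; a column with a non-syntactic name can also be reached by passing the name as a string to `[[`, or the name can be cleaned up first with make.names(). A minimal sketch, assuming a fresh copy of the table; odd_name and the other variable names are introduced here for illustration.

```r
library(data.table)

DT <- data.table(airquality)
odd_name <- "x^(1/2)_Ozone"
DT[, (odd_name) := Ozone^(1/2)]

# Two equivalent ways to extract the column
via_backquotes <- DT$`x^(1/2)_Ozone`
via_string     <- DT[[odd_name]]

# make.names() derives a syntactically valid name; setnames() renames by reference
clean_name <- make.names(odd_name)
setnames(DT, odd_name, clean_name)
print(clean_name %in% names(DT))
```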
