R data.table - Apply function A to some columns and function B to some others

Views: 112 · Published: 2020/10/15 19:20:00 · Tags: r, data.table


Question

I want to aggregate the rows of a data.table, but the aggregation function depends on the name of the column.

For example, if the column is named:

variable1 or variable2, then apply the mean() function.
variable3, then apply the max() function.
variable4, then apply the sd() function.

My data.tables always have a datetime column, and I want to aggregate rows by time. However, the number of "data" columns can vary.

I know how to do that with the same aggregation function (e.g. mean()) for all columns:

dt <- dt[, lapply(.SD, mean),
           by = .(datetime = floor_date(datetime, timeStep))]

Or for only a subset of the columns:

cols <- c("variable1", "variable2")    
dt <- dt[ ,(cols) := lapply(.SD, mean), 
            by = .(datetime = floor_date(datetime, timeStep)),
            .SDcols = cols]

What I would like to do is something like this:

colsToMean <- c("variable1", "variable2") 
colsToMax <- c("variable3")   
colsToSd <- c("variable4")   
dt <- dt[ ,{(colsToMean) := lapply(.SD???, mean),
             (colsToMax) := lapply(.SD???, max),
             (colsToSd) :=  lapply(.SD???, sd)}, 
            by = .(datetime = floor_date(datetime, timeStep)),
            .SDcols = (colsToMean, colsToMax, colsToSd)]

I looked at "data.table in R - apply multiple functions to multiple columns", which gave me the idea of using a custom function:

myAggregate <- function(x, columnName) {
   FUN <- getAggregateFunction(columnName) # Return mean() or max() or sd()
   return(FUN(x))
}
dt <- dt[, lapply(.SD, myAggregate, ???columName???),
           by = .(datetime = floor_date(datetime, timeStep))]

But I don't know how to pass the current column name to myAggregate()...

Recommended answer

Here is an option using Map or mapply:


Let's make some toy data first:
library(data.table)

dt <- data.table(
    variable1 = rnorm(100),
    variable2 = rnorm(100),
    variable3 = rnorm(100),
    variable4 = rnorm(100),
    grp = sample(letters[1:5], 100, replace = TRUE)
)

colsToMean <- c("variable1", "variable2") 
colsToMax <- c("variable3")   
colsToSd <- c("variable4")

Then:
scols <- list(colsToMean, colsToMax, colsToSd)
funs <- rep(c(mean, max, sd), lengths(scols))

# summary
dt[, Map(function(f, x) f(x), funs, .SD), by = grp, .SDcols = unlist(scols)]

# or replace the original values with summary statistics as in OP
dt[, unlist(scols) := Map(function(f, x) f(x), funs, .SD), by = grp, .SDcols = unlist(scols)]
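
The question also asked how to pass the current column name to a custom function. One possible way, sketched here, is to let Map walk over .SD and names(.SD) in parallel, so each call receives both the column values and the column name. The lookup list aggFuns is a hypothetical stand-in for the asker's getAggregateFunction(), and the fixed seed is only there to make the toy data reproducible.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only for reproducible toy data
dt <- data.table(
    variable1 = rnorm(100),
    variable2 = rnorm(100),
    variable3 = rnorm(100),
    variable4 = rnorm(100),
    grp = sample(letters[1:5], 100, replace = TRUE)
)

# Hypothetical lookup: column name -> aggregation function
aggFuns <- list(variable1 = mean, variable2 = mean,
                variable3 = max, variable4 = sd)

# Hypothetical helper: choose the function by column name, then apply it
myAggregate <- function(x, columnName) aggFuns[[columnName]](x)

# Map pairs each column of .SD with its name; the result keeps the
# column names because Map takes them from its first list argument
res <- dt[, Map(myAggregate, .SD, names(.SD)),
          by = grp, .SDcols = names(aggFuns)]
```

Because the first argument to Map is .SD itself, the summary columns keep their original names rather than the default V1, V2, and so on.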

Another option, with GForce on:
scols <- list(colsToMean, colsToMax, colsToSd)
funs <- rep(c('mean', 'max', 'sd'), lengths(scols))

jexp <- paste0('list(', paste0(funs, '(', unlist(scols), ')', collapse = ', '), ')')
dt[, eval(parse(text = jexp)), by = grp, verbose = TRUE]

# Detected that j uses these columns: variable1,variable2,variable3,variable4 
# Finding groups using forderv ... 0.000sec 
# Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec 
# Getting back original order ... 0.000sec 
# lapply optimization is on, j unchanged as 'list(mean(variable1), mean(variable2), max(variable3), sd(variable4))'
# GForce optimized j to 'list(gmean(variable1), gmean(variable2), gmax(variable3), gsd(variable4))'
# Making each group and running j (GForce TRUE) ... 0.000sec 
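
As a variation on the eval(parse(text = ...)) approach, the same j expression can be assembled as a call object instead of a string, which avoids pasting and re-parsing code. This is only a sketch: the vectors cols and funs are illustrative names, the seed is an assumption for reproducibility, and whether GForce is applied can be checked with verbose = TRUE as above.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only for reproducible toy data
dt <- data.table(
    variable1 = rnorm(100),
    variable2 = rnorm(100),
    variable3 = rnorm(100),
    variable4 = rnorm(100),
    grp = sample(letters[1:5], 100, replace = TRUE)
)

cols <- c("variable1", "variable2", "variable3", "variable4")
funs <- c("mean", "mean", "max", "sd")

# One call per column, e.g. mean(variable1); Map names each by its column
calls <- Map(function(col, f) call(f, as.name(col)), cols, funs)

# Wrap them as list(variable1 = mean(variable1), ..., variable4 = sd(variable4))
jcall <- as.call(c(as.name("list"), calls))

res2 <- dt[, eval(jcall), by = grp]
```

Naming the calls after their columns means the result columns are called variable1 to variable4 instead of V1 to V4.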

