data.table: computing several columns at once


Question


Thank you in advance for reading this. I have a function that worked fine on data.table 1.9.3, but after updating the data.table package today it no longer works.

Here is my function and working example on data.table 1.9.3:

library(data.table)

trait.by <- function(data, traits = "", cross.by) {
  traits <- intersect(traits, names(data))
  if (length(traits) < 1) {
    # No overlap between the requested traits and the column names
    return(data[, list(N. = .N), by = cross.by])
  } else {
    return(data[, c(N.   = .N,
                    MEAN = lapply(.SD, function(x){return(round(mean(x, na.rm = TRUE), digits = 1))}),
                    SD   = lapply(.SD, function(x){return(round(sd(x, na.rm = TRUE), digits = 2))}),
                    'NA' = lapply(.SD, function(x){return(sum(is.na(x)))})),
                by = cross.by, .SDcols = traits])
  }
}

> trait.by(data.table(iris),traits = c("Sepal.Length",    "Sepal.Width"),cross.by="Species")
#      Species N. MEAN.Sepal.Length MEAN.Sepal.Width SD.Sepal.Length
#1:     setosa 50               5.0              3.4            0.35
#2: versicolor 50               5.9              2.8            0.52
#3:  virginica 50               6.6              3.0            0.64
#   SD.Sepal.Width NA.Sepal.Length NA.Sepal.Width
#1:           0.38               0              0
#2:           0.31               0              0
#3:           0.32               0              0

The point is that MEAN.(traits), SD.(traits) and NA.(traits) are computed for every column I pass in the traits argument.
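The prefixed column names in the output (MEAN.Sepal.Length and so on) come from how base R's c() combines named lists: each element name is prefixed with the name of the argument it came from. A minimal sketch using plain lists in place of the lapply(.SD, ...) results; the numbers are copied from the setosa row above purely for illustration:

```r
# Base R only. Each list stands in for the result of one lapply(.SD, ...) call.
stats <- c(
  N.   = 50L,
  MEAN = list(Sepal.Length = 5.0,  Sepal.Width = 3.4),
  SD   = list(Sepal.Length = 0.35, Sepal.Width = 0.38)
)

# c() flattens everything into one list and joins the names with a dot.
print(names(stats))
# "N." "MEAN.Sepal.Length" "MEAN.Sepal.Width" "SD.Sepal.Length" "SD.Sepal.Width"
```

This is why a single c(...) in j is enough to get one column per statistic and trait.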


When I run this with data.table 1.9.4 I receive the following error:

> trait.by(data.table(iris),traits = c("Sepal.Length",    "Sepal.Width"),cross.by="Species")
#Error in assign("..FUN", eval(fun, SDenv, SDenv), SDenv) : 
#  cannot change value of locked binding for '..FUN'

Any idea how I should fix this?

Solution

Update: This has now been fixed in 1.9.5 (commit 1680). From NEWS:

  1. Fixed a bug in the internal optimisation of j-expression with more than one lapply(.SD, function(..) ..) as illustrated here on SO. Closes #985. Thanks to @jadaliha for the report and to @BrodieG for the debugging on SO.

Now this works as expected:

data[,
  c(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}),
    SD = lapply(.SD,function(x){return(round(sd  (x,na.rm=T),digits=2))})
  ), by=cross.by, .SDcols = traits]    
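For anyone who wants to run that call on its own, here is a self-contained sketch. It assumes the data.table package is installed and uses the same iris setup as the rest of this answer. The version check is an addition for illustration, based only on the NEWS entry quoted above, and `res` is just a name chosen here.

```r
library(data.table)

# The NEWS entry places the fix in 1.9.5, so warn when running an older version.
if (packageVersion("data.table") < "1.9.5") {
  warning("data.table older than 1.9.5: the '..FUN' error may still occur")
}

traits   <- c("Sepal.Length", "Sepal.Width")
cross.by <- "Species"
data     <- data.table(iris)

# Two lapply(.SD, ...) calls combined with c() in a single j expression.
res <- data[,
  c(
    MEAN = lapply(.SD, function(x) round(mean(x, na.rm = TRUE), digits = 1)),
    SD   = lapply(.SD, function(x) round(sd(x, na.rm = TRUE), digits = 2))
  ),
  by = cross.by, .SDcols = traits]

print(res)
```

On a fixed version this should give one row per species, with the grouping column followed by one column per statistic and trait.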


This looks like a bug triggered by using lapply(.SD, FUN) more than once in a single data.table call in combination with c(. You can work around it by replacing c( with .(.

traits <- c("Sepal.Length",    "Sepal.Width")
cross.by <- "Species"
data <- data.table(iris)

data[,
  c(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))})
  ),
  by=cross.by, .SDcols = traits
]

Works.

data[,
  c(
    SD = lapply(.SD,function(x){return(round(sd  (x,na.rm=T),digits=2))})
  ),
  by=cross.by, .SDcols = traits
]

Works.

data[,
  c(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}),
    SD = lapply(.SD,function(x){return(round(sd  (x,na.rm=T),digits=2))})
  ),
  by=cross.by, .SDcols = traits
]    

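Doesn't work: with both lapply(.SD, ...) calls inside c(, this raises the '..FUN' locked-binding error on 1.9.4.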

data[,
  .(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}),
    SD = lapply(.SD,function(x){return(round(sd  (x,na.rm=T),digits=2))})
  ),
  by=cross.by, .SDcols = traits
]

Works.
