data.table computing several columns at once
Thank you in advance for reading this. I have a function that was working fine on data.table 1.9.3, but today I updated my data.table package and now the function no longer works.
Here is my function and a working example on data.table 1.9.3:
trait.by <- function(data, traits = "", cross.by) {
  traits = intersect(traits, names(data))
  if (length(traits) < 1) {
    # no overlap between names(data) and traits: only count rows per group
    return(data[, list(N. = .N), by = cross.by])
  } else {
    return(data[, c(N. = .N,
                    MEAN = lapply(.SD, function(x) {return(round(mean(x, na.rm = TRUE), digits = 1))}),
                    SD   = lapply(.SD, function(x) {return(round(sd(x, na.rm = TRUE), digits = 2))}),
                    'NA' = lapply(.SD, function(x) {return(sum(is.na(x)))})),
                by = cross.by, .SDcols = traits])
  }
}
> trait.by(data.table(iris),traits = c("Sepal.Length", "Sepal.Width"),cross.by="Species")
#      Species N. MEAN.Sepal.Length MEAN.Sepal.Width SD.Sepal.Length
#1:     setosa 50               5.0              3.4            0.35
#2: versicolor 50               5.9              2.8            0.52
#3:  virginica 50               6.6              3.0            0.64
#   SD.Sepal.Width NA.Sepal.Length NA.Sepal.Width
#1:           0.38               0              0
#2:           0.31               0              0
#3:           0.32               0              0
The point is that MEAN.(traits), SD.(traits) and NA.(traits) are computed for every column I list in the traits variable.
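The column names in that output come from base R's c(): when an argument such as MEAN = is itself a named list (which is what lapply(.SD, ...) returns), c() splices it into the result and joins the outer and inner names with a dot. A minimal base-R sketch of that naming rule, with the setosa values from the output above standing in for one group's results (the variable names here are only for illustration):

```r
# Stand-ins for what lapply(.SD, ...) returns for one group: a named list
# with one element per trait column (setosa values from the output above).
means <- list(Sepal.Length = 5.0, Sepal.Width = 3.4)
sds   <- list(Sepal.Length = 0.35, Sepal.Width = 0.38)

# c() splices the named lists into one flat list and prefixes each inner
# name with the argument name, e.g. MEAN + Sepal.Length -> MEAN.Sepal.Length.
one_group <- c(N. = 50L, MEAN = means, SD = sds)

# Names: N., MEAN.Sepal.Length, MEAN.Sepal.Width, SD.Sepal.Length, SD.Sepal.Width
print(names(one_group))
```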
When I run this with data.table 1.9.4, I get the following error:
> trait.by(data.table(iris),traits = c("Sepal.Length", "Sepal.Width"),cross.by="Species")
#Error in assign("..FUN", eval(fun, SDenv, SDenv), SDenv) :
# cannot change value of locked binding for '..FUN'
Any idea how I should fix this?!
Update: This has now been fixed in 1.9.5, in commit 1680. From NEWS:
- Fixed a bug in the internal optimisation of j-expression with more than one lapply(.SD, function(..) ..) as illustrated here on SO. Closes #985. Thanks to @jadaliha for the report and to @BrodieG for the debugging on SO.
Now this works as expected:
data[,
     c(
       MEAN = lapply(.SD, function(x) {return(round(mean(x, na.rm = TRUE), digits = 1))}),
       SD   = lapply(.SD, function(x) {return(round(sd(x, na.rm = TRUE), digits = 2))})
     ), by = cross.by, .SDcols = traits]
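Since the NEWS entry places the fix in 1.9.5, one way to tell whether an installed data.table is affected is a version check. A minimal sketch; has_lapply_sd_fix is a hypothetical helper name, and it assumes every version from 1.9.5 onward includes commit 1680:

```r
# Hypothetical helper: TRUE if version v should contain the fix for using
# lapply(.SD, function(..) ..) more than once in j (assumed: 1.9.5 or later).
# The default looks up the installed data.table; pass v explicitly to test.
has_lapply_sd_fix <- function(v = utils::packageVersion("data.table")) {
  v >= "1.9.5"  # numeric_version vs character: compared component-wise
}

has_lapply_sd_fix(package_version("1.9.4"))   # FALSE
has_lapply_sd_fix(package_version("1.14.8"))  # TRUE
```

package_version comparisons are numeric per component, so a release such as 1.14.8 correctly counts as newer than 1.9.5, which a plain string comparison would get wrong.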
This looks like a bug that manifests when lapply(.SD, FUN) is used more than once in a single data.table call in combination with c(. You can work around it by replacing c( with .(.
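Inside a data.table j expression, .() is an alias for list(). That means the workaround changes the shape of what j returns: c() flattens the two named lists into one list of columns, while list() keeps them nested as two elements. A base-R sketch of the difference, using the setosa values as stand-ins (the variable names are only for illustration):

```r
# Stand-ins for the two lapply(.SD, ...) results for one group.
means <- list(Sepal.Length = 5.0, Sepal.Width = 3.4)
sds   <- list(Sepal.Length = 0.35, Sepal.Width = 0.38)

flat   <- c(MEAN = means, SD = sds)     # what c( ... ) builds: 4 flat elements
nested <- list(MEAN = means, SD = sds)  # what .( ... ) builds: 2 nested lists

names(flat)    # "MEAN.Sepal.Length" "MEAN.Sepal.Width" "SD.Sepal.Length" "SD.Sepal.Width"
names(nested)  # "MEAN" "SD"
```

So the .( form sidesteps the error, but its result is not expected to have the flat MEAN.Sepal.Length-style columns that c( produces; it is worth checking the output shape before swapping it into trait.by.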
traits <- c("Sepal.Length", "Sepal.Width")
cross.by <- "Species"
data <- data.table(iris)
data[,
     c(
       MEAN = lapply(.SD, function(x) {return(round(mean(x, na.rm = TRUE), digits = 1))})
     ),
     by = cross.by, .SDcols = traits
]
Works.
data[,
     c(
       SD = lapply(.SD, function(x) {return(round(sd(x, na.rm = TRUE), digits = 2))})
     ),
     by = cross.by, .SDcols = traits
]
Works.
data[,
     c(
       MEAN = lapply(.SD, function(x) {return(round(mean(x, na.rm = TRUE), digits = 1))}),
       SD   = lapply(.SD, function(x) {return(round(sd(x, na.rm = TRUE), digits = 2))})
     ),
     by = cross.by, .SDcols = traits
]
Doesn't work.
data[,
     .(
       MEAN = lapply(.SD, function(x) {return(round(mean(x, na.rm = TRUE), digits = 1))}),
       SD   = lapply(.SD, function(x) {return(round(sd(x, na.rm = TRUE), digits = 2))})
     ),
     by = cross.by, .SDcols = traits
]
Works.
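If you need the original wide layout on an affected version, another option to try is moving the lapply(.SD, ...) calls out of j into an ordinary helper that receives .SD, so j itself no longer contains lapply(.SD, function(..) ..) for data.table's internal optimisation to rewrite. This is a sketch under that assumption and has not been checked against 1.9.4; summarise_traits and trait_by are hypothetical names, and bypassing the optimisation may make it slower on large tables:

```r
library(data.table)

# Hypothetical helper: builds one group's summary from .SD in plain R.
# j only sees summarise_traits(.SD), so there is no lapply(.SD, ...) in j
# itself (assumption: this keeps it out of the buggy optimisation).
summarise_traits <- function(cols) {
  c(MEAN = lapply(cols, function(x) round(mean(x, na.rm = TRUE), digits = 1)),
    SD   = lapply(cols, function(x) round(sd(x, na.rm = TRUE), digits = 2)),
    "NA" = lapply(cols, function(x) sum(is.na(x))))
}

# Same interface as trait.by() in the question.
trait_by <- function(data, traits = "", cross.by) {
  traits <- intersect(traits, names(data))
  if (length(traits) < 1) {
    # no overlap between names(data) and traits: only count rows per group
    return(data[, list(N. = .N), by = cross.by])
  }
  data[, c(N. = .N, summarise_traits(.SD)), by = cross.by, .SDcols = traits]
}

res <- trait_by(data.table(iris), traits = c("Sepal.Length", "Sepal.Width"),
                cross.by = "Species")
print(res)
```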