Cleaning `Inf` values from an R dataframe
Problem description
In R, I have an operation which creates some `Inf` values when I transform a dataframe.
I would like to turn these `Inf` values into `NA` values (yes, it is appropriate in this case). I have a hack, but it's slow on large data. Is there a more R way of doing this?
Say I have the following data frame:
dat <- data.frame(a=c(1, Inf), b=c(Inf, 3), d=c("a","b"))
The following works in a single case:
dat[,1][is.infinite(dat[,1])] = NA
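To generalize this, I wrote a loop over the columns:
cf_DFinf2NA <- function(x) {
  # Replace infinite values with NA, one column at a time
  for (i in seq_len(ncol(x))) {
    x[, i][is.infinite(x[, i])] <- NA
  }
  return(x)
}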
... but I don't think that this is really using the power of R.
Recommended answer
Option 1
Use the fact that a `data.frame` is a list of columns, then use `do.call` to recreate a `data.frame`.
do.call(data.frame, lapply(dat, function(x) replace(x, is.infinite(x), NA)))
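As a small worked example of Option 1, the sketch below applies it to the sample data from the question. The second form, assigning into `dat2[]`, is an assumed variant that is not part of the answer; it is shown only because it keeps the existing data frame instead of rebuilding it. The names `cleaned` and `dat2` are illustrative.

```r
# Sample data from the question: two numeric columns and one character column
dat <- data.frame(a = c(1, Inf), b = c(Inf, 3), d = c("a", "b"))

# Option 1: treat the data frame as a list of columns, replace the
# infinite entries in each column, then rebuild the data frame
cleaned <- do.call(data.frame,
                   lapply(dat, function(x) replace(x, is.infinite(x), NA)))

# Assumed variant (not from the answer): assigning into dat2[] keeps the
# existing data frame and only swaps out its columns
dat2 <- dat
dat2[] <- lapply(dat2, function(x) replace(x, is.infinite(x), NA))

print(cleaned)
```

The non-numeric column `d` passes through untouched, because `is.infinite` returns `FALSE` for every element of a non-numeric vector.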
Option 2 - data.table
You could use `data.table` and `set`.
library(data.table)
DT <- data.table(dat)
# set() modifies DT by reference, so the lapply() result is discarded
invisible(lapply(names(DT), function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA)))
Or using column numbers (possibly faster if there are a lot of columns):
for (j in 1:ncol(DT)) set(DT, which(is.infinite(DT[[j]])), j, NA)
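A self-contained sketch of the `set` approach follows, assuming the `data.table` package is installed. Restricting the loop to double columns and passing `NA_real_` are illustrative choices, not part of the answer. Only double (and complex) vectors can hold `Inf`, so the other columns can be skipped.

```r
library(data.table)

# Small example table; -Inf is included to show that it is caught as well
DT <- data.table(a = c(1, Inf, -Inf), b = c(Inf, 3, 4), d = c("a", "b", "c"))

# Assumption: only double columns are worth scanning for Inf
dbl_cols <- names(DT)[vapply(DT, is.double, logical(1))]

for (col in dbl_cols) {
  # set() assigns by reference: the selected rows of column `col`
  # are overwritten in place, without copying the table
  set(DT, i = which(is.infinite(DT[[col]])), j = col, value = NA_real_)
}

print(DT)
```

Because `set` works by reference, `DT` itself is changed and no reassignment is needed.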
Timings
# some `big(ish)` data
dat <- data.frame(a = rep(c(1,Inf), 1e6), b = rep(c(Inf,2), 1e6),
c = rep(c('a','b'),1e6),d = rep(c(1,Inf), 1e6),
e = rep(c(Inf,2), 1e6))
# create data.table
library(data.table)
DT <- data.table(dat)
# replace (@mnel)
system.time(na_dat <- do.call(data.frame,lapply(dat, function(x) replace(x, is.infinite(x),NA))))
# user system elapsed
# 0.52 0.01 0.53
# is.na (@dwin)
system.time(is.na(dat) <- sapply(dat, is.infinite))
# user system elapsed
# 32.96 0.07 33.12
# modified is.na
system.time(is.na(dat) <- do.call(cbind,lapply(dat, is.infinite)))
# user system elapsed
# 1.22 0.38 1.60
# data.table (@mnel)
system.time(invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA))))
# user system elapsed
# 0.29 0.02 0.31
`data.table` is the quickest. Using `sapply` slows things down noticeably.
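The two `is.na<-` timings differ only in how the logical mask is built. The sketch below uses the small sample data from the question to check that `sapply` and `do.call(cbind, lapply(...))` produce the same mask, so the faster form can be swapped in. The variable names `mask_sapply` and `mask_cbind` are illustrative.

```r
dat <- data.frame(a = c(1, Inf), b = c(Inf, 3), d = c("a", "b"))

# Both calls build a logical matrix marking the infinite cells
mask_sapply <- sapply(dat, is.infinite)
mask_cbind  <- do.call(cbind, lapply(dat, is.infinite))

# The masks agree cell for cell; only the way they are assembled differs
stopifnot(all(mask_sapply == mask_cbind))

# Either mask can be passed to `is.na<-` to set the infinite cells to NA
is.na(dat) <- mask_cbind
print(dat)
```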