Cleaning `Inf` values from an R dataframe


Question

In R, I have an operation that creates some `Inf` values when I transform a data frame.

I would like to turn these `Inf` values into `NA` values. The code I have is slow for large data; is there a faster way of doing this?

Say I have the following data frame:

dat <- data.frame(a=c(1, Inf), b=c(Inf, 3), d=c("a","b"))

The following works for a single column:

 dat[,1][is.infinite(dat[,1])] = NA

So I generalized it with the following loop:

cf_DFinf2NA <- function(x)
{
    for (i in 1:ncol(x)){
          x[,i][is.infinite(x[,i])] = NA
    }
    return(x)
}

But I don't think that this is really using the power of R.

Recommended answer

Option 1

Use the fact that a data.frame is a list of columns, then use do.call to recreate a data.frame.

do.call(data.frame, lapply(dat, function(x) replace(x, is.infinite(x), NA)))
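As a small check of Option 1, the sketch below applies the same `replace` call to a variant of the example data frame from the question (one `Inf` is changed to `-Inf` to show that `is.infinite` catches both signs). The names `clean_inf`, `cleaned` and `dat2` are made up for illustration, and the `dat2[] <- lapply(...)` form is a common base R alternative rather than something taken from the answer; treat it as an assumption about what you might prefer when you want to keep the existing data frame object.

```r
# Variant of the question's data, with a -Inf to show it is caught too
dat <- data.frame(a = c(1, Inf), b = c(-Inf, 3), d = c("a", "b"))

# Hypothetical helper: replace infinite entries of one column with NA
clean_inf <- function(x) replace(x, is.infinite(x), NA)

# Option 1: rebuild the data frame from the cleaned list of columns
cleaned <- do.call(data.frame, lapply(dat, clean_inf))

# Alternative (assumption, not from the answer): assign into dat2[] so the
# data frame is kept and only the column contents are replaced
dat2 <- dat
dat2[] <- lapply(dat2, clean_inf)
```

The non-numeric column `d` passes through untouched, because `is.infinite` returns `FALSE` for every element that is not a numeric infinity.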

Option 2 -- data.table

You could use data.table and set. This avoids some internal copying.

library(data.table)
DT <- data.table(dat)
# set() updates DT by reference, one column at a time
invisible(lapply(names(DT), function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA)))

Or using column numbers (possibly faster if there are a lot of columns):

for (j in 1:ncol(DT)) set(DT, which(is.infinite(DT[[j]])), j, NA)
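To make the `set` loop concrete, here is a minimal sketch on the small example from the question. It assumes the data.table package is installed. Restricting the loop to numeric columns through the hypothetical `num_cols` vector is an addition for illustration, not part of the answer; the reasoning is that only numeric columns can hold `Inf`.

```r
library(data.table)

# Example data from the question
dat <- data.frame(a = c(1, Inf), b = c(Inf, 3), d = c("a", "b"))
DT <- data.table(dat)

# Hypothetical helper vector: positions of the numeric columns
num_cols <- which(vapply(DT, is.numeric, logical(1)))

# set() updates DT by reference, so no assignment back to DT is needed
for (j in num_cols) {
  set(DT, i = which(is.infinite(DT[[j]])), j = j, value = NA_real_)
}
```

After the loop, `DT` has `NA` where the infinite values were, and the character column `d` was never visited.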

Timings

# some `big(ish)` data
dat <- data.frame(a = rep(c(1,Inf), 1e6), b = rep(c(Inf,2), 1e6), 
                  c = rep(c('a','b'),1e6),d = rep(c(1,Inf), 1e6),  
                  e = rep(c(Inf,2), 1e6))
# create data.table
library(data.table)
DT <- data.table(dat)

# replace (@mnel)
system.time(na_dat <- do.call(data.frame,lapply(dat, function(x) replace(x, is.infinite(x),NA))))
# user  system elapsed 
# 0.52    0.01    0.53 

# is.na (@dwin)
system.time(is.na(dat) <- sapply(dat, is.infinite))
# user  system elapsed 
# 32.96    0.07   33.12 

# modified is.na
system.time(is.na(dat) <- do.call(cbind,lapply(dat, is.infinite)))
#  user  system elapsed 
# 1.22    0.38    1.60 


# data.table (@mnel)
system.time(invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA))))
# user  system elapsed 
# 0.29    0.02    0.31 

In these timings data.table is the quickest (0.31 s elapsed), followed by replace (0.53 s) and the modified is.na version (1.60 s). Using sapply slows things down noticeably (33.12 s).
