Replace all 0 values with NA


Question

I have a data frame with some numeric columns. Some rows have a 0 value, which should be treated as missing in statistical analysis. What is the fastest way to replace all the 0 values with NULL in R?

Recommended answer

Replacing all zeroes with NA:

df[df == 0] <- NA
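As a quick illustration of the one-liner above, here is a minimal sketch on a toy data frame. The column names a and b and their values are made up for the example and are not from the question.

```r
# Hypothetical toy data: two numeric columns, each containing one zero
df <- data.frame(a = c(1, 0, 3), b = c(0, 5, 6))

# Replace every zero in the data frame with NA in one vectorized step
df[df == 0] <- NA

# Count the missing values per column after the replacement
na_per_column <- colSums(is.na(df))
print(df)
print(na_per_column)  # one NA in each column
```

Note that df == 0 compares every column, so on a data frame that mixes column types it may be safer to apply the replacement to the numeric columns only.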



Explanation

1. NULL is not what you should want to replace zeroes with. As ?'NULL' says,

NULL represents the null object in R

which is unique and, I guess, can be seen as the most uninformative and empty object.[1] It is then not so surprising that

data.frame(x = c(1, NULL, 2))
#   x
# 1 1
# 2 2

That is, R does not reserve any space for this null object.[2] Meanwhile, looking at ?'NA' we see that

NA is a logical constant of length 1 which contains a missing value indicator. NA can be coerced to any other vector type except raw.

Importantly, NA has length 1, so R reserves some space for it. E.g.,

data.frame(x = c(1, NA, 2))
#    x
# 1  1
# 2 NA
# 3  2
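The difference between the two constants can also be checked directly on plain vectors. This is a small sketch; the variable names x_null and x_na are made up for illustration.

```r
# NULL has length 0, so it vanishes when combined into a vector
x_null <- c(1, NULL, 2)
length(NULL)    # 0
length(x_null)  # 2

# NA has length 1, so it keeps its position as a missing value
x_na <- c(1, NA, 2)
length(NA)      # 1
length(x_na)    # 3
is.na(x_na)     # FALSE TRUE FALSE

# NA is logical by itself but is coerced to the type of the vector it joins
class(NA)       # "logical"
class(x_na)     # "numeric"

# Statistical functions propagate NA unless told to skip it
mean(x_na)                # NA
mean(x_na, na.rm = TRUE)  # 1.5
```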

Also, a data frame requires all of its columns to have the same number of elements, so there can be no "holes" (i.e., NULL values).

Now, you could replace zeroes with NULL in a data frame in the sense of completely removing every row that contains at least one zero. When using, e.g., var, cov, or cor, that is equivalent to first replacing zeroes with NA and setting the use argument to "complete.obs". Typically, however, this is unsatisfactory, as it also discards the non-zero values in those rows.
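The equivalence described here, and the information that row removal throws away, can be sketched as follows. The data and the names kept, df_na, cor_drop and cor_na are hypothetical and chosen only for this example.

```r
# Hypothetical data: each column has one zero, in different rows
df <- data.frame(x = c(1, 0, 3, 4, 5), y = c(2, 4, 0, 8, 11))

# Option A: drop every row that contains at least one zero
kept <- df[rowSums(df == 0) == 0, ]
cor_drop <- cor(kept$x, kept$y)

# Option B: mark zeroes as NA and let cor() ignore incomplete rows
df_na <- df
df_na[df_na == 0] <- NA
cor_na <- cor(df_na$x, df_na$y, use = "complete.obs")

# Both options use the same three complete rows
all.equal(cor_drop, cor_na)  # TRUE

# Row removal loses values that per-column statistics could still use
colMeans(kept)                 # based on 3 values per column
colMeans(df_na, na.rm = TRUE)  # based on 4 values per column: x = 3.25, y = 6.25
```

Keeping NA in place leaves the choice of how to handle missing values to each analysis, rather than fixing it once by deleting rows.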

2. Instead of running some sort of loop, the solution relies on the vectorized comparison df == 0. df == 0 returns (try it) a logical matrix of the same size as df, with entries TRUE and FALSE. We are also allowed to pass this matrix to the subsetting operator [...] (see ?'['). Lastly, while the result of df[df == 0] is perfectly intuitive, it may seem strange that df[df == 0] <- NA gives the desired effect. Assignment through [ does not work this way for every kind of object, but it does for data frames, which have their own replacement method that accepts a logical matrix; see ?'[<-.data.frame'.
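The individual steps of that one-liner can be inspected separately. The sketch below uses a made-up data frame, and the names mask and selected are introduced only for illustration.

```r
# Hypothetical toy data
df <- data.frame(a = c(1, 0, 3), b = c(0, 5, 6))

# Step 1: the comparison yields a logical matrix with the shape of df
mask <- df == 0
is.matrix(mask)  # TRUE
dim(mask)        # 3 2

# Step 2: indexing with the matrix extracts the matching values as a vector
selected <- df[mask]
selected         # 0 0

# Step 3: assigning through the same index replaces exactly those cells
df[mask] <- NA
sum(is.na(df))   # 2
```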


[1] The empty set in set theory feels somehow related.
[2] Another similarity with set theory: the empty set is a subset of every set, but we do not reserve any space for it.
