What is the difference between <NA> and NA?
Question
I have a factor named SMOKE with levels "Y" and "N". Missing values were replaced with NA (from the initial level "NULL"). However when I view the factor I get something like this:
head(SMOKE)
N N <NA> Y Y N
Levels: Y N
Why is R displaying NA as <NA>? And is there a difference?
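For context, here is a minimal sketch of one way the "NULL" level described above could be turned into genuine missing values. It assumes SMOKE starts as a factor whose missing entries carry the literal level "NULL"; the sample values are hypothetical.

```r
# Hypothetical starting point: missing entries are stored under the level "NULL"
SMOKE <- factor(c("N", "N", "NULL", "Y", "Y", "N"))

# Re-create the factor keeping only the real levels; any value not listed
# in `levels` (here "NULL") becomes a genuine missing value
SMOKE <- factor(SMOKE, levels = c("Y", "N"))

print(SMOKE)   # the third entry prints as <NA>; Levels: Y N
is.na(SMOKE)   # TRUE only for the third entry
```

Because factor() maps any value that is not among `levels` to NA, no separate replacement step is needed here.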
Recommended Answer
When you are dealing with factors, an NA wrapped in angle brackets (<NA>) indicates that it is in fact a missing value, NA.
When NA appears without brackets, it is not a missing value but rather a proper factor level whose label is the string "NA".
# Note a 'real' NA and a string with the word "NA"
x <- factor(c("hello", NA, "world", "NA"))
x
# [1] hello <NA> world NA
# Levels: hello NA world      <~~ The string appears as a level; the actual NA does not.
as.numeric(x)
# [1] 1 NA 3 2                <~~ The string has a numeric code (here 2, its position among the alphabetically sorted levels).
#                                 The real NA's numeric value is just NA.
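To tell the two cases apart in code rather than by eye, a minimal sketch using base R's is.na(), ==, and levels() on the same vector:

```r
# A real NA and the two-letter string "NA"
x <- factor(c("hello", NA, "world", "NA"))

# is.na() is TRUE only for the real missing value (printed as <NA>)
is.na(x)
# [1] FALSE  TRUE FALSE FALSE

# Comparing with the string "NA" matches only the level labelled "NA";
# comparing a real NA with anything yields NA, not TRUE
x == "NA"
# [1] FALSE    NA FALSE  TRUE

"NA" %in% levels(x)   # TRUE: the string is a level
anyNA(levels(x))      # FALSE: the real missing value is not a level
```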
Edit to answer @Arun's question:
R is simply trying to distinguish between a string whose value is the two letters "NA" and an actual missing value, NA. That explains the difference you see when displaying df versus df$y. Example:
df <- data.frame(x=1:4, y=c("a", NA_character_, "c", "NA"), stringsAsFactors=FALSE)
Note the two different styles of NA:
> df
x y
1 1 a
2 2 <NA>
3 3 c
4 4 NA
However, if we look at just df$y:
> df$y
[1] "a" NA "c" "NA"
But if we remove the quotation marks (similar to what we see when printing a data.frame to the console):
print(df$y, quote=FALSE)
[1] a <NA> c NA
And thus, we once again see the real NA distinguished from the string "NA" via the angle brackets.
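The same checks work on a character column. As a minimal sketch, assuming the literal string "NA" in df$y was actually meant to mark a missing value (an assumption; in the example above it is deliberately a real string), it can be located and converted explicitly:

```r
df <- data.frame(x = 1:4, y = c("a", NA_character_, "c", "NA"),
                 stringsAsFactors = FALSE)

# Row 2 holds a real missing value; row 4 holds the string "NA"
which(is.na(df$y))     # 2
which(df$y == "NA")    # 4 (the NA produced by comparing row 2 is dropped by which())

# Convert the string "NA" into a real missing value
df$y[which(df$y == "NA")] <- NA
print(df)              # rows 2 and 4 now both print as <NA>
```

Wrapping the comparison in which() avoids indexing with a logical vector that contains NA.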