Subset of rows containing NA (missing) values in a chosen column of a data frame
Question
We have a data frame DF read from a CSV file. It has columns that contain observed values and a column (Var2) that contains the date on which each measurement was taken. If the date was not recorded, the CSV file contains the value NA for missing data:

Var1 Var2
10   2010/01/01
20   NA
30   2010/03/01
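To reproduce the example, here is a minimal R sketch. It assumes the CSV is comma-separated with a header row; csv_text is a hypothetical stand-in for the real file, which the question does not show. By default read.csv() treats the string NA as missing (na.strings = "NA"), so Var2 holds a real missing value in row 2.

```r
# Hypothetical file contents mirroring the example table
csv_text <- "Var1,Var2
10,2010/01/01
20,NA
30,2010/03/01"

# read.csv() converts the string "NA" to a missing value by default
DF <- read.csv(text = csv_text)

# Which dates are missing?
is.na(DF$Var2)  # FALSE TRUE FALSE
```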
We would like to use the subset command to define a new data frame new_DF that contains only the rows with an NA value in column Var2. In the given example, only row 2 should end up in new_DF. The command

new_DF <- subset(DF, DF$Var2 == "NA")

does not work: the resulting data frame has no rows.
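The reason the comparison fails can be seen directly. A minimal sketch, assuming DF looks like the example table: comparing a missing value with == gives NA rather than TRUE, and subset() keeps only rows where the condition is TRUE, so the NA row is dropped along with the FALSE ones.

```r
DF <- data.frame(Var1 = c(10, 20, 30),
                 Var2 = c("2010/01/01", NA, "2010/03/01"))

# NA == "NA" evaluates to NA, not TRUE
cond <- DF$Var2 == "NA"
cond  # FALSE NA FALSE

# subset() treats NA in the condition as FALSE, so no rows survive
nrow(subset(DF, DF$Var2 == "NA"))  # 0
```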
If the NA values in the original CSV file are replaced with NULL, the same kind of command produces the desired result:

new_DF <- subset(DF, DF$Var2 == "NULL")
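This works because read.csv() only treats the strings listed in na.strings (by default just "NA") as missing. A value of NULL in the file stays an ordinary string, so the equality test is TRUE for that row. A small sketch with hypothetical file contents:

```r
# Hypothetical file where the missing date was written as NULL
csv_null <- "Var1,Var2
10,2010/01/01
20,NULL
30,2010/03/01"

DF_null <- read.csv(text = csv_null)

# "NULL" is a regular string here, so == can match it
new_DF <- subset(DF_null, Var2 == "NULL")
new_DF$Var1  # 20
```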
How can I get this method working if the original CSV file contains the character string NA for the missing values?

Solution

Never use == 'NA' to test for missing values. Use is.na() instead. This keeps every row that has a missing value in any column:

new_DF <- DF[rowSums(is.na(DF)) > 0, ]
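Note that rowSums(is.na(DF)) > 0 looks at every column, so it matches a Var2-only check only when Var2 is the sole column with missing values. A sketch with a hypothetical extra NA in Var1 shows the difference; complete.cases() is a base R alternative that expresses the same row filter.

```r
# Hypothetical data: row 2 misses a date, row 3 misses an observation
DF <- data.frame(Var1 = c(10, 20, NA),
                 Var2 = c("2010/01/01", NA, "2010/03/01"))

# is.na(DF) is a logical matrix; rowSums() counts the NAs in each row
na_per_row <- rowSums(is.na(DF))
new_DF <- DF[na_per_row > 0, ]
rownames(new_DF)  # "2" "3"

# Equivalent filter: rows that are not complete cases
alt_DF <- DF[!complete.cases(DF), ]
```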
Alternatively, if you only want to check a particular column (here Var2), use

new_DF <- DF[is.na(DF$Var2), ]
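Restricting the test to Var2 returns only row 2 of the example. The same condition also works inside subset(), which the question wanted to use, because is.na() returns TRUE or FALSE rather than NA. A minimal sketch, assuming DF is the example data frame:

```r
DF <- data.frame(Var1 = c(10, 20, 30),
                 Var2 = c("2010/01/01", NA, "2010/03/01"))

# Index with a logical vector built from one column
new_DF <- DF[is.na(DF$Var2), ]

# The same condition passed to subset(); Var2 is looked up inside DF
new_DF2 <- subset(DF, is.na(Var2))

new_DF$Var1   # 20
new_DF2$Var1  # 20
```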
If the column holds NA as a character string rather than a real missing value, first run

DF[DF == 'NA'] <- NA

to replace those strings with missing values.
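If the file was read in a way that kept NA as literal text (for example with a custom na.strings argument), is.na() finds nothing until the strings are converted. A sketch with hypothetical data that already contains the string "NA":

```r
# Hypothetical data where the missing date is the text "NA", not a real NA
DF <- data.frame(Var1 = c(10, 20, 30),
                 Var2 = c("2010/01/01", "NA", "2010/03/01"),
                 stringsAsFactors = FALSE)

sum(is.na(DF$Var2))  # 0: "NA" is an ordinary string here

# DF == "NA" is a logical matrix; assign a real NA wherever it is TRUE
DF[DF == "NA"] <- NA

new_DF <- DF[is.na(DF$Var2), ]
new_DF$Var1  # 20
```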