What's the best way to replace missing values with NA when reading in a .csv?
Question
I have a .csv dataset with many missing values, and I'd like R to recognize them all the same way (the "correct" way) when I read the table in. I've been using:
import <- read.csv("/Users/dataset.csv",
                   header = TRUE, na.strings = c(""))
This script fills all the empty cells with something, but it's not consistent. When I look at the data with head(import), some missing cells are filled with <NA> and some are filled with NA. I fear that R treats these two ways of identifying missing values differently when I start analyzing the dataset, so I'd like the import to read in those missing values uniformly.
Finally, some of the missing values in my csv file are represented by a period only. I would also like those periods to be represented by the correct missing value notation when I import into R.
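Recommended answer
The <NA> vs NA just means that some of your columns are character (or factor) and some are numeric, that's all. R prints a missing value as <NA> in character and factor columns and as NA in numeric columns; both are the same kind of missing value. Absolutely nothing is wrong with that.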
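A small sketch to check this, using a made-up data frame (the column names `name` and `score` are illustrative, not from the original dataset). The printed form differs by column type, but `is.na()` reports both as missing:

```r
# Hypothetical data: one character column and one numeric column,
# each containing a missing value
df <- data.frame(
  name  = c("a", NA, "c"),
  score = c(1.5, 2.0, NA),
  stringsAsFactors = FALSE
)

# Printing shows <NA> in the character column and NA in the numeric column
print(df)

# Both are ordinary missing values as far as is.na() is concerned
missing_name  <- is.na(df$name)
missing_score <- is.na(df$score)
```

Functions such as `is.na()`, `complete.cases()`, and `na.omit()` do not distinguish between the two printed forms.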
As Ben mentioned above, if some of the missing values in your csv are represented by a single period, ., then you can specify a vector of values that should be treated as NA via:
na.strings=c("",".","NA")
as an argument to read.csv.
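For illustration, here is a minimal sketch of such a call on a small inline CSV. The data and column names are invented, and `text =` is used in place of a file path only so the example is self-contained; with a real file you would pass the path as in the question.

```r
# Hypothetical CSV content with three kinds of missing markers:
# an empty field, a single period, and the literal string NA
csv_text <- "id,group,value
1,a,10
2,,.
3,c,
4,NA,7"

import <- read.csv(text = csv_text, header = TRUE,
                   na.strings = c("", ".", "NA"),
                   stringsAsFactors = FALSE)

# Count missing values per column to confirm all markers were recognized
na_counts <- colSums(is.na(import))
print(na_counts)
```

Because the period is treated as missing rather than as text, the `value` column should still be read as numeric instead of falling back to character.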