What's the best way to replace missing values with NA when reading in a .csv?


Question

I have a .csv dataset with many missing values, and I'd like R to recognize them all the same way (the "correct" way) when I read the table in. I've been using:

import <- read.csv("/Users/dataset.csv",
                   header = TRUE, na.strings = c(""))

This script fills all the empty cells with something, but it's not consistent. When I look at the data with head(import), some missing cells are filled with <NA> and some are filled with NA. I fear that R will treat these two ways of marking missing values differently when I start analyzing the dataset, so I'd like the import to read in those missing values uniformly.

Finally, some of the missing values in my csv file are represented by a period only. I would also like those periods to be converted to the correct missing value notation when I import into R.

Recommended answer

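The <NA> vs NA difference just means that some of your columns are character (or factor) and some are numeric, that's all. When R prints a data frame it shows a missing value as <NA> in character and factor columns and as NA in numeric columns. Both are the same missing value, and is.na() treats them identically. Absolutely nothing is wrong with that.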
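To see that the two printed forms are the same missing value, here is a minimal sketch using a made-up data frame (the column names and values are hypothetical, not taken from the asker's dataset):

```r
# Hypothetical toy data: one character column and one numeric column,
# each with a missing value in the second row.
df <- data.frame(
  name  = c("a", NA, "c"),
  score = c(1.5, NA, 3.0),
  stringsAsFactors = FALSE
)

# The missing name prints as <NA>, the missing score as NA.
print(df)

# is.na() flags the same row in both columns.
is.na(df$name)   # FALSE  TRUE FALSE
is.na(df$score)  # FALSE  TRUE FALSE
```

The angle brackets are only a printing convention that keeps a missing value distinguishable from the literal string "NA".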

As Ben mentioned above, if some of your missing values in the csv are represented by a single period, ., then you can specify a vector of values that should be treated as NAs via:

na.strings = c("", ".", "NA")

as an argument to read.csv.
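Put together, a call along these lines should read empty cells, lone periods, and literal NA strings all as missing. This is a sketch under stated assumptions: the CSV content below is invented for illustration and is passed inline through the `text` argument so the example runs without a file. With a real file you would pass its path as the first argument instead.

```r
# Hypothetical CSV content standing in for the real file.
csv_text <- "id,group,value
1,a,10
2,,.
3,c,NA
4,.,7"

import <- read.csv(
  text = csv_text,                 # replace with a file path for a real dataset
  header = TRUE,
  na.strings = c("", ".", "NA"),   # all three markers become NA
  stringsAsFactors = FALSE
)

# Count the missing values per column to confirm every marker was recognized.
colSums(is.na(import))

# The value column stays numeric because "." was read as NA rather than text.
is.numeric(import$value)
```

If a column that should be numeric comes back as character, that usually means some other missing-value marker is still present in the file and can be added to the na.strings vector.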
