How to replace empty strings with NA in an R data frame?

Question

My first approach was to use na.strings="" when reading the data in from a CSV. For some reason this had no effect. I also tried:

df[df==''] <- NA

This gave me an error: Can't use matrix or array for column indexing.

I then tried it on a single column:

df$col[df$col==''] <- NA

This converts every value in the entire data frame to NA, even though there are values other than empty strings.

Then I tried using mutate_all:

replace.empty <- function(a) {
    a[a==""] <- NA
}

#dplyr pipe
df %>% mutate_all(funs(replace.empty))

This also converts every value in the entire data frame to NA.

I suspect something is odd about my "empty" strings, since the first method had no effect, but I can't figure out what.

EDIT (at the request of MKR): output of dput(head(df)):

structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{",
"    if (missing(ncp)) ", "        .Call(C_df, x, df1, df2, log)",
"    else .Call(C_dnf, x, df1, df2, ncp, log)", "}"), .Dim = c(6L,
1L), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), ""), class = 
"noquote")
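
A note on the dput output above: it does not look like data at all. The six strings match the source of stats::df, the F-distribution density function, which is what the name df resolves to when no data frame called df exists in the session. That would suggest the data frame was stored under a different name or was never created, although this is an inference from the output and not something the question confirms. A minimal sketch of a sanity check, using stats::df directly as a stand-in (the variable name obj is hypothetical):

```r
# Hypothetical check to run before attempting any replacement.
# stats::df stands in for whatever the bare name `df` resolves to
# when no data frame of that name has been created.
obj <- stats::df

is.function(obj)     # TRUE: this is a function, not data
is.data.frame(obj)   # FALSE
class(obj)           # "function"
```

If is.data.frame() returns FALSE for the object being modified, the replacement attempts are not operating on the imported data.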

Recommended answer

I'm not sure why df[df==""]<-NA did not work for the OP. Let's take a sample data.frame and investigate the options.

Option #1: Base R

df[df==""]<-NA

df
#    One  Two Three Four
# 1    A    A  <NA>  AAA
# 2 <NA>    B    BA <NA>
# 3    C <NA>    CC  CCC
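
Since the question suspects that the "empty" strings are not truly empty, a column-wise variant of the base R approach may be worth trying. The sketch below assumes the problem cells contain only whitespace, which the question does not confirm; the names dat and blank_to_na are hypothetical and chosen so that the toy df is left untouched.

```r
# Hypothetical data: one truly empty string, one whitespace-only string,
# one existing NA, and a numeric column that must be left alone.
dat <- data.frame(One = c("A", "", "C"),
                  Two = c("A", "  ", NA),
                  Num = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Replace empty or whitespace-only strings with NA in character columns only.
blank_to_na <- function(col) {
  if (is.character(col)) {
    # !is.na() keeps existing NAs out of the logical index
    col[!is.na(col) & trimws(col) == ""] <- NA_character_
  }
  col
}

# `dat[] <-` keeps the data.frame structure while replacing every column.
dat[] <- lapply(dat, blank_to_na)
dat
```

Working column by column avoids indexing the whole data frame with a logical matrix, which is the operation the error message in the question complains about.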

Option #2: dplyr::mutate_all with na_if, or mutate_if if the data frame has columns of multiple types

library(dplyr)

mutate_all(df, list(~na_if(.,"")))

# If the data frame also has non-character columns, restrict the replacement to character columns:
df %>% mutate_if(is.character, list(~na_if(.,""))) 

#    One  Two Three Four
# 1    A    A  <NA>  AAA
# 2 <NA>    B    BA <NA>
# 3    C <NA>    CC  CCC
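
As for the replace.empty helper in the question, a likely reason it turns everything into NA is that its last expression is an assignment. An R function returns the value of its last expression, and the value of an assignment is the assigned value, here NA, so the modified vector is never returned. The sketch below reproduces this in base R; replace_empty_fixed is a hypothetical name for the corrected version.

```r
# The helper as written in the question: the function body ends with an
# assignment, so the (invisible) return value is NA, not the vector `a`.
replace.empty <- function(a) {
  a[a == ""] <- NA
}
broken <- replace.empty(c("x", "", "y"))
broken   # NA

# Returning the modified vector explicitly fixes the problem.
replace_empty_fixed <- function(a) {
  a[a == ""] <- NA
  a
}
fixed <- replace_empty_fixed(c("x", "", "y"))
fixed    # "x" NA "y"
```

Passed to mutate_all, the single NA from the original helper would be recycled down every column. Note also that funs() has been deprecated since dplyr 0.8.0; with dplyr 1.0.0 or later, mutate(df, across(everything(), replace_empty_fixed)) would be the non-deprecated way to apply the corrected helper.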

Toy data:

df <- data.frame(One=c("A","","C"), 
                 Two=c("A","B",""), 
                 Three=c("","BA","CC"), 
                 Four=c("AAA","","CCC"), 
                 stringsAsFactors = FALSE)

df
#   One Two Three Four
# 1   A   A        AAA
# 2       B    BA     
# 3   C        CC  CCC
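
Returning to the first approach in the question, na.strings = "" at read time does normally turn empty fields into NA. The sketch below uses read.csv with inline text instead of a file so that it is self-contained; csv_text and dat are hypothetical names, and the data is made up for illustration.

```r
# Hypothetical CSV content with one empty field in each column.
csv_text <- "One,Two\nA,A\n,B\nC,"

# na.strings = "" asks read.csv to treat empty fields as missing values.
dat <- read.csv(text = csv_text,
                na.strings = "",
                stringsAsFactors = FALSE)

is.na(dat$One)   # FALSE  TRUE FALSE
is.na(dat$Two)   # FALSE FALSE  TRUE
```

If this works on a small example but not on the real file, the fields are probably not truly empty. For fields that contain only spaces, adding strip.white = TRUE may help, since the read.table documentation describes the na.strings comparison as happening after white space is stripped.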
