Whitespace string can't be replaced with NA in R

Question

I want to replace whitespace (empty) strings with NA. A simple way is df[df == ""] <- NA, and that works for most of the cells of my data frame... but not for all of them!

I have the following code:

library(rvest)
library(dplyr)
library(tidyr)

#Read website
htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")

#Extract table
df <- htmlpage %>% html_nodes("table") %>% html_table()
df <- as.data.frame(df)

#Set whitespaces into NA's
df[df == ""] <- NA

I figured out that some of these cells contain a little whitespace between the quotation marks:

df[11,1]
[1] " "
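
Printed output alone cannot tell an ordinary space from look-alike characters. As a quick check, assuming (this is not confirmed in the question) that some cells hold a non-breaking space (U+00A0, which HTML tables often contain as &nbsp;), the code points can be inspected. The values below are hypothetical stand-ins, not the scraped data:

```r
# Hypothetical stand-ins for "empty-looking" cells; "\u00a0" is an assumption
cells <- c("", " ", "\u00a0")

# Code points per string: integer(0), 32 (ordinary space), 160 (non-breaking space)
lapply(cells, utf8ToInt)

# Default trimws() strips only [ \t\r\n], so the non-breaking space survives
trimws(cells) == ""
```

On the real table, utf8ToInt(df[11,1]) would show which character the cell actually contains.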

So my solution was to run df[df == " "] <- NA

However, the problem is still there: the cell still contains that little space! I thought the trim function (trimws) would fix it, but it didn't:

#Trim
df[,c(1:10)] <- sapply(df[,c(1:10)], trimws)

But the problem still isn't solved.

Any ideas?

Recommended answer

We need to use lapply instead of sapply, because sapply simplifies its result to a matrix instead of returning a list, and this can cause problems with the quoted (character) values.

df[1:10] <- lapply(df[1:10], trimws)
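
The difference between the two calls can be seen on a small hypothetical data frame (the values are made up, not taken from the scraped table), followed by the trim-then-NA steps applied to every column:

```r
# Toy data frame standing in for the scraped table (hypothetical values)
d <- data.frame(a = c("x", " ", ""), b = c(" y ", "", "  "),
                stringsAsFactors = FALSE)

# sapply() simplifies the per-column results into a character matrix ...
class(sapply(d, trimws))
# ... whereas lapply() keeps one list element per column
class(lapply(d, trimws))

# Assigning the list back keeps each column a plain character vector
d[] <- lapply(d, trimws)
d[d == ""] <- NA
d
```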

Another option, if we have spaces like " ", is to use gsub to replace those leading and trailing spaces with ""

df[1:10] <- lapply(df[1:10], function(x) gsub("^\\s+|\\s+$", "", x))

and then change "" to NA:

df[df == ""] <- NA
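
If the leftover character is not an ordinary space (an assumption; HTML-scraped tables often contain non-breaking spaces, U+00A0), trimws() with its default whitespace = "[ \t\r\n]" leaves it in place, and the \\s-based gsub() is not guaranteed to catch it either. A minimal sketch on hypothetical data, using the whitespace argument of trimws() (available since R 3.6.0) with the PCRE classes \h and \v, which include U+00A0:

```r
# Hypothetical cells containing non-breaking spaces (U+00A0)
d <- data.frame(a = c("x", "\u00a0", " \u00a0 "),
                b = c("\u00a0y", "", "z"),
                stringsAsFactors = FALSE)

# Trim horizontal (\h) and vertical (\v) whitespace, which covers U+00A0;
# lapply() passes the extra 'whitespace' argument on to trimws()
d[] <- lapply(d, trimws, whitespace = "[\\h\\v]")

# Cells that are now empty become NA
d[d == ""] <- NA
d
```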


Or, instead of doing two replacements, we can do it in one go with type.convert:

df[] <- lapply(df, function(x)
      type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))

NOTE: We don't have to specify the column indices when looping over all the columns.
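
As a sketch of what the one-step version does, here it is on a small made-up data frame: blank or whitespace-only cells become NA, and type.convert() then re-types each column, so a column of digits comes back as integer rather than character.

```r
# Hypothetical data: column a holds digits, column b holds text
d <- data.frame(a = c("1", " ", "3"),
                b = c("x", "", "  "),
                stringsAsFactors = FALSE)

# Blank/whitespace-only cells -> NA, then let type.convert() pick each column's type
d[] <- lapply(d, function(x)
  type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))

# Column a is now integer (1, NA, 3); column b stays character ("x", NA, NA)
str(d)
```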
