Whitespace string can't be replaced with NA in R
Question
I want to replace blank strings with NA. A simple way would be df[df == ""] <- NA, and that works for most of the cells of my data frame... but not for all of them!
I have the following code:
library(rvest)
library(dplyr)
library(tidyr)
#Read website
htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")
#Extract table
df <- htmlpage %>% html_nodes("table") %>% html_table()
df <- as.data.frame(df)
#Set whitespaces into NA's
df[df == ""] <- NA
I figured out that some of the "empty" cells actually contain a whitespace character between the quotation marks:
df[11,1]
[1] " "
So my next attempt was: df[df == " "] <- NA
However, the problem is still there and the cell still holds that small whitespace! I thought the trimws function would work, but it didn't...
#Trim
df[,c(1:10)] <- sapply(df[,c(1:10)], trimws)
However, the problem is not solved.
Any ideas?
Recommended answer
We need to use lapply instead of sapply, because sapply simplifies its result to a matrix instead of returning a list of columns, and this can create problems with the quoted strings.
df[1:10] <- lapply(df[1:10], trimws)
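To make the difference concrete, here is a small sketch on a made-up data frame. The name toy and its values are illustrative assumptions, not part of the scraped table; the sketch only compares the shapes the two functions return.

```r
# Hypothetical toy data frame (not the scraped table)
toy <- data.frame(a = c(" x", " "), b = c("y ", ""), stringsAsFactors = FALSE)

m <- sapply(toy, trimws)   # results are simplified into one character matrix
l <- lapply(toy, trimws)   # results stay as a list, one character vector per column

is.matrix(m)   # TRUE
is.list(l)     # TRUE

# Assigning the list back keeps every column as its own vector
toy[] <- l
str(toy)
```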
Another option, if we have spaces like " ", is to use gsub to replace those spaces with "":
df[1:10] <- lapply(df[,c(1:10)], function(x) gsub("^\\s+|\\s+$", "", x))
and then change the "" to NA:
df[df == ""] <- NA
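The two steps can be tried on a small made-up data frame; toy and its values are assumptions for illustration only. The last part of the sketch covers one possible reason trimming might still appear to fail: if a cell holds a non-breaking space (U+00A0), which is an assumption about the scraped page and not something the question confirms, the default pattern of trimws ("[ \t\r\n]") does not match it. The whitespace argument of trimws (available in R 3.6.0 and later) can widen the match.

```r
# Hypothetical toy data frame (not the scraped table)
toy <- data.frame(home  = c("  Toluca ", "   ", ""),
                  score = c("2:1", " ", "0:0"),
                  stringsAsFactors = FALSE)

# Step 1: strip leading and trailing whitespace in every column
toy[] <- lapply(toy, function(x) gsub("^\\s+|\\s+$", "", x))

# Step 2: turn the now-empty strings into NA
toy[toy == ""] <- NA
is.na(toy)

# Assumption: the stubborn cell might contain a non-breaking space
nbsp <- "\u00a0"
default_trim <- trimws(nbsp)                            # default pattern "[ \t\r\n]"
wide_trim    <- trimws(nbsp, whitespace = "[\\h\\v]")   # horizontal/vertical whitespace class
nchar(default_trim)
nchar(wide_trim)
```

If the second nchar call reports 0 while the first does not, the whitespace argument is worth trying on the real columns before comparing against "".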
Or, instead of doing two replacements, we can do it in one go and convert the column types with type.convert:
df[] <- lapply(df, function(x)
type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))
NOTE: We don't have to specify the column index when we loop over all the columns.
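As a rough check of the one-pass approach, the same call can be run on a made-up data frame; toy and its values are hypothetical. Blank cells become NA, and type.convert then re-guesses each column's type, so a column of digit strings ends up as integer.

```r
# Hypothetical toy data frame (not the scraped table)
toy <- data.frame(team  = c("Toluca", " ", "Leon"),
                  goals = c("2", "", "3"),
                  stringsAsFactors = FALSE)

# Replace blank or whitespace-only cells with NA, then convert column types
toy[] <- lapply(toy, function(x)
  type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))

str(toy)   # team stays character, goals becomes integer
```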