Error in chartr(...): invalid input '...' in 'utf8towcs'
Problem description
First I tried this:
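library(stringr)  # str_to_upper(), str_trim(), str_replace_all(); also re-exports %>%

# Replace unwanted symbols with X, upper-case, trim, drop stray accent marks,
# then map accented capitals to plain letters.
trata <- function(Campo) {
  Campo <- Campo %>%
    chartr('ÇÆ£ØÞß&@Ð', 'XXXXXXXXX', .) %>%
    str_to_upper(locale = "es") %>%
    str_trim(side = "both") %>%
    str_replace_all("['´`^]", "") %>%
    chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕÑ', 'AEIOUAEIOUAEIOUAEIOUAAOX', .)
  return(Campo)
}

# Same as trata(), but leaves &, @ and Ñ untouched.
trataRS <- function(Campo) {
  Campo <- Campo %>%
    chartr('ÇÆ£ØÞßÐ', 'XXXXXXXXX', .) %>%
    str_to_upper(locale = "es") %>%
    str_trim(side = "both") %>%
    str_replace_all("['´`^]", "") %>%
    chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕ', 'AEIOUAEIOUAEIOUAEIOUAAO', .)
  return(Campo)
}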
Then I applied these functions to:
Base$paterno_originador<-trata(Base$paterno_originador)
Base$razon_originador <- trataRS(Base$razon_originador)
But I got this error:
Error in chartr("ÇÆ£ØÞßÐ","XXXXXXXXX",.) : invalid input 'HÉCTOR' in 'utf8towcs'
So I tried another approach that I found from @alexandre_lima:
rm_accent <- function(str, pattern = "all") {
  if (!is.character(str))
    str <- as.character(str)
  pattern <- unique(pattern)
  if (any(pattern == "Ç"))
    pattern[pattern == "Ç"] <- "ç"
  symbols <- c(
    acute = "áéíóúÁÉÍÓÚýÝ",
    grave = "àèìòùÀÈÌÒÙ",
    circunflex = "âêîôûÂÊÎÔÛ",
    tilde = "ãõÃÕñÑ",
    umlaut = "äëïöüÄËÏÖÜÿ",
    cedil = "çÇ"
  )
  nudeSymbols <- c(
    acute = "aeiouAEIOUyY",
    grave = "aeiouAEIOU",
    circunflex = "AEIOUAEIOU",
    tilde = "AOAOXX",
    umlaut = "AEIOUAEIOUX",
    cedil = "XX"
  )
  accentTypes <- c("´", "`", "^", "~", "¨", "ç")
  if (any(c("all", "al", "a", "todos", "t", "to", "tod", "todo") %in% pattern)) # option: remove all accents
    return(chartr(paste(symbols, collapse = ""), paste(nudeSymbols, collapse = ""), str))
  for (i in which(accentTypes %in% pattern))
    str <- chartr(symbols[i], nudeSymbols[i], str)
  return(str)
}
But I got a similar error:
Error in chartr(paste(symbols, collapse = ""), paste(nudeSymbols, collapse = ""), :
invalid input 'RUÍZ' in 'utf8towcs'
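I'm adding this to show you the encoding. It reports UTF-8 at the positions in that column where there are special characters:
Encoding(Base$NOMBRE_INCRENTATOR)
[1] "unknown"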
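Recommended answer
The solution to the invalid input in 'utf8towcs' is to set the encoding when you import the .csv file into R.
When you import the file with read.csv() or read.delim(), specify encoding="UTF-8" or encoding="Latin-1". I tried Latin-1 and it solved the problem.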
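If the data are already loaded, the same idea can be tried in memory. The following is only a minimal sketch, assuming the offending values are Latin-1 bytes that R has not marked with an encoding; the string is built from raw bytes here so the example does not depend on how this page itself is encoded.

```r
# "HÉCTOR" as Latin-1 bytes: É is the single byte 0xC9.
x <- rawToChar(as.raw(c(0x48, 0xC9, 0x43, 0x54, 0x4F, 0x52)))

Encoding(x)    # "unknown": R has no encoding mark for this string
validUTF8(x)   # FALSE: the bytes are not valid UTF-8, which is what chartr() trips over

# Re-encode from Latin-1 to UTF-8 (assumption: the source really is Latin-1).
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)   # TRUE

# chartr() now accepts the string; "\u00C9" is É written as a Unicode escape.
chartr("\u00C9", "E", y)   # "HECTOR"
```

On the data from the question, the same conversion could be applied to a column before calling trata(), for example Base$paterno_originador <- iconv(Base$paterno_originador, from = "latin1", to = "UTF-8"), again assuming the file really is Latin-1.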
You may also want to check what your system encoding is, and match it. You can use Sys.getlocale() (and set it with Sys.setlocale()) to do this. For example, on my system:
Sys.getlocale()
[1] "en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8"
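For a narrower check than the full locale string, base R can also report just the character-type category and whether the session is running in UTF-8 or Latin-1. This is a small sketch; the values printed depend entirely on your own system.

```r
# LC_CTYPE is the locale category that governs how R interprets unmarked strings.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() summarises the session: multibyte or not, UTF-8 or not, Latin-1 or not.
info <- l10n_info()
print(info[["UTF-8"]])
print(info[["Latin-1"]])
```

If the session is UTF-8 but the file is Latin-1 (or the other way round), that mismatch is a likely source of the 'utf8towcs' error.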
Example
data <- read.delim("input/data/data.txt", sep=";",
encoding = "Latin-1", stringsAsFactors = F )
data <- read.csv("input/data/data.csv", sep=";",
encoding = "Latin-1", stringsAsFactors = F )
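A self-contained variant of the import can be sketched with a temporary file. As far as I can tell, the documentation for read.table() and scan() describes the encoding argument as marking strings as Latin-1 or UTF-8 and names the values "latin1" and "UTF-8", so this sketch uses the "latin1" spelling. The file contents and the column name nombre are made up for illustration.

```r
# Write a tiny Latin-1 file: a header line plus "RUÍZ", with Í as the byte 0xCD.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(c(charToRaw("nombre\n"),
           as.raw(c(0x52, 0x55, 0xCD, 0x5A)),
           charToRaw("\n")), con)
close(con)

# Mark the imported strings as Latin-1 while reading.
data <- read.csv(path, encoding = "latin1", stringsAsFactors = FALSE)

# Convert the marked strings to UTF-8 so later steps see valid input.
fixed <- enc2utf8(data$nombre)
validUTF8(fixed)                 # TRUE

# "\u00CD" is Í written as a Unicode escape.
chartr("\u00CD", "I", fixed)     # "RUIZ"

unlink(path)
```

Another option worth trying is fileEncoding = "latin1", which re-encodes the input into the session's native encoding while reading instead of only marking it.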
Best regards