Error in chartr(........) : invalid input '....' in 'utf8towcs'


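Problem description

I need your help, because I get the same error no matter which approach I try. I want to remove special characters from a data frame. Thanks!

First I tried this: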

library(stringr)  # str_* helpers; stringr also re-exports the %>% pipe

trata<-function(Campo){
  Campo<-Campo %>% chartr('ÇÆ£ØÞß&@Ð','XXXXXXXXX',.) %>%
    str_to_upper(locale = "es") %>% str_trim(side = "both") %>%
    str_replace_all("['´`^]","") %>% chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕÑ','AEIOUAEIOUAEIOUAEIOUAAOX', .)
  return(Campo)
}


trataRS<-function(Campo){
  Campo<-Campo %>% chartr('ÇÆ£ØÞßÐ','XXXXXXXXX',.) %>%
    str_to_upper(locale = "es") %>% str_trim(side = "both") %>%
    str_replace_all("['´`^]","") %>% chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕ','AEIOUAEIOUAEIOUAEIOUAAO', .)
  return(Campo)
}

Then I apply these functions:

Base$paterno_originador<-trata(Base$paterno_originador)
Base$razon_originador <- trataRS(Base$razon_originador)

But I get this error:

Error in chartr("ÇÆ£ØÞßÐ","XXXXXXXXX",.) : invalid input 'HÉCTOR' in 'utf8towcs'

So I tried another approach, which I found from @alexandre_lima:

rm_accent <- function(str,pattern="all") {
  if(!is.character(str))
    str <- as.character(str)
  
  pattern <- unique(pattern)
  
  if(any(pattern=="Ç"))
    pattern[pattern=="Ç"] <- "ç"
  
  symbols <- c(
    acute = "áéíóúÁÉÍÓÚýÝ",
    grave = "àèìòùÀÈÌÒÙ",
    circunflex = "âêîôûÂÊÎÔÛ",
    tilde = "ãõÃÕñÑ",
    umlaut = "äëïöüÄËÏÖÜÿ",
    cedil = "çÇ"
  )
  
  nudeSymbols <- c(
    acute = "aeiouAEIOUyY",
    grave = "aeiouAEIOU",
    circunflex = "AEIOUAEIOU",
    tilde = "AOAOXX",
    umlaut = "AEIOUAEIOUX",
    cedil = "XX"
  )
  
  accentTypes <- c("´","`","^","~","¨","ç")
  
  if(any(c("all","al","a","todos","t","to","tod","todo")%in%pattern)) # opcao retirar todos
    return(chartr(paste(symbols, collapse=""), paste(nudeSymbols, collapse=""), str))
  
  for(i in which(accentTypes%in%pattern))
    str <- chartr(symbols[i],nudeSymbols[i], str) 
  
  return(str)
}

But I got a similar error:

Error in chartr(paste(symbols, collapse = ""), paste(nudeSymbols, collapse = ""),  : 
  invalid input 'RUÍZ' in 'utf8towcs'

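I am adding this to show you the encoding. It shows UTF-8 at the positions in that column where there are special characters:

Encoding(Base$NOMBRE_INCRENTATOR)
[1] "unknown"

Recommended answer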

The solution to the invalid input in 'utf8towcs' error is to set the encoding when you import the .csv file into R.
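To make the cause concrete, here is a minimal sketch of what such a string looks like inside R and how it can be repaired after loading. It assumes the data is really Latin-1; the variable names (x, x_utf8, fixed) are made up for illustration and do not come from the question.

```r
# Build "HÉCTOR" from raw Latin-1 bytes (0xC9 is "É" in Latin-1), so the
# example does not depend on how this script file itself is encoded.
x <- rawToChar(as.raw(c(0x48, 0xC9, 0x43, 0x54, 0x4F, 0x52)))

Encoding(x)   # "unknown": R has no encoding mark for these bytes
validUTF8(x)  # FALSE: a lone 0xC9 byte is not valid UTF-8

# Re-encode from Latin-1 to UTF-8 (assumption: the source is Latin-1).
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(x_utf8)  # TRUE

# chartr() can now process the string; "\u00C9" is "É".
fixed <- chartr("\u00C9", "E", x_utf8)
fixed
```

The same iconv() call can be applied to a whole column of an already loaded data frame if re-importing the file is not convenient.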

  1. When you import the file with read.csv() or read.delim(), specify encoding = "UTF-8" or encoding = "latin1". I tried Latin-1, and it solved the problem.

  2. You may also want to check what your system encoding is and match it. You can do this with Sys.getlocale() (and set it with Sys.setlocale()). For example, on my system:

Sys.getlocale()
[1] "en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8"

Example
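As a small sketch of that check, the character-handling part of the locale can be queried on its own, and l10n_info() reports whether the session is running in UTF-8. The variable names ctype and info are only illustrative, and the values you see depend on your platform.

```r
# LC_CTYPE is the locale category that governs character handling.
ctype <- Sys.getlocale("LC_CTYPE")
ctype

# l10n_info() summarises the current locale's character set.
info <- l10n_info()
info[["UTF-8"]]  # TRUE when the session runs in a UTF-8 locale
info[["MBCS"]]   # TRUE when the locale uses a multi-byte character set
```

Locale names passed to Sys.setlocale() are platform specific, so none is hard-coded here.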

# "latin1" and "UTF-8" are the values documented for the encoding argument
data <- read.delim("input/data/data.txt", sep = ";",
                   encoding = "latin1", stringsAsFactors = FALSE)

data <- read.csv("input/data/data.csv", sep = ";",
                 encoding = "latin1", stringsAsFactors = FALSE)
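The following self-contained sketch runs the same idea end to end on a temporary file, so it can be tried without the original data. The file contents, the column name apellido, and the variable names are invented for illustration; it assumes the file is Latin-1 encoded.

```r
# Write a small file whose bytes are Latin-1: 0xCD is "Í" in Latin-1,
# so the single data row is RUÍZ.
path <- tempfile(fileext = ".csv")
bytes <- c(charToRaw("apellido\nRU"), as.raw(0xCD), charToRaw("Z\n"))
writeBin(bytes, path)

# encoding = "latin1" tells R how to interpret the non-ASCII strings.
data <- read.csv(path, sep = ";", encoding = "latin1",
                 stringsAsFactors = FALSE)
Encoding(data$apellido)  # inspect the encoding mark R assigned

# With the encoding known, chartr() can translate the accented letter;
# "\u00CD" is "Í".
cleaned <- chartr("\u00CD", "I", data$apellido)
cleaned

unlink(path)  # remove the temporary file
```

read.csv() also has a fileEncoding argument, which re-encodes the file while it is read instead of only marking the strings; which of the two is more convenient may depend on your locale.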

Best regards
