Removing non-ASCII characters from data files

Views: 29 | Published: 2021/11/28 22:37:46 | Tags: r, unicode, ascii, non-ascii-characters

Problem description

I've got a bunch of csv files that I'm reading into R and including in a package/data folder in .rdata format. Unfortunately the non-ASCII characters in the data fail the check. The tools package has two functions to check for non-ASCII characters (showNonASCII and showNonASCIIfile) but I can't seem to locate one to remove/clean them.

Before I explore other UNIX tools, it would be great to do this all in R so I can maintain a complete workflow from raw data to final product. Are there any existing packages/functions to help me get rid of the non-ASCII characters?

Recommended answer

To simply remove the non-ASCII characters, you could use base R's iconv(), setting sub = "". Something like this should work:

x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher") # e.g. from ?iconv
Encoding(x) <- "latin1"  # (just to make sure)
x
# [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"

iconv(x, "latin1", "ASCII", sub="")
# [1] "Ekstrm"        "Jreskog"       "bichen Zrcher"
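
Since the question is about CSV data that ends up in a package, the same call can be applied to every character column of a data frame. The following is only a minimal sketch: the helper name `strip_non_ascii` is hypothetical, and it assumes the text columns are character vectors (not factors) whose source encoding is latin1, as in the example above.

```r
# Hypothetical helper (not part of base R or the tools package):
# strip non-ASCII characters from every character column of a data frame.
# Assumes the source encoding is latin1; pass another `from` if it is not.
strip_non_ascii <- function(df, from = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = "ASCII", sub = "")
  df
}

# Small stand-in for a data frame read from one of the CSV files
nm <- c("Ekstr\xf8m", "J\xf6reskog")
Encoding(nm) <- "latin1"
dat <- data.frame(name = nm, n = c(1L, 2L), stringsAsFactors = FALSE)

clean <- strip_non_ascii(dat)
clean$name
# [1] "Ekstrm"  "Jreskog"
```

Non-character columns are left untouched. If the CSV files were read with `stringsAsFactors = TRUE`, the factor levels would need the same treatment, which this sketch does not cover.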

To locate non-ASCII characters, or to find if there were any at all in your files, you could likely adapt the following ideas:

## Do *any* lines contain non-ASCII characters? 
any(grepl("I_WAS_NOT_ASCII", iconv(x, "latin1", "ASCII", sub="I_WAS_NOT_ASCII")))
# [1] TRUE

## Find which lines (e.g. read in by readLines()) contain non-ASCII characters
grep("I_WAS_NOT_ASCII", iconv(x, "latin1", "ASCII", sub="I_WAS_NOT_ASCII"))
# [1] 1 2 3
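
The same idea can be pointed at a file on disk. This is a rough sketch under stated assumptions: the temporary file stands in for one of the real data files, its contents are assumed to be latin1, and the marker string is arbitrary as long as it cannot occur in the data.

```r
# Write a small latin1-encoded file to stand in for a real data file.
# useBytes = TRUE writes the bytes as-is instead of re-encoding them.
tmp <- tempfile(fileext = ".csv")
writeLines(c("plain ascii", "J\xf6reskog", "also plain"), tmp, useBytes = TRUE)

# Read the file back and mark every byte that has no ASCII equivalent
lines  <- readLines(tmp)
marked <- iconv(lines, "latin1", "ASCII", sub = "I_WAS_NOT_ASCII")

# Line numbers that contain at least one non-ASCII character
bad <- grep("I_WAS_NOT_ASCII", marked, fixed = TRUE)
bad
# [1] 2

unlink(tmp)
```

Lines reported in `bad` can then be inspected by hand or cleaned with `iconv(..., sub = "")` as shown earlier.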
