Convert accented characters into ASCII characters
Question
What is the best way to remove German (or French) accents from a vector of 16 million strings?
For example, 'Sjögren's syndrome' should become 'Sjogren's syndrome'.
Converting each accented character into a single character is preferable to transliteration such as
ä => ae, ö => oe, ü => ue.
Using regular expressions, for example, would be one option, but is there something better (an R package for this)?
gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))
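For comparison, the nested gsub calls above can be collapsed into a single base R chartr call, which maps each character in one string to the character at the same position in another. This is only a minimal sketch: the function name strip_single_char and the particular list of accented letters are illustrative assumptions, not a complete German or French table, and it assumes the input is UTF-8 in a UTF-8 locale.

```r
# Hypothetical helper: one-to-one replacement of a few accented letters.
# `old` and `new` must have the same number of characters; the character
# at position i in `old` is replaced by the character at position i in `new`.
strip_single_char <- function(x) {
  old <- "äöüÄÖÜéèêàç"
  new <- "aouAOUeeeac"
  chartr(old, new, x)
}

x <- c("Sjögren's syndrome", "über")
strip_single_char(x)
```

Because chartr is strictly one character to one character, a letter such as ß, which would normally become "ss", cannot be handled this way and would need a separate gsub.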
There are SO solutions for non-R platforms, but no good one for R.
Recommended answer
Use iconv to convert to ASCII with transliteration (if supported):
iconv(c("über","Sjögren's"),to="ASCII//TRANSLIT")
[1] "uber" "Sjogren's"
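With 16 million strings, many values are likely to repeat, so one possible refinement is to convert only the distinct values and map the results back. The sketch below assumes that duplicates are common; the helper name to_ascii is hypothetical. The result of "ASCII//TRANSLIT" depends on the platform's iconv implementation, and strings that cannot be converted come back as NA, so the output is worth checking with anyNA().

```r
# Hypothetical helper: transliterate each distinct string once,
# then expand the result back to the original vector positions.
to_ascii <- function(x) {
  u <- unique(x)                              # distinct values only
  conv <- iconv(u, to = "ASCII//TRANSLIT")    # platform-dependent result
  conv[match(x, u)]                           # map back to original order
}

x <- c("über", "Sjögren's", "über")
res <- to_ascii(x)
anyNA(res)  # TRUE would indicate strings that could not be converted
```

The match() step keeps the output the same length and order as the input, so the function can be used as a drop-in replacement for calling iconv on the full vector.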