Convert accented characters into ASCII characters

Question

What is the optimal way to remove German (or French) accents from a vector of 16 million strings?

For example, convert 'Sjögren's syndrome' into 'Sjogren's syndrome'.

Mapping each accented character to a single plain character (ö => o) is preferred over multi-letter transliteration such as ä => ae, ö => oe, ü => ue.

Regular expressions would be one option, for example, but is there something better (an R package for this)?

gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))
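Because the goal is a strict one-character-to-one-character mapping, base R's chartr() can do this without regular expressions or nested gsub() calls: chartr(old, new, x) replaces each character of `old` found in `x` with the character at the same position in `new`, and it is vectorized over `x`. The following is a minimal sketch, assuming UTF-8 strings in a UTF-8 session. The helper name `strip_accents` and its mapping table are hypothetical and cover only common German and French letters; a character such as ß, which has no single-letter equivalent, is left unchanged.

```r
# Hypothetical helper: one-to-one accent stripping with base R's chartr().
# `from` and `to` must contain the same number of characters; position i of
# `from` is replaced by position i of `to`. Assumes UTF-8 input and session.
strip_accents <- function(x,
                          from = "àâäçéèêëîïôöùûüÿÀÂÄÇÉÈÊËÎÏÔÖÙÛÜŸ",
                          to   = "aaaceeeeiioouuuyAAACEEEEIIOOUUUY") {
  stopifnot(nchar(from) == nchar(to))  # guard against a misaligned table
  chartr(from, to, x)                  # vectorized over the whole vector
}

strip_accents(c("Sjögren's syndrome ( über) ", "crème brûlée"))
# [1] "Sjogren's syndrome ( uber) " "creme brulee"
```

Unlike the nested gsub() version, the mapping lives in one place, so adding a letter means extending both strings by one character at the same position.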

There are Stack Overflow solutions for other platforms, but none that works well for R.

Recommended answer

Use iconv to convert to ASCII with transliteration (if supported):

iconv(c("über","Sjögren's"),to="ASCII//TRANSLIT")
[1] "uber"      "Sjogren's"
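For a large vector, the same single iconv() call applies to every element at once, with no loop needed. The sketch below is a slightly more defensive variant, assuming the input strings are UTF-8: `from` states the input encoding instead of relying on the session's native encoding, and `sub` controls what happens to characters that cannot be converted (with the default `sub = NA`, such an element becomes NA). The exact transliteration depends on the platform's iconv implementation and the current locale (some implementations may produce "ue" or "\"u" for ü), so it is worth spot-checking the result.

```r
x <- c("über", "Sjögren's syndrome", "crème brûlée")

# Declare the input encoding explicitly (assumed UTF-8 here) and replace any
# remaining unconvertible input with "?" instead of returning NA.
ascii <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "?")

# Spot-check: list the elements that the conversion actually changed.
x[ascii != x]
```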
