How to remove all of the diacritics from a file?
Question
I have a file containing many vowels with diacritics. I need to make these replacements:
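- Replace ā, á, ǎ, and à with a.
- Replace ē, é, ě, and è with e.
- Replace ī, í, ǐ, and ì with i.
- Replace ō, ó, ǒ, and ò with o.
- Replace ū, ú, ǔ, and ù with u.
- Replace ǖ, ǘ, ǚ, and ǜ with ü.
- Replace Ā, Á, Ǎ, and À with A.
- Replace Ē, É, Ě, and È with E.
- Replace Ī, Í, Ǐ, and Ì with I.
- Replace Ō, Ó, Ǒ, and Ò with O.
- Replace Ū, Ú, Ǔ, and Ù with U.
- Replace Ǖ, Ǘ, Ǚ, and Ǜ with Ü.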
I know I can replace them one at a time with this:
sed -i 's/ā/a/g' ./file.txt
Is there a more efficient way to replace all of these?
Recommended answer
The man page of the iconv tool describes a transliteration option:
//TRANSLIT
When the string "//TRANSLIT" is appended to --to-code, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters.
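iconv prints the converted text to standard output and has no in-place switch like sed -i. A minimal sketch for writing the transliterated text back into the file, assuming a POSIX shell with mktemp available; the helper name translit_in_place is hypothetical. What the approximations look like depends on the iconv implementation and the current locale (glibc, for example, may emit ? instead of the base letter in the C locale).

```shell
#!/bin/sh
# Hypothetical helper: transliterate a UTF-8 file to ASCII in place.
# iconv writes to standard output, so the result goes to a temporary
# file first, and the original is only overwritten if iconv succeeded.
translit_in_place() {
  src=$1
  # Temporary file is created next to the source file.
  tmp=$(mktemp "${src}.XXXXXX") || return 1
  if iconv -f UTF-8 -t ASCII//TRANSLIT "$src" > "$tmp"; then
    # Copy back with cat rather than mv so the original file keeps its
    # permissions (mktemp creates the temporary file with mode 0600).
    cat "$tmp" > "$src" && rm -f "$tmp"
  else
    rm -f "$tmp"   # leave the original untouched on failure
    return 1
  fi
}
```

For example, translit_in_place ./file.txt rewrites file.txt and returns a non-zero status, without touching the file, if iconv reports an error.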
So we can do this:
kent$ cat test1
Replace ā, á, ǎ, and à with a.
Replace ē, é, ě, and è with e.
Replace ī, í, ǐ, and ì with i.
Replace ō, ó, ǒ, and ò with o.
Replace ū, ú, ǔ, and ù with u.
Replace ǖ, ǘ, ǚ, and ǜ with ü.
Replace Ā, Á, Ǎ, and À with A.
Replace Ē, É, Ě, and È with E.
Replace Ī, Í, Ǐ, and Ì with I.
Replace Ō, Ó, Ǒ, and Ò with O.
Replace Ū, Ú, Ǔ, and Ù with U.
Replace Ǖ, Ǘ, Ǚ, and Ǜ with U.
kent$ iconv -f utf8 -t ascii//TRANSLIT test1
Replace a, a, a, and a with a.
Replace e, e, e, and e with e.
Replace i, i, i, and i with i.
Replace o, o, o, and o with o.
Replace u, u, u, and u with u.
Replace u, u, u, and u with u.
Replace A, A, A, and A with A.
Replace E, E, E, and E with E.
Replace I, I, I, and I with I.
Replace O, O, O, and O with O.
Replace U, U, U, and U with U.
Replace U, U, U, and U with U.
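One caveat: as the output above shows, ASCII//TRANSLIT also turns ü into u, whereas the question asks for ǖ, ǘ, ǚ, ǜ to become ü and Ǖ, Ǘ, Ǚ, Ǜ to become Ü. If ü must be kept, the replacements can instead be done in a single sed pass with one expression per target letter. This is a sketch assuming a sed that supports -E (GNU, BSD and BusyBox sed do); the function name strip_tones is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: strip tone marks in one sed pass, keeping ü/Ü.
# Literal alternations (ā|á|ǎ|à) are used instead of a bracket
# expression ([āáǎà]) so the match works on whole UTF-8 sequences even
# when the locale is not UTF-8. Extra arguments are passed to sed.
strip_tones() {
  sed -E \
    -e 's/ā|á|ǎ|à/a/g' \
    -e 's/ē|é|ě|è/e/g' \
    -e 's/ī|í|ǐ|ì/i/g' \
    -e 's/ō|ó|ǒ|ò/o/g' \
    -e 's/ū|ú|ǔ|ù/u/g' \
    -e 's/ǖ|ǘ|ǚ|ǜ/ü/g' \
    -e 's/Ā|Á|Ǎ|À/A/g' \
    -e 's/Ē|É|Ě|È/E/g' \
    -e 's/Ī|Í|Ǐ|Ì/I/g' \
    -e 's/Ō|Ó|Ǒ|Ò/O/g' \
    -e 's/Ū|Ú|Ǔ|Ù/U/g' \
    -e 's/Ǖ|Ǘ|Ǚ|Ǜ/Ü/g' \
    "$@"
}
```

For example, strip_tones file.txt > file.clean.txt writes the result to a new file. With GNU sed, strip_tones -i file.txt should edit the file in place, since -i is simply forwarded to sed; BSD sed expects an argument after -i, so there the redirect form is safer.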