How can I substitute Unicode characters with ASCII in Perl?
Question
I can do it in vim like so:
:%s/\%u2013/-/g
How do I do the equivalent in Perl? I thought this would do it, but it doesn't seem to work:
perl -i -pe 's/\x{2013}/-/g' my.dat
Recommended answer
For a generic solution, Text::Unidecode transliterates pretty much anything that's thrown at it into pure US-ASCII.
So in your case this would work:
perl -C -MText::Unidecode -n -i -e'print unidecode( $_)' unicode_text.txt
The -C switch is there to make sure the input is read as UTF-8.
It will convert this:
l'été est arrivé à peine après aôut
¿España es un paìs muy lindo?
some special chars: » « ® ¼ ¶ – – — Ṉ
Some greek letters: β ÷ Θ ¬ the α and ω (or is it Ω?)
hiragana? みせる です
Здравствуйте
السلام عليكم
into this:
l'ete est arrive a peine apres aout
?Espana es un pais muy lindo?
some special chars: >> << (r) 1/4 P - - -- N
Some greek letters: b / Th ! the a and o (or is it O?)
hiragana? miseru desu
Zdravstvuitie
lslm `lykm
The last one shows the limits of the module, which can't infer the vowels and get as-salaamu `alaykum from the original Arabic. It's still pretty good, I think.