How can I substitute Unicode characters with ASCII in Perl?


Question

I can do it in vim like so:

:%s/\%u2013/-/g

How do I do the equivalent in Perl? I thought this would do it, but it doesn't seem to work:

perl -i -pe 's/\x{2013}/-/g' my.dat

Recommended answer

For a generic solution, Text::Unidecode transliterates pretty much anything that's thrown at it into pure US-ASCII.

So in your case this would work:

perl -C -MText::Unidecode -n -i -e'print unidecode( $_)' unicode_text.txt

The -C is there to make sure the input is read as UTF-8.

It converts this:

l'été est arrivé à peine après aôut
¿España es un paìs muy lindo?
some special chars: » « ® ¼ ¶ – – — Ṉ
Some greek letters: β ÷ Θ ¬ the α and ω (or is it Ω?)
hiragana? みせる です
Здравствуйте
السلام عليكم

into this:

l'ete est arrive a peine apres aout
?Espana es un pais muy lindo?
some special chars: >> << (r) 1/4 P - - -- N
Some greek letters: b / Th ! the a and o (or is it O?)
hiragana? miseru desu
Zdravstvuitie
lslm `lykm

The last one shows the limits of the module, which can't infer the vowels and get as-salaamu `alaykum from the original Arabic. It's still pretty good, I think.
