将所有带有重音符号的字符替换为其对应的LaTeX [英] Replace all accented characters by their LaTeX equivalent

查看:77
本文介绍了将所有带有重音符号的字符替换为其对应的LaTeX的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个Unicode字符串,我想用产生它们的LaTeX代码替换非ASCII字符(例如,使é变为 \'e œ变为 \ oe ).我将其合并到Python代码中.这应该依赖于转换表,并且我想出了以下代码,它很简单,而且看起来工作得很好:

Given a Unicode string, I want to replace non-ASCII characters by LaTeX code producing them (for example, having é become \'e, and œ become \oe). I'm incorporating this into a Python code. This should rely on a translation table, and I have come up with the following code, which is simple and seems to work nicely:

accents = [
    [ u"à", "\\`a"],
    [ u"é", "\\'e"]
  ]
translation_table = dict([(ord(k), unicode(v)) for k, v in accents])
print u"été à l'eau".translate(translation_table)

但是,编写一个相当完整的翻译表将花费我很长时间,而Google并没有太大帮助.有人准备好了这样的东西,还是知道在哪里找到它?

But, writing a rather complete translation table will take me a long time, and Google didn't help much. Does someone have such a thing ready, or know where to find one?

PS:我是Python的新手,所以我当然欢迎对上面的代码发表评论.

PS: I'm new to Python, so I welcome comments on the code above, of course.

推荐答案

下载 Unicode字符数据库(关于1MB).您可以找到一个等价字符组合的关系表,例如é= \ u00E9 e + ́ ,它等效于 \ u0065 + \ u0301(拉丁文小写字母E + 合并紧急重音).您可以编写简单的代码来转换所有脚本的所有组合字符,也可以只转换所需的字符(可以通过数据库中的脚本字段进行控制).

Download the Unicode Character Database (about 1MB). you can find a relational table for equivalent character combination for example é = \u00E9 is e+ ́ that is equivalent to \u0065+\u0301 (LATIN SMALL LETTER E+COMBINING ACUTE ACCENT). you can write simple codes for converting all combinational characters of all scripts or just them you want (you can control by script field in database).

然后将组合替换为LaTeX代码.例如,使用正则表达式 \ w \ u0065 替换双折射: \'< the_letter> .(我不确定语法.这取决于您的编程语言和正则表达式引擎.)

Then replace the combinations with LaTeX code. for example use regular expression \w\u0065 to replace diactrics :\'<the_letter>. (I'm not sure about syntax. It depends on your programming language and regular expression engine.)

如果您使用的是python,则已经有数据库和使用它的处理程序的实现.就像在下面的注释中提到的, import unicodedata .

If you are using python, you have already the database and an implementation of a handler to use it. just like mentioned in below comment, import unicodedata.

这篇关于将所有带有重音符号的字符替换为其对应的LaTeX的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆