将所有带有重音符号的字符替换为其对应的LaTeX [英] Replace all accented characters by their LaTeX equivalent
问题描述
给出一个Unicode字符串,我想用产生它们的LaTeX代码替换非ASCII字符(例如,使é
变为 \'e
和œ
变为 \ oe
).我将其合并到Python代码中.这应该依赖于转换表,并且我想出了以下代码,它很简单,而且看起来工作得很好:
Given a Unicode string, I want to replace non-ASCII characters by LaTeX code producing them (for example, having é
become \'e
, and œ
become \oe
). I'm incorporating this into a Python code. This should rely on a translation table, and I have come up with the following code, which is simple and seems to work nicely:
accents = [
[ u"à", "\\`a"],
[ u"é", "\\'e"]
]
translation_table = dict([(ord(k), unicode(v)) for k, v in accents])
print u"été à l'eau".translate(translation_table)
但是,编写一个相当完整的翻译表将花费我很长时间,而Google并没有太大帮助.有人准备好了这样的东西,还是知道在哪里找到它?
But, writing a rather complete translation table will take me a long time, and Google didn't help much. Does someone have such a thing ready, or know where to find one?
PS:我是Python的新手,所以我当然欢迎对上面的代码发表评论.
PS: I'm new to Python, so I welcome comments on the code above, of course.
推荐答案
下载 Unicode字符数据库(关于1MB).您可以找到一个等价字符组合的关系表,例如é= \ u00E9
是 e + ́
,它等效于 \ u0065 + \ u0301(拉丁文小写字母E
+ 合并紧急重音)
.您可以编写简单的代码来转换所有脚本的所有组合字符,也可以只转换所需的字符(可以通过数据库中的脚本字段进行控制).
Download the Unicode Character Database (about 1MB). you can find a relational table for equivalent character combination for example é = \u00E9
is e+ ́
that is equivalent to \u0065+\u0301 (LATIN SMALL LETTER E
+COMBINING ACUTE ACCENT)
. you can write simple codes for converting all combinational characters of all scripts or just them you want (you can control by script field in database).
然后将组合替换为LaTeX代码.例如,使用正则表达式 \ w \ u0065
替换双折射: \'< the_letter>
.(我不确定语法.这取决于您的编程语言和正则表达式引擎.)
Then replace the combinations with LaTeX code. for example use regular expression \w\u0065
to replace diactrics :\'<the_letter>
. (I'm not sure about syntax. It depends on your programming language and regular expression engine.)
如果您使用的是python,则已经有数据库和使用它的处理程序的实现.就像在下面的注释中提到的, import unicodedata
.
If you are using python, you have already the database and an implementation of a handler to use it. just like mentioned in below comment, import unicodedata
.
这篇关于将所有带有重音符号的字符替换为其对应的LaTeX的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!