How to generate (book) indexes?

Question

I need to create an index for a book. At first glance the task looks easy: group the words by their first letter, then sort them. But this obvious solution works only for English. The real world is more complex. See http://en.wikipedia.org/wiki/Collation :

The difference between computer-style numerical sorting and true alphabetical sorting becomes obvious in languages using an extended Latin alphabet. For example, the 29-letter alphabet of Spanish treats ñ as a basic letter following n, and formerly treated ch and ll as basic letters following c and l, respectively. Ch and ll are still considered letters, but are now alphabetized as two-letter combinations. (The new alphabetization rule was issued by the Royal Spanish Academy in 1994.) On the other hand, the digraph rr follows rqu as expected, both with and without the 1994 alphabetization rule. A numeric sort may order ñ incorrectly following z and treat ch as c + h, also incorrect when using pre-1994 alphabetization.
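The ñ case from the quote can be reproduced with a few lines of Python. This is only a minimal sketch: `SPANISH_ALPHABET`, `RANK` and `spanish_key` are hypothetical names made up for the illustration, and the key handles single letters only (no digraphs, no accented vowels), so it is not a real collation algorithm.

```python
# Hypothetical illustration: rank letters by their position in the Spanish
# alphabet, where ñ is a basic letter that directly follows n.
SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {letter: i for i, letter in enumerate(SPANISH_ALPHABET)}


def spanish_key(word):
    """Map each letter to its alphabet position (sketch only)."""
    # Characters outside the alphabet sort after all known letters here;
    # that choice is an assumption of this sketch.
    return [RANK.get(ch, len(SPANISH_ALPHABET)) for ch in word.lower()]


words = ["zorro", "nube", "ñu", "oso"]

# Default sorting compares code points, so ñ (U+00F1) lands after z.
codepoint_order = sorted(words)

# Sorting with the alphabet-based key puts ñ right after n.
alphabet_order = sorted(words, key=spanish_key)

print(codepoint_order)
print(alphabet_order)
```

The first list ends with "ñu" after "zorro", which is the incorrect numeric ordering the quote describes; the second places it between "nube" and "oso".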

I tried to find an existing solution.

The DocBook stylesheets do not address the problem.

The best match I found is xindy ( http://xindy.sourceforge.net/ ), but this tool is too tightly coupled to LaTeX.

Any other suggestions?

Recommended answer

Well, after replying to the comments, I realized that I don't need a tool that generates indexes, but a library that can sort according to a locale's rules. My first experiments suggest that I'm going to use ICU and its Python bindings, PyICU. For example:

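import icu  # PyICU, the Python bindings for ICU

words = ["liche", "lichée", "lichen", "lichénoïde", "licher", "lichoter"]
# Collator that applies French sorting rules
collator = icu.Collator.createInstance(icu.Locale.getFrance())
# Python 3 removed sorted(cmp=...); use the collator's sort keys instead
for word in sorted(words, key=collator.getSortKey):
    print(word)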
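Once the words can be sorted, the remaining step from the question (grouping by first letter) might be sketched as follows. This is an illustration under stated assumptions, not the answer's method: `fallback_sort_key` and `build_index` are hypothetical helpers, and the accent-stripping key is a standard-library stand-in for a real collator. It folds every accented letter into its base letter, which would be wrong for languages such as Spanish where ñ is a letter of its own.

```python
import itertools
import unicodedata


def fallback_sort_key(word):
    """Stand-in for a real collation key: strip accents, then case-fold."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original word is kept as a tie-breaker for deterministic output.
    return (base.casefold(), word)


def build_index(words):
    """Return [(heading, [entries, ...]), ...] in sorted order."""
    ordered = sorted(words, key=fallback_sort_key)

    def heading(word):
        # First accent-stripped letter, upper-cased, serves as the heading.
        return fallback_sort_key(word)[0][:1].upper()

    # groupby only merges adjacent items, which is fine because the list
    # is already sorted by the same accent-stripped key.
    return [(h, list(group)) for h, group in itertools.groupby(ordered, key=heading)]


entries = ["lichen", "Éclair", "abricot", "école", "liche", "âge"]
for letter, group in build_index(entries):
    print(letter, ", ".join(group))
```

The headings and the sort order are derived from the same key on purpose: if they disagreed, `itertools.groupby` could emit the same heading more than once.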
