Generate all word forms using Lucene & Hunspell

Problem description

In an application I work on, we use Lucene's Analyzer, especially its Hunspell part. The problem I face is this: I need to generate all word forms of a word using a set of affix rules.

For example, given the word 'educate' and the affix rules ABC, generate all forms of 'educate': educates, educated, educative, etc.
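
For illustration, the expansion asked for above can be sketched in plain Java, independent of Lucene. This is a minimal sketch of Hunspell-style suffix expansion, not Lucene's or Hunspell's API: it assumes simplified SFX rules (flag, text to strip, text to append, end-of-word condition), ignores prefixes, cross-products, continuation flags and compounds, and uses hypothetical flags S, D and V with made-up rules in place of the question's ABC.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Minimal sketch of Hunspell-style suffix expansion (SFX rules only).
// Prefixes (PFX), cross-products, continuation flags and compounds are ignored.
public class AffixExpander {

    // One SFX line: flag, text stripped from the end, text appended, and the
    // condition that the end of the root must match.
    public record SuffixRule(char flag, String strip, String append, Pattern condition) {

        public static SuffixRule of(char flag, String strip, String append, String condition) {
            // In .aff files "0" means "nothing to strip/append".
            return new SuffixRule(
                    flag,
                    strip.equals("0") ? "" : strip,
                    append.equals("0") ? "" : append,
                    // Anchor the condition so it is tested against the end of the word.
                    Pattern.compile(condition + "$"));
        }

        boolean appliesTo(String word) {
            return word.endsWith(strip) && condition.matcher(word).find();
        }

        String apply(String word) {
            return word.substring(0, word.length() - strip.length()) + append;
        }
    }

    // Hypothetical rules in the style of an English .aff file (not taken from a real dictionary).
    public static List<SuffixRule> sampleRules() {
        return List.of(
                SuffixRule.of('S', "y", "ies", "[^aeiou]y"),
                SuffixRule.of('S', "0", "s", "[^sxzhy]"),
                SuffixRule.of('D', "0", "d", "e"),
                SuffixRule.of('D', "0", "ed", "[^ey]"),
                SuffixRule.of('V', "e", "ive", "e"),
                SuffixRule.of('V', "0", "ive", "[^e]"));
    }

    // Returns the root plus every form produced by a rule whose flag the root carries.
    public static Set<String> expand(String root, String flags, List<SuffixRule> rules) {
        Set<String> forms = new LinkedHashSet<>();
        forms.add(root);
        for (SuffixRule rule : rules) {
            if (flags.indexOf(rule.flag()) >= 0 && rule.appliesTo(root)) {
                forms.add(rule.apply(root));
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        // Equivalent of a hypothetical .dic entry "educate/SDV".
        System.out.println(expand("educate", "SDV", sampleRules()));
        // prints [educate, educates, educated, educative]
    }
}
```

Using a `LinkedHashSet` keeps the root first and drops duplicates when two rules yield the same form.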

What I'd like to know is: is it possible to do this using Lucene's Hunspell implementation? We use a Hunspell dictionary (.dic) and affix file (.aff), so it has to be a Hunspell API. Lucene's Hunspell API isn't that big; I went through it and didn't find anything suitable.

The nearest I could find on SO was this, but there are no answers related to Hunspell.

Update 1: I no longer work on the project where I faced this problem, but if there is a solution using Lucene's Analyzer, I'd be glad for the community to see the answer.

Recommended answer

Hunspell comes with the unmunch command, which will create all word forms. You can call it like this:

 unmunch en_GB.dic en_GB.aff

You might look in the Hunspell source to see how this is implemented and whether it can be called from outside. Last time I checked, the command was a bit buggy when used on dictionaries with compounds; in those cases you cannot generate all word forms anyway, as there are infinitely many of them.
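
One pragmatic way to call unmunch from outside, sketched here under assumptions, is to run the binary from the Java application with `ProcessBuilder` and read its output. The class and method names are hypothetical; the sketch assumes `unmunch` is on the PATH, prints one word form per line on stdout, and produces UTF-8 output (a dictionary's .aff file may declare a different encoding with `SET`).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: run Hunspell's unmunch tool and collect its output.
// Assumptions: unmunch is on the PATH, writes one word form per line to stdout,
// and its output is UTF-8.
public class UnmunchRunner {

    // Runs a command and returns its non-blank stdout lines; stderr goes to this JVM's stderr.
    public static List<String> runAndCollectLines(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isBlank()) {
                    lines.add(line.strip());
                }
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command exited with status " + exitCode + ": " + command);
        }
        return lines;
    }

    // Expands every entry of the dictionary, e.g. unmunch("en_GB.dic", "en_GB.aff").
    public static List<String> unmunch(String dicPath, String affPath)
            throws IOException, InterruptedException {
        return runAndCollectLines(List.of("unmunch", dicPath, affPath));
    }

    public static void main(String[] args) throws Exception {
        List<String> forms = unmunch(args[0], args[1]);
        System.out.println(forms.size() + " word forms");
    }
}
```

After compiling, run it as `java UnmunchRunner en_GB.dic en_GB.aff`. Since unmunch expands the whole dictionary, one way to get the forms of a single word is to pass a temporary .dic file holding just a count line and that entry (for example `1` followed by `educate/ABC`) together with the real .aff file.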