Thesaurus class or API for PHP [edited]


Question

TL;DR Summary: I need a single command-line application which I can use to get synonyms and other related words. It needs to be multilingual and work cross-platform. Can anyone suggest a suitable program for me, or help me with the ones I've already found? Thanks.


Longer version: I've been tasked with writing a system in PHP that can come up with alternative suggestions for words entered by the user. I need to find a thesaurus application / API or similar which I can use to generate these suggestions.

Importantly, it needs to be multilingual (English, Danish, French and German). This rules out most of the software that I managed to find using Google. It also needs to be cross-platform (it needs to work on Linux and Windows).

My research has led me to two promising candidates: WordNet and Stardict.

I've been focusing on WordNet so far, calling it from PHP using the shell_exec() function, and I've managed to use it to create a very promising prototype PHP page, but so far in English only. I'm struggling with how to make it multilingual.

The WordNet site has external links to WordNet projects in other languages (e.g. DanNet for Danish), but although they're often called WordNet, they seem to use a variety of database formats and software, which makes them unsuitable for me. I need a consistent interface that I can call from my PHP program.

Stardict looked more promising from that perspective: they provide dictionaries in many languages in a standard DB format for the one application.

But the downside of Stardict is that it's primarily a GUI app. Calling it from the command line launches the GUI. There is apparently a command-line version (SDCV), but it seems quite out of date (last update 2006), and only for Linux.

Can anyone help me with my problems with either of these programs? Or else, can anyone suggest any other alternative software or API that I could use?

Many thanks.

Recommended answer

You could try to leverage PostgreSQL's full text search functionality:

http://www.postgresql.org/docs/9.0/static/textsearch.html

You can configure it with any of the available languages and all sorts of collations to fit your needs. PostgreSQL 9.1 adds some extra collation functionality that you may want to look into if the approach seems reasonable.

The basic steps would be (for each language):

  1. Create the needed table (collated appropriately). For our sake, a single column is enough, e.g.:

create table dict_en (
  word text check (word = lower(word)) primary key
);

  2. Fetch the needed dictionary/thesaurus files (those from aspell/OpenOffice should work).

  3. Configure text search (see the link above, namely section 12.6) using the relevant files.

  4. Insert the whole dictionary into the table. (Surely there's a CSV file somewhere...)

  5. Finally, index the vector, e.g.:

    create index on dict_en using gin (to_tsvector('english', word));
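
    As a rough sketch of the configuration and loading steps in the list above: the file names (thesaurus_en, /path/to/words_en.txt), the dictionary name thesaurus_en, the configuration name en_thesaurus and the staging table are all assumptions made for illustration, not something the original answer specifies. Note that PostgreSQL's thesaurus template rewrites phrases to a preferred term (lines of the form "sample phrase : preferred term" in a .ths file), so it may or may not fit the files you obtain from aspell or OpenOffice without conversion.

```sql
-- Assumption: thesaurus_en.ths has been placed in $SHAREDIR/tsearch_data/
-- (the file name is hypothetical; DictFile is given without the extension).
create text search dictionary thesaurus_en (
    template = thesaurus,
    dictfile = thesaurus_en,
    dictionary = english_stem
);

-- Start from the built-in English configuration and put the thesaurus
-- in front of the stemmer for plain ASCII words.
create text search configuration en_thesaurus (copy = english);

alter text search configuration en_thesaurus
    alter mapping for asciiword, asciihword, hword_asciipart
    with thesaurus_en, english_stem;

-- Load a word list (one word per line; the path is hypothetical) through
-- a staging table, so that the lower-case check and the primary key on
-- dict_en are not violated by mixed-case or duplicate entries.
create temp table staging_en (word text);
copy staging_en (word) from '/path/to/words_en.txt';

insert into dict_en (word)
select distinct lower(word)
from staging_en
where word <> '';
```

    Server-side copy needs a file that the database server can read; from psql, \copy does the same thing with a client-side file. To make use of the thesaurus, the index and the queries would name en_thesaurus instead of 'english'.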
    

    You can now run queries that use this index:

    -- Find words related to `:word`
    select word
    from dict_en
    where to_tsvector('english', word) @@ plainto_tsquery('english', :word)
    and word <> :word;
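
    To get a feel for what "related" means in that query, it may help to look at how a configuration normalizes individual words. With the stock english configuration, two words match when they reduce to the same lexeme, so the query finds words sharing a stem; actual synonyms would only match if a thesaurus or synonym dictionary is part of the configuration. A small check, using only built-in functions:

```sql
-- Both words should reduce to the same stem under the english
-- configuration, so this comparison is expected to be true.
select to_tsvector('english', 'running')
       @@ plainto_tsquery('english', 'runs') as same_stem;

-- Shows the lexeme(s) the english_stem dictionary produces for a word.
select ts_lexize('english_stem', 'running');
```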
    

    You might need to create a separate database or schema for each language, and add an additional field (tsvector) if Postgres refuses to index the expression because of the language parameter. (I read the full-text docs a long time ago.) The details on this would be in section 12.2, and I'm sure you'll know how to adjust the above if this is the case.
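
    A minimal sketch of both adjustments, assuming the dict_en table from above; the column name word_tsv, the index name dict_en_word_tsv_idx and the schema name da are hypothetical. As far as I know, the two-argument form of to_tsvector with a constant configuration name is accepted in an expression index, so the stored column may turn out to be unnecessary, but it is shown here in case it is needed.

```sql
-- Variant with a stored tsvector column instead of an expression index.
alter table dict_en add column word_tsv tsvector;
update dict_en set word_tsv = to_tsvector('english', word);
create index dict_en_word_tsv_idx on dict_en using gin (word_tsv);

-- The query then matches against the stored vector.
-- :word is a placeholder bound by the client (e.g. a named parameter).
select word
from dict_en
where word_tsv @@ plainto_tsquery('english', :word)
and word <> :word;

-- One schema per language, here Danish with the built-in
-- 'danish' text search configuration.
create schema da;
create table da.dict (
  word text check (word = lower(word)) primary key
);
create index on da.dict using gin (to_tsvector('danish', word));
```

    Keeping one table per language with the same shape means the PHP side only has to switch the schema (or table) name and the configuration name, which gives the consistent interface the question asks for.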

    Whatever the implementation details, though, I believe the approach should work.
