Looking for a database or text file of English words with their different forms

Problem description

I am working on a project where I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not a good fit for my project. I found the phpmorphy project, but it doesn't include a Java API.

At the moment I am looking for a database or a text file of English words with their different forms. For example:

run running ran ...
include including included ...
...

Thank you for your help or advice.

Recommended answer

You could download LanguageTool (Disclaimer: I'm the maintainer), which comes with a binary file english.dict. The LanguageTool Wiki describes how to dump that file as a text file:

java -jar morfologik-tools-1.6.0-standalone.jar fsa_dump -x -d english.dict

For run, the file will contain this:

ran run VBD
run run NN
run run VB
run run VBN
run run VBP
running run VBG
runs run NNS
runs run VBZ
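
Once the dictionary has been dumped to text, a lookup from inflected form to root (and from root back to all of its forms) can be built in a few lines of Java. The following is only a minimal sketch, not part of LanguageTool: it assumes each dumped line has the three whitespace-separated columns shown above (inflected form, base form, POS tag), and the class name `LemmaLookup` and its methods are hypothetical names chosen for illustration. If your dump uses a different separator, adjust the `split` call accordingly.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: indexes "form lemma tag" lines in both directions.
final class LemmaLookup {
    // inflected form -> possible base forms (a form can have several)
    private final Map<String, Set<String>> lemmasByForm = new LinkedHashMap<>();
    // base form -> all inflected forms seen for it
    private final Map<String, Set<String>> formsByLemma = new LinkedHashMap<>();

    static LemmaLookup fromLines(Iterable<String> lines) {
        LemmaLookup lookup = new LemmaLookup();
        for (String line : lines) {
            // Assumption: columns are separated by whitespace, as in the sample output.
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // skip blank or malformed lines
            }
            String form = cols[0];
            String lemma = cols[1];
            lookup.lemmasByForm.computeIfAbsent(form, k -> new LinkedHashSet<>()).add(lemma);
            lookup.formsByLemma.computeIfAbsent(lemma, k -> new LinkedHashSet<>()).add(form);
        }
        return lookup;
    }

    Set<String> lemmasOf(String form) {
        return lemmasByForm.getOrDefault(form, Set.of());
    }

    Set<String> formsOf(String lemma) {
        return formsByLemma.getOrDefault(lemma, Set.of());
    }

    public static void main(String[] args) {
        // The sample lines for "run" from the dump shown above.
        List<String> sample = List.of(
            "ran run VBD", "run run NN", "run run VB", "run run VBN",
            "run run VBP", "running run VBG", "runs run NNS", "runs run VBZ");
        LemmaLookup lookup = fromLines(sample);
        System.out.println(lookup.lemmasOf("running")); // [run]
        System.out.println(lookup.formsOf("run"));      // [ran, run, running, runs]
    }
}
```

For the full dictionary, the same `fromLines` method could be fed the lines of the dumped file, for example via `java.nio.file.Files.readAllLines` on whatever file you saved the dump to. Sets are used because one form can map to several base forms, and the POS tag in the third column is ignored here; keep it if you need to disambiguate.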

According to the (slightly extended) ...
