What is a good Java library for Parts-Of-Speech tagging?

Question

I'm looking for a good open-source POS tagger in Java. Here's what I have come up with so far.

Does anyone have any recommendations?

Recommended answer

Are you looking to tag POS in a specific domain? Most general-purpose taggers are trained on newswire text, and they typically don't perform well when used in specific domains (such as biomedical text). There are other taggers trained specifically for such domains, such as dTagger (Java) for biomedical text.

For newswire text, Adwait Ratnaparkhi's MXPOST is very good and is the one I would recommend.

Other Java implementations include:

  1. MontyLingua
  2. Berkeley Parser (not really a POS tagger, but full-blown parsers typically include one; search for Java syntactic parsers and you will find many)
  3. QTag
  4. LBJ

OpenNLP and LingPipe, as mentioned by the other posters, are also pretty decent.

Info on the state of the art in POS tagging can be found here. As you can see, LTAG-Spinal (also mentioned by another poster) ranks best as of now, but the variation across the taggers is small. I have not used LTAG myself.

Also note that the baseline performance for POS tagging is about 90%. The baseline here means (a) tagging every known word with its most frequent POS tag from a lexicon, and (b) tagging every unknown word as a noun.
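
To make that baseline concrete, here is a minimal sketch in plain Java. It is not taken from any of the libraries above: the class name `BaselineTagger`, the (word, tag) pair format, and the use of the Penn Treebank tag `NN` for unknown words are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical baseline tagger: most frequent tag per word, noun for unknowns.
final class BaselineTagger {
    // Assumption: "NN" (Penn Treebank singular noun) stands in for "noun".
    private static final String UNKNOWN_TAG = "NN";

    // The lexicon: each known word mapped to its most frequent tag.
    private final Map<String, String> mostFrequentTag = new HashMap<>();

    // Each element of taggedCorpus is a {word, tag} pair (assumed format).
    BaselineTagger(List<String[]> taggedCorpus) {
        // Count how often each tag occurs for each word.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] pair : taggedCorpus) {
            counts.computeIfAbsent(pair[0], k -> new HashMap<>())
                  .merge(pair[1], 1, Integer::sum);
        }
        // Keep only the highest-count tag for each word.
        for (Map.Entry<String, Map<String, Integer>> word : counts.entrySet()) {
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> tag : word.getValue().entrySet()) {
                if (tag.getValue() > bestCount) {
                    best = tag.getKey();
                    bestCount = tag.getValue();
                }
            }
            mostFrequentTag.put(word.getKey(), best);
        }
    }

    // (a) known words get their most frequent tag, (b) unknown words get NN.
    List<String> tag(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        for (String token : tokens) {
            tags.add(mostFrequentTag.getOrDefault(token, UNKNOWN_TAG));
        }
        return tags;
    }
}
```

Building the tagger from a tagged training corpus and calling `tag` on a tokenized sentence gives the baseline output; comparing it against gold-standard tags gives the accuracy figure that any real tagger should beat. Ties between equally frequent tags are broken arbitrarily in this sketch.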
