Is there a way to get the "original" text data for OpenNLP?

Views: 103 Posted: 2018/12/28 16:17:31 Tags: java nlp opennlp


Question

I know this question has been asked before, but the answer was not satisfying (in the sense that it was just a link).

So my question is: is there any way to extend the existing OpenNLP models? I already know about the technique using DBPedia/Wikipedia. But what if I just want to append a few lines of text to improve the models? Is there really no way? (If so, that would be really stupid...)

Recommended answer

Unfortunately, you can't. See this question, which has a detailed answer to the same problem.

I think this is a tough problem, because when you deal with texts you often run into licensing issues. For example, you cannot build a corpus from Twitter data and publish it to the community (see this paper for more information).

Therefore, companies often build domain-specific corpora and use them internally. We did this, for example, in our research project, and built a tool (Quick Pad Tagger) to create annotated corpora efficiently (see here).
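To make the idea of building your own annotated corpus a bit more concrete, here is a minimal sketch of how annotated sentences might be serialized into the plain-text format that OpenNLP's name finder training expects: one sentence per line, whitespace-separated tokens, and entities wrapped as `<START:type> ... <END>`. The class `OpenNlpCorpusLine`, the `Annotation` type, and the method `toTrainingLine` are hypothetical names introduced for illustration only; they are not part of OpenNLP or of Quick Pad Tagger, and the actual training step is not shown.

```java
import java.util.Arrays;
import java.util.List;

public class OpenNlpCorpusLine {

    /** Hypothetical annotation: tokens in [start, end) form one entity of the given type. */
    public static final class Annotation {
        public final int start;   // index of the first token of the entity
        public final int end;     // index one past the last token of the entity
        public final String type; // entity type, e.g. "person"

        public Annotation(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /**
     * Renders one tokenized sentence as a single training line, wrapping each
     * annotated token span in <START:type> ... <END> markers.
     * Assumes annotations do not overlap.
     */
    public static String toTrainingLine(List<String> tokens, List<Annotation> annotations) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            // Open a marker if an entity starts at this token.
            for (Annotation a : annotations) {
                if (a.start == i) {
                    sb.append("<START:").append(a.type).append("> ");
                }
            }
            sb.append(tokens.get(i));
            // Close the marker if an entity ends right after this token.
            for (Annotation a : annotations) {
                if (a.end == i + 1) {
                    sb.append(" <END>");
                }
            }
            if (i < tokens.size() - 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Pierre", "Vinken", "joined", "the", "board", ".");
        List<Annotation> annotations = Arrays.asList(new Annotation(0, 2, "person"));
        // Prints: <START:person> Pierre Vinken <END> joined the board .
        System.out.println(toTrainingLine(tokens, annotations));
    }
}
```

Lines produced this way could be collected into a text file and used to train a new, domain-specific model from scratch, which fits the answer's point that the shipped models cannot simply be extended.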
