Lemmatization in Java
Problem description
I am looking for a lemmatisation implementation for English in Java. I have found a few already, but I need something that does not need too much memory to run (1 GB at most). I do not need a stemmer. Thanks.
Recommended answer
The Stanford CoreNLP Java library contains a lemmatizer that is a little resource-intensive, but I have run it on my laptop with less than 512 MB of RAM.
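Since the question caps memory at 1 GB, it may help to check how much heap the JVM was actually given before loading any models. The following is a minimal sketch using only the standard library; the class name `HeapBudget` and its method names are illustrative and not part of CoreNLP. The heap ceiling itself is set with the standard `-Xmx` JVM option (for example `java -Xmx512m ...`).

```java
/** Illustrative helper (not part of CoreNLP) for inspecting the JVM heap budget. */
final class HeapBudget {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / BYTES_PER_MB;
    }

    /** Heap currently in use (allocated minus free), in megabytes. */
    static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMegabytes() + " MB");
        System.out.println("Used heap: " + usedHeapMegabytes() + " MB");
    }
}
```

Calling `usedHeapMegabytes()` before and after constructing the pipeline gives a rough idea of how much memory the models take on a particular machine.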
To use it:
- Download the jar files;
- Create a new project in your editor of choice, or write an Ant script, that includes all of the jar files contained in the archive you just downloaded;
- Create a new Java class as shown below (based on the snippet from Stanford's site).
import java.util.LinkedList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class StanfordLemmatizer {

    protected StanfordCoreNLP pipeline;

    public StanfordLemmatizer() {
        // Create StanfordCoreNLP object properties, with POS tagging
        // (required for lemmatization), and lemmatization
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");

        // StanfordCoreNLP loads a lot of models, so you probably
        // only want to do this once per execution
        this.pipeline = new StanfordCoreNLP(props);
    }

    public List<String> lemmatize(String documentText) {
        List<String> lemmas = new LinkedList<>();

        // Create an empty Annotation just with the given text
        Annotation document = new Annotation(documentText);

        // Run all Annotators on this text
        this.pipeline.annotate(document);

        // Iterate over all of the sentences found
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // Iterate over all tokens in a sentence
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                // Retrieve and add the lemma for each word into the list of lemmas
                lemmas.add(token.get(LemmaAnnotation.class));
            }
        }

        return lemmas;
    }
}
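The comment in the constructor above recommends building the pipeline only once per execution, because loading the models is expensive. One way to enforce that is a lazily initialised holder. This is a sketch under assumptions: `Lazy` is a hypothetical helper written here for illustration, not a CoreNLP or JDK class, and it assumes the factory never returns null.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: builds an expensive object on the first call to get()
 * and returns the same instance on every later call. Thread-safe.
 * Assumes the factory never returns null.
 */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                // Re-check inside the lock so only one thread runs the factory
                result = value;
                if (result == null) {
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }
}
```

With the lemmatizer class on the classpath, a static field holding `new Lazy<>(StanfordLemmatizer::new)` would defer model loading until the first `get()` call and reuse the same instance afterwards.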