Lemmatization in Java

Question

I am looking for a lemmatisation implementation for English in Java. I have found a few already, but I need something that does not need too much memory to run (1 GB at most). Thanks. I do not need a stemmer.

Recommended answer

The Stanford CoreNLP Java library contains a lemmatizer that is a little resource-intensive, but I have run it on my laptop with less than 512 MB of RAM.
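Since the question sets a 1 GB ceiling, it can help to confirm what heap limit the JVM is actually running under before loading any models. The following is a minimal sketch using only the standard library; the class name `HeapCheck` and its methods are hypothetical helpers, not part of CoreNLP, and the check covers the Java heap only, not native or metaspace memory.

```java
// Hypothetical helper (not part of CoreNLP): reports the heap ceiling of the running JVM.
class HeapCheck {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / BYTES_PER_MB;
    }

    /** True when the JVM's heap ceiling is at or below the given budget in megabytes. */
    static boolean fitsBudget(long budgetMb) {
        return maxHeapMb() <= budgetMb;
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        System.out.println("Within 1 GB budget: " + fitsBudget(1024));
    }
}
```

Launching the JVM with a cap such as `java -Xmx512m HeapCheck` should make the reported ceiling reflect that limit; the exact figure can come out slightly below the requested value depending on the garbage collector.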

To use it:

  1. Download the jar files;
  2. Create a new project in your editor of choice, or write an Ant script, that includes all of the jar files contained in the archive you just downloaded;
  3. Create a new Java class as shown below (based upon the snippet from Stanford's site);

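import java.util.LinkedList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;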

public class StanfordLemmatizer {

    protected StanfordCoreNLP pipeline;

    public StanfordLemmatizer() {
        // Create StanfordCoreNLP object properties, with POS tagging
        // (required for lemmatization), and lemmatization
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");

        // StanfordCoreNLP loads a lot of models, so you probably
        // only want to do this once per execution
        this.pipeline = new StanfordCoreNLP(props);
    }

    public List<String> lemmatize(String documentText)
    {
        List<String> lemmas = new LinkedList<String>();

        // create an empty Annotation just with the given text
        Annotation document = new Annotation(documentText);

        // run all Annotators on this text
        this.pipeline.annotate(document);

        // Iterate over all of the sentences found
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for(CoreMap sentence: sentences) {
            // Iterate over all tokens in a sentence
            for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
                // Retrieve and add the lemma for each word into the list of lemmas
                lemmas.add(token.get(LemmaAnnotation.class));
            }
        }

        return lemmas;
    }
}
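The comment in the constructor suggests building the pipeline only once per execution. One way to follow that advice is to create a single lemmatizer and reuse it for every document. The sketch below assumes a hypothetical `BatchLemmatizer` helper that accepts any function from text to a list of lemmas; with the CoreNLP jars on the classpath you would pass `new StanfordLemmatizer()::lemmatize`. The stand-in function used in `main` only lowercases and splits on whitespace so that the example runs without CoreNLP; it is not a lemmatizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: applies one shared lemmatizer function to many documents.
class BatchLemmatizer {
    private final Function<String, List<String>> lemmatizer;

    BatchLemmatizer(Function<String, List<String>> lemmatizer) {
        this.lemmatizer = lemmatizer;
    }

    /** Returns one lemma list per input document, in input order. */
    List<List<String>> lemmatizeAll(List<String> documents) {
        List<List<String>> results = new ArrayList<>();
        for (String doc : documents) {
            results.add(lemmatizer.apply(doc));
        }
        return results;
    }

    public static void main(String[] args) {
        // With CoreNLP available, construct the expensive pipeline once and pass:
        //     new StanfordLemmatizer()::lemmatize
        // Stand-in for illustration only: lowercases and splits on whitespace.
        Function<String, List<String>> standIn =
                text -> Arrays.asList(text.toLowerCase().trim().split("\\s+"));

        BatchLemmatizer batch = new BatchLemmatizer(standIn);
        System.out.println(batch.lemmatizeAll(Arrays.asList("The cats sat", "Dogs were barking")));
    }
}
```

Passing the lemmatizer in as a function keeps the model-loading cost in one place and lets the calling code be exercised without the models loaded.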
