Lemmatization in Java

Published: 2018/11/27 11:45:16 | Tags: java, nlp

Question

I am looking for a lemmatisation implementation for English in Java. I have already found a few, but I need something that does not need too much memory to run (1 GB at most). I do not need a stemmer. Thanks.

Recommended answer

The Stanford CoreNLP Java library contains a lemmatizer that is a little resource intensive, but I have run it on my laptop with less than 512 MB of RAM.
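If you want to check the memory figure on your own machine, one option is to cap the heap with the standard JVM flag -Xmx and print what the JVM reports. The following is only a minimal sketch using the standard library; the class name HeapCheck and the helper toMiB are made up for illustration and are not part of CoreNLP.

```java
public class HeapCheck {

    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Upper bound the JVM will try to use for the heap (set by -Xmx)
        long maxMiB = toMiB(rt.maxMemory());

        // Heap currently in use: allocated heap minus the free part of it
        long usedMiB = toMiB(rt.totalMemory() - rt.freeMemory());

        System.out.println("Max heap (MiB): " + maxMiB);
        System.out.println("Used heap (MiB): " + usedMiB);
    }
}
```

Running it as `java -Xmx512m HeapCheck` shows the ceiling the JVM actually applies. The same two Runtime calls could be placed after the lemmatizer has been constructed to see roughly how much heap the loaded models occupy.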

To use the lemmatizer:

Download the jar files;
Create a new project in your editor of choice, or write an Ant script, that includes all of the jar files contained in the archive you just downloaded;
Create a new Java class as shown below (based upon the snippet from Stanford's site);

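import java.util.LinkedList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;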

public class StanfordLemmatizer {

    protected StanfordCoreNLP pipeline;

    public StanfordLemmatizer() {
        // Create StanfordCoreNLP object properties, with POS tagging
        // (required for lemmatization), and lemmatization
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");

        // StanfordCoreNLP loads a lot of models, so you probably
        // only want to do this once per execution
        this.pipeline = new StanfordCoreNLP(props);
    }

    public List<String> lemmatize(String documentText)
    {
        List<String> lemmas = new LinkedList<String>();

        // create an empty Annotation just with the given text
        Annotation document = new Annotation(documentText);

        // run all Annotators on this text
        this.pipeline.annotate(document);

        // Iterate over all of the sentences found
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for(CoreMap sentence: sentences) {
            // Iterate over all tokens in a sentence
            for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
                // Retrieve and add the lemma for each word into the list of lemmas
                lemmas.add(token.get(LemmaAnnotation.class));
            }
        }

        return lemmas;
    }
}
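
The comment in the constructor suggests building the pipeline only once per execution. One way to arrange that, sketched here with the standard library only, is a small lazy holder that builds the expensive object on first use and reuses it afterwards. The class name LazyHolder is hypothetical and not part of CoreNLP; it assumes the factory never returns null.

```java
import java.util.function.Supplier;

public class LazyHolder<T> {

    private final Supplier<T> factory;
    private T instance;

    public LazyHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /**
     * Returns the shared instance, creating it on the first call.
     * Synchronized so that concurrent callers do not build it twice.
     */
    public synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

With the class above this could be used as `new LazyHolder<>(StanfordLemmatizer::new)`, kept in a static field, so the models are loaded the first time `get()` is called and never again.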
