Extract noun phrases using Stanford NLP


Question

I am trying to extract the theme/noun phrase from a sentence using Stanford NLP.

For example, from the sentence "the white tiger" I would like to get the theme/noun phrase "white tiger".

For this I used the POS tagger. The result I get is "tiger", which is not correct. The sample code I ran is below.

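import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Sentence; // newer CoreNLP releases name this class SentenceUtils
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import edu.stanford.nlp.util.CoreMap;

public class NounPhraseExtractor {
    public static void main(String[] args) throws IOException {
        // Tokenize, split into sentences, and build a constituency parse.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation annotation = new Annotation("the white tiger");
        pipeline.annotate(annotation);
        List<CoreMap> sentences = annotation
                .get(CoreAnnotations.SentencesAnnotation.class);
        System.out.println("Number of sentences: " + sentences.size());
        for (CoreMap sentence : sentences) {
            System.out.println("The sentence is: " + sentence.toString());
            Tree tree = sentence.get(TreeAnnotation.class);
            PrintWriter out = new PrintWriter(System.out);
            out.println("The parsed sentence is:");
            tree.pennPrint(out);
            out.flush(); // without a flush the tree may never be printed
            // Match every node whose basic category is NP.
            TregexPattern pattern = TregexPattern.compile("@NP");
            TregexMatcher matcher = pattern.matcher(tree);
            while (matcher.find()) {
                Tree match = matcher.getMatch();
                List<Tree> children = match.getChildrenAsList();
                StringBuilder builder = new StringBuilder();
                for (Tree child : children) {
                    String tag = child.label().value();
                    // Only noun tags are kept here, so an adjective (JJ) is skipped.
                    if (tag.equals("NN") || tag.equals("NNS")
                            || tag.equals("NNP") || tag.equals("NNPS")) {
                        Tree[] leaves = child.children();
                        String word = Sentence.listToString(leaves[0].yield());
                        builder.append(word).append(" ");
                    }
                }
                System.out.println("Extracted phrase: " + builder);
            }
        }
    }
}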

Any help is really appreciated, as are any other ideas for achieving this.

Recommended answer

It looks like you're descending the parse tree looking for NN.*. "white" is a JJ (an adjective), which won't be included when searching for NN.*.
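To make the point concrete, here is a minimal sketch that does not use CoreNLP at all. The `Node` class, the `wordsWithTags` helper, and the hand-built tree are all hypothetical stand-ins, and the tree assumes the parser would produce (NP (DT the) (JJ white) (NN tiger)) for this phrase. It only illustrates how the choice of tags decides which words survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NounPhraseSketch {
    /** Hypothetical minimal tree node; a stand-in, not CoreNLP's Tree class. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Joins the words under the direct children of np whose tag starts with any given prefix. */
    static String wordsWithTags(Node np, String... tagPrefixes) {
        List<String> words = new ArrayList<>();
        for (Node child : np.children) {
            // A preterminal is a tag node with exactly one leaf (the word) below it.
            boolean isPreterminal = child.children.size() == 1
                    && child.children.get(0).children.isEmpty();
            if (!isPreterminal) {
                continue;
            }
            for (String prefix : tagPrefixes) {
                if (child.label.startsWith(prefix)) {
                    words.add(child.children.get(0).label);
                    break;
                }
            }
        }
        return String.join(" ", words);
    }

    /** Assumed parse of the phrase: (NP (DT the) (JJ white) (NN tiger)). */
    static Node whiteTiger() {
        return new Node("NP",
                new Node("DT", new Node("the")),
                new Node("JJ", new Node("white")),
                new Node("NN", new Node("tiger")));
    }

    public static void main(String[] args) {
        Node np = whiteTiger();
        System.out.println(wordsWithTags(np, "NN"));       // prints "tiger"
        System.out.println(wordsWithTags(np, "JJ", "NN")); // prints "white tiger"
    }
}
```

Accepting JJ alongside NN.* recovers "white tiger" for this phrase, but whether adjectives belong in the result depends on the task.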

You should take a close look at the Stanford Dependencies Manual and decide which part-of-speech tags encompass what you're looking for. You should also look at real linguistic data to figure out what matters for the task you're trying to complete. What about:

the tiger [with the black one] [who was white]

Simply traversing the tree in that case will give you "tiger black white". Exclude PPs (prepositional phrases)? Then you lose a lot of useful information:

the tiger [with white fur]
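The trade-off can be sketched in a few lines. Again this is only an illustration: `Node` and `collect` are hypothetical helpers, not CoreNLP classes, and the tree is an assumed parse of "the tiger with white fur" built by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PpTradeoffSketch {
    /** Hypothetical minimal tree node; a stand-in, not CoreNLP's Tree class. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Collects adjective and noun words below node, optionally skipping whole PP subtrees. */
    static void collect(Node node, boolean skipPrepositionalPhrases, List<String> out) {
        if (skipPrepositionalPhrases && !node.children.isEmpty() && node.label.equals("PP")) {
            return; // drop everything inside the prepositional phrase
        }
        boolean isPreterminal = node.children.size() == 1
                && node.children.get(0).children.isEmpty();
        if (isPreterminal) {
            if (node.label.startsWith("NN") || node.label.startsWith("JJ")) {
                out.add(node.children.get(0).label);
            }
            return;
        }
        for (Node child : node.children) {
            collect(child, skipPrepositionalPhrases, out);
        }
    }

    /** Assumed parse: (NP (NP (DT the) (NN tiger)) (PP (IN with) (NP (JJ white) (NN fur)))). */
    static Node tigerWithWhiteFur() {
        return new Node("NP",
                new Node("NP",
                        new Node("DT", new Node("the")),
                        new Node("NN", new Node("tiger"))),
                new Node("PP",
                        new Node("IN", new Node("with")),
                        new Node("NP",
                                new Node("JJ", new Node("white")),
                                new Node("NN", new Node("fur")))));
    }

    static String extract(boolean skipPrepositionalPhrases) {
        List<String> words = new ArrayList<>();
        collect(tigerWithWhiteFur(), skipPrepositionalPhrases, words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(extract(false)); // prints "tiger white fur"
        System.out.println(extract(true));  // prints "tiger"
    }
}
```

Neither output is obviously right: the first flattens the structure, and the second throws away the description of the tiger.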

I'm not sure what you're trying to accomplish, but make sure the task is constrained in the right way.

You ought to brush up on your basic syntax as well. "the white tiger" is what linguists call a noun phrase, or NP. You'd be hard pressed to find a linguist who would call an NP a sentence. There are also often many NPs inside a sentence; sometimes they're even embedded inside one another. The Stanford Dependencies Manual is a good start. As the name suggests, the Stanford Dependencies are based on the idea of dependency grammar, though there are other approaches (http://en.wikipedia.org/wiki/Syntax) that bring different insights to the table.
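As a rough illustration of nested NPs, the sketch below lists the text spanned by every NP node in a small hand-built tree. `Node`, `nounPhrases`, and the sample sentence are hypothetical and the parse is assumed; with CoreNLP the analogous step would be to take the yield of each node matched by the "@NP" pattern from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NestedNounPhrases {
    /** Hypothetical minimal tree node; a stand-in, not CoreNLP's Tree class. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Appends the leaf words below node to out, left to right. */
    static void collectLeaves(Node node, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(node.label);
            return;
        }
        for (Node child : node.children) {
            collectLeaves(child, out);
        }
    }

    /** Returns the text of every NP node in pre-order, so outer phrases come before inner ones. */
    static List<String> nounPhrases(Node root) {
        List<String> result = new ArrayList<>();
        visit(root, result);
        return result;
    }

    private static void visit(Node node, List<String> result) {
        if (!node.children.isEmpty() && node.label.equals("NP")) {
            List<String> words = new ArrayList<>();
            collectLeaves(node, words);
            result.add(String.join(" ", words));
        }
        for (Node child : node.children) {
            visit(child, result);
        }
    }

    /** Assumed parse of "the tiger with white fur slept". */
    static Node sample() {
        return new Node("S",
                new Node("NP",
                        new Node("NP",
                                new Node("DT", new Node("the")),
                                new Node("NN", new Node("tiger"))),
                        new Node("PP",
                                new Node("IN", new Node("with")),
                                new Node("NP",
                                        new Node("JJ", new Node("white")),
                                        new Node("NN", new Node("fur"))))),
                new Node("VP", new Node("VBD", new Node("slept"))));
    }

    public static void main(String[] args) {
        for (String phrase : nounPhrases(sample())) {
            System.out.println(phrase);
        }
    }
}
```

For this tree the method returns three phrases, one outer NP and two NPs embedded inside it, which is why "the noun phrase of a sentence" is usually underspecified.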

Learning what linguists know about the structure of sentences could help you significantly in getting at what you're trying to extract or, as often happens, in realizing that what you're trying to extract is too difficult and that you need to find a new route to a solution.
