Stanford typed dependencies using coreNLP in python


Question

In the Stanford Dependency Manual they mention "Stanford typed dependencies" and in particular the type "neg", the negation modifier. It is also produced when using the Stanford Enhanced++ parser on the website. For example, for the sentence:

"Barack Obama was not born in Hawaii"

the parser indeed finds neg(born, not).

But when I use the stanfordnlp python library, the only dependency parser I can get parses the sentence as follows:

('Barack', '5', 'nsubj:pass')

('Obama', '1', 'flat')

('was', '5', 'aux:pass')

('not', '5', 'advmod')

('born', '0', 'root')

('in', '7', 'case')

('Hawaii', '5', 'obl')

And the code that generates it:

import stanfordnlp

stanfordnlp.download('en')    # download the English models (only needed once)
nlp = stanfordnlp.Pipeline()  # build the default neural pipeline
doc = nlp("Barack Obama was not born in Hawaii")
a = doc.sentences[0]
a.print_dependencies()

Is there a way to get results similar to the enhanced dependency parser, or any other Stanford parser, that produce typed dependencies including the negation modifier?

Answer

Note that the python library stanfordnlp is not just a python wrapper for StanfordCoreNLP.

As stated on the stanfordnlp Github repository:

The Stanford NLP Group's official Python NLP library. It contains packages for running our latest fully neural pipeline from the CoNLL 2018 Shared Task and for accessing the Java Stanford CoreNLP server.

stanfordnlp contains a new set of neural network models trained on the CoNLL 2018 shared task, while the online parser is based on the CoreNLP 3.9.2 Java library. These are two different pipelines and sets of models, as explained here.

Your code only accesses the neural pipeline trained on CoNLL 2018 data, which explains the differences you see compared to the online version. They are, in essence, two different models.

What adds to the confusion, I believe, is that both repositories belong to the user named stanfordnlp (which is the team name). Don't confuse the Java stanfordnlp/CoreNLP with the Python stanfordnlp/stanfordnlp.

Concerning your 'neg' issue, it seems that in the python library stanfordnlp they decided to fold negation into the 'advmod' annotation altogether. At least that is what I ran into for a few example sentences.
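If you stay with the neural pipeline, one workaround is to recover negation from that 'advmod' convention yourself: scan the triples for an 'advmod' relation whose dependent is a negation word. A minimal sketch (the helper name and the negation word list are my own; the tuple format mirrors the print_dependencies output above):

```python
# Dependency triples as (word, governor_index, relation), matching the
# stanfordnlp output shown above.
deps = [
    ('Barack', '5', 'nsubj:pass'),
    ('Obama', '1', 'flat'),
    ('was', '5', 'aux:pass'),
    ('not', '5', 'advmod'),
    ('born', '0', 'root'),
    ('in', '7', 'case'),
    ('Hawaii', '5', 'obl'),
]

# A small, hand-picked set of negation cues (an assumption, not from the library).
NEGATION_WORDS = {'not', 'never', 'no', "n't"}

def find_negations(dependencies):
    """Return (negation_word, governor_index) pairs for advmod negations."""
    return [(word, gov) for word, gov, rel in dependencies
            if rel == 'advmod' and word.lower() in NEGATION_WORDS]

print(find_negations(deps))  # [('not', '5')]
```

This only approximates the old 'neg' relation, but it works on the pipeline's output without any extra setup.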

However, you can still access CoreNLP through the stanfordnlp package. It requires a few more steps, though. Quoting the Github repo:

There are a few initial setup steps.

  • Download Stanford CoreNLP and models for the language you wish to use. (you can download CoreNLP and the language models here)
  • Put the model jars in the distribution folder
  • Tell the python code where Stanford CoreNLP is located: export CORENLP_HOME=/path/to/stanford-corenlp-full-2018-10-05
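In a shell, those steps might look like the following (the release name and URL are assumptions based on the CoreNLP 3.9.2 distribution mentioned above; adjust them to whatever version you download):

```shell
# Download and unpack CoreNLP; the full distribution already ships the
# default English models. Jars for other languages go into the same folder.
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip

# Tell the python client where CoreNLP lives
export CORENLP_HOME=/path/to/stanford-corenlp-full-2018-10-05
```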

Once that is done, you can run the demo:

from stanfordnlp.server import CoreNLPClient

text = "Barack Obama was not born in Hawaii."

with CoreNLPClient(annotators=['tokenize','ssplit','pos','depparse'], timeout=60000, memory='16G') as client:
    # submit the request to the server
    ann = client.annotate(text)

    # get the first sentence
    sentence = ann.sentence[0]

    # get the dependency parse of the first sentence
    print('---')
    print('dependency parse of first sentence')
    dependency_parse = sentence.basicDependencies
    print(dependency_parse)

    # get the tokens of the first sentence
    # note that 1 token is 1 node in the parse tree, nodes start at 1
    print('---')
    print('Tokens of first sentence')
    for token in sentence.token:
        print(token)

Your sentence will therefore be parsed if you specify the 'depparse' annotator (along with the prerequisite annotators tokenize, ssplit, and pos). Reading the demo, it seems we can only access basicDependencies; I have not managed to make Enhanced++ dependencies work via stanfordnlp.

But the negations will still appear if you use basicDependencies!

Here is the output I obtained using stanfordnlp and your example sentence. It is a DependencyGraph object, not pretty, but that is unfortunately often the case with the deep CoreNLP tools. You will see that between nodes 4 and 5 ('not' and 'born') there is an edge 'neg'.

node {
  sentenceIndex: 0
  index: 1
}
node {
  sentenceIndex: 0
  index: 2
}
node {
  sentenceIndex: 0
  index: 3
}
node {
  sentenceIndex: 0
  index: 4
}
node {
  sentenceIndex: 0
  index: 5
}
node {
  sentenceIndex: 0
  index: 6
}
node {
  sentenceIndex: 0
  index: 7
}
node {
  sentenceIndex: 0
  index: 8
}
edge {
  source: 2
  target: 1
  dep: "compound"
  isExtra: false
  sourceCopy: 0
  targetCopy: 0
  language: UniversalEnglish
}
edge {
  source: 5
  target: 2
  dep: "nsubjpass"
  isExtra: false
  sourceCopy: 0
  targetCopy: 0
  language: UniversalEnglish
}
edge {
  source: 5
  target: 3
  dep: "auxpass"
  isExtra: false
  sourceCopy: 0
  targetCopy: 0
  language: UniversalEnglish
}
edge {
  source: 5
  target: 4
  dep: "neg"
  isExtra: false
  sourceCopy: 0
  targetCopy: 0
  language: UniversalEnglish
}
edge {
  source: 5
  target: 7
  dep: "nmod"
  isExtra: false
  sourceCopy: 0
  targetCopy: 0
  language: UniversalEnglish
}
edge {
  source: 5
  target: 8
  dep: "punct"
  isExtra: false
  sourceCopy: 0
  targetCopy: 0
  language: UniversalEnglish
}
edge {
  source: 7
  target: 6
  dep: "case"
  isExtra: false
  sourceCopy: 0
  targetCopy: 0
  language: UniversalEnglish
}
root: 5

---
Tokens of first sentence
word: "Barack"
pos: "NNP"
value: "Barack"
before: ""
after: " "
originalText: "Barack"
beginChar: 0
endChar: 6
tokenBeginIndex: 0
tokenEndIndex: 1
hasXmlContext: false
isNewline: false

word: "Obama"
pos: "NNP"
value: "Obama"
before: " "
after: " "
originalText: "Obama"
beginChar: 7
endChar: 12
tokenBeginIndex: 1
tokenEndIndex: 2
hasXmlContext: false
isNewline: false

word: "was"
pos: "VBD"
value: "was"
before: " "
after: " "
originalText: "was"
beginChar: 13
endChar: 16
tokenBeginIndex: 2
tokenEndIndex: 3
hasXmlContext: false
isNewline: false

word: "not"
pos: "RB"
value: "not"
before: " "
after: " "
originalText: "not"
beginChar: 17
endChar: 20
tokenBeginIndex: 3
tokenEndIndex: 4
hasXmlContext: false
isNewline: false

word: "born"
pos: "VBN"
value: "born"
before: " "
after: " "
originalText: "born"
beginChar: 21
endChar: 25
tokenBeginIndex: 4
tokenEndIndex: 5
hasXmlContext: false
isNewline: false

word: "in"
pos: "IN"
value: "in"
before: " "
after: " "
originalText: "in"
beginChar: 26
endChar: 28
tokenBeginIndex: 5
tokenEndIndex: 6
hasXmlContext: false
isNewline: false

word: "Hawaii"
pos: "NNP"
value: "Hawaii"
before: " "
after: ""
originalText: "Hawaii"
beginChar: 29
endChar: 35
tokenBeginIndex: 6
tokenEndIndex: 7
hasXmlContext: false
isNewline: false

word: "."
pos: "."
value: "."
before: ""
after: ""
originalText: "."
beginChar: 35
endChar: 36
tokenBeginIndex: 7
tokenEndIndex: 8
hasXmlContext: false
isNewline: false

2. Using CoreNLP via NLTK package

I will not go into details on this one, but if all else fails there is also a solution for accessing the CoreNLP server via the NLTK library. It does output the negations, but requires a little more work to start the servers. Details on this page.

I figured I could also share the code to get the DependencyGraph into a nice list of (dependency, argument1, argument2) tuples, in a shape similar to what stanfordnlp outputs.

from stanfordnlp.server import CoreNLPClient

text = "Barack Obama was not born in Hawaii."

# set up the client
with CoreNLPClient(annotators=['tokenize','ssplit','pos','depparse'], timeout=60000, memory='16G') as client:
    # submit the request to the server
    ann = client.annotate(text)

    # get the first sentence
    sentence = ann.sentence[0]

    # get the dependency parse of the first sentence
    dependency_parse = sentence.basicDependencies

    # print(dir(sentence.token[0]))    # all attributes and methods of a Token object
    # print(dir(dependency_parse))     # all attributes and methods of a DependencyGraph object
    # print(dir(dependency_parse.edge))

    # map each node index to its word (1 token = 1 node; node indices start at 1)
    token_dict = {token.tokenEndIndex: token.word for token in sentence.token}

    # list the dependencies together with the words they connect
    list_dep = []
    for edge in dependency_parse.edge:
        list_dep.append((edge.dep,
                         str(edge.source) + '-' + token_dict[edge.source],
                         str(edge.target) + '-' + token_dict[edge.target]))
    print(list_dep)

which outputs the following:

[('compound', '2-Obama', '1-Barack'), ('nsubjpass', '5-born', '2-Obama'), ('auxpass', '5-born', '3-was'), ('neg', '5-born', '4-not'), ('nmod', '5-born', '7-Hawaii'), ('punct', '5-born', '8-.'), ('case', '7-Hawaii', '6-in')]
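From that list, pulling out the negation edge is a one-liner (using the output above as literal data):

```python
# The (relation, governor, dependent) triples produced by the script above.
list_dep = [('compound', '2-Obama', '1-Barack'), ('nsubjpass', '5-born', '2-Obama'),
            ('auxpass', '5-born', '3-was'), ('neg', '5-born', '4-not'),
            ('nmod', '5-born', '7-Hawaii'), ('punct', '5-born', '8-.'),
            ('case', '7-Hawaii', '6-in')]

# keep only the 'neg' edges
negations = [(gov, dep) for rel, gov, dep in list_dep if rel == 'neg']
print(negations)  # [('5-born', '4-not')]
```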
