How to find the shortest dependency path between two words in Python?


Problem description

I am trying to find the dependency path between two words in Python, given a dependency tree.

For the sentence

Robots in popular culture are there to remind us of the awesomeness of unbound human agency.

I used practnlptools (https://github.com/biplab-iitb/practNLPTools) to get the following dependency parse:

nsubj(are-5, Robots-1)
xsubj(remind-8, Robots-1)
amod(culture-4, popular-3)
prep_in(Robots-1, culture-4)
root(ROOT-0, are-5)
advmod(are-5, there-6)
aux(remind-8, to-7)
xcomp(are-5, remind-8)
dobj(remind-8, us-9)
det(awesomeness-12, the-11)
prep_of(remind-8, awesomeness-12)
amod(agency-16, unbound-14)
amod(agency-16, human-15)
prep_of(awesomeness-12, agency-16)

which can also be visualized as a tree (picture taken from https://demos.explosion.ai/displacy/).

The path length between "Robots" and "are" is 1, and the path length between "Robots" and "awesomeness" would be 4.

My question: given the dependency parse result above, how can I get the dependency path, or the dependency path length, between two words?

From what I have found so far, would nltk's ParentedTree help?

Thanks!

Recommended answer

Your problem can easily be treated as a graph problem in which we have to find the shortest path between two nodes.

To convert your dependency parse into a graph, we first have to deal with the fact that it comes as a string. You want to get this:

'nsubj(are-5, Robots-1)\nxsubj(remind-8, Robots-1)\namod(culture-4, popular-3)\nprep_in(Robots-1, culture-4)\nroot(ROOT-0, are-5)\nadvmod(are-5, there-6)\naux(remind-8, to-7)\nxcomp(are-5, remind-8)\ndobj(remind-8, us-9)\ndet(awesomeness-12, the-11)\nprep_of(remind-8, awesomeness-12)\namod(agency-16, unbound-14)\namod(agency-16, human-15)\nprep_of(awesomeness-12, agency-16)'

to look like this:

[('are-5', 'Robots-1'), ('remind-8', 'Robots-1'), ('culture-4', 'popular-3'), ('Robots-1', 'culture-4'), ('ROOT-0', 'are-5'), ('are-5', 'there-6'), ('remind-8', 'to-7'), ('are-5', 'remind-8'), ('remind-8', 'us-9'), ('awesomeness-12', 'the-11'), ('remind-8', 'awesomeness-12'), ('agency-16', 'unbound-14'), ('agency-16', 'human-15'), ('awesomeness-12', 'agency-16')]

This way you can feed the tuple list to a graph constructor from the networkx module, which builds the graph from the list of edges. networkx also provides a function that returns the length of the shortest path between two given nodes.

Necessary imports

import re
import networkx as nx
from practnlptools.tools import Annotator

How to get your string into the desired tuple-list format

annotator = Annotator()
text = """Robots in popular culture are there to remind us of the awesomeness of unbound human agency."""
dep_parse = annotator.getAnnotations(text, dep_parse=True)['dep_parse']

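# Each line has the form rel(head-i, dependent-j); capture head and dependent.
dp_list = dep_parse.split('\n')
pattern = re.compile(r'.+?\((.+?), (.+?)\)')
edges = []
for dep in dp_list:
    m = pattern.search(dep)
    if m:  # skip blank or malformed lines instead of raising AttributeError
        edges.append((m.group(1), m.group(2)))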
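
If practnlptools is not installed, the parsing step can still be tried on its own. The following is a minimal sketch that hard-codes the parse string from the question and assumes every line has the form rel(head, dependent) with tokens that contain no spaces. The helper name parse_edges is hypothetical, not part of any library.

```python
import re

# The dependency parse from the question, hard-coded so no parser is needed.
dep_parse = """nsubj(are-5, Robots-1)
xsubj(remind-8, Robots-1)
amod(culture-4, popular-3)
prep_in(Robots-1, culture-4)
root(ROOT-0, are-5)
advmod(are-5, there-6)
aux(remind-8, to-7)
xcomp(are-5, remind-8)
dobj(remind-8, us-9)
det(awesomeness-12, the-11)
prep_of(remind-8, awesomeness-12)
amod(agency-16, unbound-14)
amod(agency-16, human-15)
prep_of(awesomeness-12, agency-16)"""

# rel(head, dependent): group 1 = relation, group 2 = head, group 3 = dependent.
# Assumption: tokens never contain whitespace.
pattern = re.compile(r'(\w+)\((\S+), (\S+)\)')


def parse_edges(parse_text):
    """Return a list of (head, dependent) tuples, skipping malformed lines."""
    edges = []
    for line in parse_text.splitlines():
        m = pattern.fullmatch(line.strip())
        if m:
            edges.append((m.group(2), m.group(3)))
    return edges


edges = parse_edges(dep_parse)
print(len(edges))   # 14 edges, one per dependency
print(edges[:2])    # [('are-5', 'Robots-1'), ('remind-8', 'Robots-1')]
```

The resulting list has the same shape as the tuple list shown earlier, so it can be passed to the graph constructor unchanged.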

How to build the graph

graph = nx.Graph(edges)  # Well that was easy

How to calculate the shortest path length

print(nx.shortest_path_length(graph, source='Robots-1', target='awesomeness-12'))

This script will reveal that the shortest path in the given dependency parse actually has length 2, since you can get from Robots-1 to awesomeness-12 by going through remind-8:

1. xsubj(remind-8, Robots-1) 
2. prep_of(remind-8, awesomeness-12)
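
The question also asks for the path itself, not only its length. One possible sketch, assuming the same hard-coded parse string, stores each relation name as an edge attribute (the attribute name rel is an arbitrary choice) and uses nx.shortest_path to recover the nodes, then reads the relations off the edges along that path.

```python
import re
import networkx as nx

# The dependency parse from the question, hard-coded so no parser is needed.
dep_parse = """nsubj(are-5, Robots-1)
xsubj(remind-8, Robots-1)
amod(culture-4, popular-3)
prep_in(Robots-1, culture-4)
root(ROOT-0, are-5)
advmod(are-5, there-6)
aux(remind-8, to-7)
xcomp(are-5, remind-8)
dobj(remind-8, us-9)
det(awesomeness-12, the-11)
prep_of(remind-8, awesomeness-12)
amod(agency-16, unbound-14)
amod(agency-16, human-15)
prep_of(awesomeness-12, agency-16)"""

pattern = re.compile(r'(\w+)\((\S+), (\S+)\)')

# Undirected graph: the head/dependent direction is deliberately dropped,
# but the relation name is kept on each edge under the key 'rel'.
graph = nx.Graph()
for line in dep_parse.splitlines():
    m = pattern.fullmatch(line.strip())
    if m:
        rel, head, dependent = m.groups()
        graph.add_edge(head, dependent, rel=rel)

# Node sequence of one shortest path between the two words.
path = nx.shortest_path(graph, source='Robots-1', target='awesomeness-12')

# Relation names on the consecutive edges of that path.
relations = [graph[u][v]['rel'] for u, v in zip(path, path[1:])]

print(path)       # ['Robots-1', 'remind-8', 'awesomeness-12']
print(relations)  # ['xsubj', 'prep_of']
```

Because the graph is undirected, the sketch does not record whether an edge was traversed from head to dependent or the other way round. If that matters, the direction would have to be stored as an additional edge attribute.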

If you don't like this result, you might want to filter out some dependencies, in this case by not adding the xsubj dependency to the graph.
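
A rough sketch of that filtering idea follows, again on the hard-coded parse string. The set name EXCLUDED_RELATIONS is hypothetical, and which relations to exclude is a judgment call, not something the parser prescribes.

```python
import re
import networkx as nx

# The dependency parse from the question, hard-coded so no parser is needed.
dep_parse = """nsubj(are-5, Robots-1)
xsubj(remind-8, Robots-1)
amod(culture-4, popular-3)
prep_in(Robots-1, culture-4)
root(ROOT-0, are-5)
advmod(are-5, there-6)
aux(remind-8, to-7)
xcomp(are-5, remind-8)
dobj(remind-8, us-9)
det(awesomeness-12, the-11)
prep_of(remind-8, awesomeness-12)
amod(agency-16, unbound-14)
amod(agency-16, human-15)
prep_of(awesomeness-12, agency-16)"""

# Assumption: only xsubj is unwanted; extend the set as needed.
EXCLUDED_RELATIONS = {'xsubj'}

pattern = re.compile(r'(\w+)\((\S+), (\S+)\)')

edges = []
for line in dep_parse.splitlines():
    m = pattern.fullmatch(line.strip())
    # Keep the edge only if its relation is not in the excluded set.
    if m and m.group(1) not in EXCLUDED_RELATIONS:
        edges.append((m.group(2), m.group(3)))

graph = nx.Graph(edges)

length = nx.shortest_path_length(graph, source='Robots-1', target='awesomeness-12')
path = nx.shortest_path(graph, source='Robots-1', target='awesomeness-12')
print(length)  # 3
print(path)    # ['Robots-1', 'are-5', 'remind-8', 'awesomeness-12']
```

With xsubj removed, the path runs through are-5 and remind-8 and has length 3. This parse collapses the preposition into the prep_of relation, so there is no separate node for "of", which may be why the question counts 4 for the same pair of words.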
