选择成分来解析树表示 [英] Select Constituents to parse tree representation

查看:43
本文介绍了选择成分来解析树表示的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑我们有对应于句子的跨度,

Consider we have the spans, corresponding to the sentence,

s = "Our intent is to promote the best alternative he says"
spans = [(0, 2), (0, 3), (5, 7), (5, 8), (4, 8), (3, 8), (0, 8), (8, 10)]

我删除了 (0, 3)(8, 10).

我想把括号放在上面,像这样:

I want to put brackets over, like this:

(((0  1  2)  (3  (4  ((5  6  7)  8))))  9  10)

其中 0, 1, ... , 10 是句子中单个单词的索引.

where 0, 1, ... , 10 are the indices of single-words of the sentence.

例如,如果我们只删除 "he said""我们的意图是".在这里,我们的意图是"的跨度是对应于(0, 3),他说"的跨度为对应于 (8, 10).括号形式的最终树应如下所示:

For instance, if we were to remove ONLY "he says" and "Our intent is". Here, the span of "Our intent is" corresponds to (0, 3), and the span of "he says" corresponds to (8, 10). Our final tree in bracketed form should look like this:

"(ROOT (S (S (S (S (S Our) (S intent)) (S is) (S (S to) (S (S promote) (S (S (S the) (S best)) (S alternative)))))(S he) (S says))))")

另一个例子,如果我们要删除 ONLY to促进最佳替代方案"我们的意图是促进最佳替代方案",我们的最终括号形式的树应如下所示:

Another instance, if we were to remove ONLY "to promote the best alternative", and "Our intent is to promote the best alternative", Our final tree in bracketed form should look like this:

"(ROOT (S (S (S Our) (S intent)) (S is)) (S to) (S (S promote) (S (S (S the) (S best)) (S alternative))) (S (S he) (S says)))"

我们可以假设完整的句子我们的意图是推广他所说的最佳替代方案".永远不会被删除.这对于句子中的单个单词也是正确的,只是为了给你一个背景.

We can assume that the full-sentence "Our intent is to promote the best alternative he says" will NEVER be deleted. This is also TRUE for single-words in the sentence, just to give you a background.

我正在寻找一种可以实现其中一个/两个的方法

I am looking for a way in which we can achieve either/both of

  1. 在给定跨度的索引上的括号表示.
  2. 带起始符号S"的括号字符串树表示;表示非终端节点.

推荐答案

假设 spans 是按预先顺序给出的(遍历树时),我会做一个反向迭代(按顺序,以反向访问孩子命令).当一个跨度与之前访问过的跨度没有重叠时,则它们代表兄弟姐妹,否则它们具有父子关系.这可用于在递归算法中引导递归.还有一个循环允许任意数量的子节点(树不一定是二元树):

Assuming that the spans are given in pre-order (when traversing the tree), I would do a reverse iteration (in-order, with children visited in reversed order). When a span has no overlap with the previously visited span, then they represent siblings, otherwise they have a parent-child relationship. This can be used to steer the recursion in a recursive algorithm. There is also a loop to allow for an arbitrarily number of children (the tree is not necessarily binary):

def to_tree(phrase, spans):
    words = phrase.split()
    iter = reversed(spans)
    current = None

    def encode(start, end):
        return ["(S {})".format(words[k]) for k in range(end - 1, start - 1, -1)]
        
    def recur(start, end, tab=""):
        nonlocal current
        nodes = []
        current = next(iter, None)
        while current and current[1] > start:  # overlap => it's a child
            child_start, child_end = current
            assert child_start >= start and child_end <= end, "Invalid spans"
            # encode what comes at the right of this child (single words):
            nodes.extend(encode(child_end, end))
            # encode the child itself using recursion
            nodes.append(recur(child_start, child_end, tab+"  "))
            end = child_start
        nodes.extend(encode(start, end))
        return "(S {})".format(" ".join(reversed(nodes))) if len(nodes) > 1 else nodes[0]

    return "(ROOT {})".format(recur(0, len(words)))

你会这样称呼它:

phrase = "Our intent is to promote the best alternative he says"
spans = [(0, 2), (0, 3), (5, 7), (5, 8), (4, 8), (3, 8), (0, 8), (8, 10)]
print(to_tree(phrase, spans))

对于您提供的示例,输出并不完全相同.这段代码永远不会产生像 (S (S ... )) 这样的嵌套,它代表一个只有一个子节点的节点.在这种情况下,此代码将仅生成一层 (S ... ).另一方面,根总是以包裹所有其他节点的 (S ... ) 开始.

The output is not exactly the same for the examples you have given. This code will never produce a nested like (S (S ... )), which would represent a node with exactly one child. In that case this code will just generate one level (S ... ). On the other hand, the root will always start out with a (S ... ) that wraps all other nodes.

这篇关于选择成分来解析树表示的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆