备忘录如何应用于此算法? [英] How can memoization be applied to this algorithm?

查看:80
本文介绍了备忘录如何应用于此算法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Python的标准库中发现difflib.SequenceMatcher类不适合我的需求后,编写了一个通用的"diff" -ing模块来解决问题空间.在经过几个月的思考后,递归算法似乎正在按需要搜索更多内容,方法是按照一个单独的搜索线程"可能也已经检查过的顺序重新搜索相同的区域.

After finding the difflib.SequenceMatcher class in Python's standard library to be unsuitable for my needs, a generic "diff"-ing module was written to solve a problem space. After having several months to think more about what it is doing, the recursive algorithm appears to be searching more than in needs to by re-searching the same areas in a sequence that a separate "search thread" may have also examined.

diff模块的目的是计算一对序列(列表,元组,字符串,字节,字节数组等)之间的差异和相似性.初始版本比代码的当前形式慢很多,速度提高了十倍.备忘录如何应用于以下代码?重写算法以进一步提高任何可能的速度的最佳方法是什么?

The purpose of the diff module is to compute the difference and similarities between a pair of sequences (list, tuple, string, bytes, bytearray, et cetera). The initial version was much slower than the code's current form, having seen a speed increase by a factor of ten. How can memoization be applied to the following code? What is the best way to rewrite the algorithm to further increase any possible speed?

class Slice:

    __slots__ = 'prefix', 'root', 'suffix'

    def __init__(self, prefix, root, suffix):
        self.prefix = prefix
        self.root = root
        self.suffix = suffix

################################################################################

class Match:

    __slots__ = 'a', 'b', 'prefix', 'suffix', 'value'

    def __init__(self, a, b, prefix, suffix, value):
        self.a = a
        self.b = b
        self.prefix = prefix
        self.suffix = suffix
        self.value = value

################################################################################

class Tree:

    __slots__ = 'nodes', 'index', 'value'

    def __init__(self, nodes, index, value):
        self.nodes = nodes
        self.index = index
        self.value = value

################################################################################

def search(a, b):
    # Initialize startup variables.
    nodes, index = [], []
    a_size, b_size = len(a), len(b)
    # Begin to slice the sequences.
    for size in range(min(a_size, b_size), 0, -1):
        for a_addr in range(a_size - size + 1):
            # Slice "a" at address and end.
            a_term = a_addr + size
            a_root = a[a_addr:a_term]
            for b_addr in range(b_size - size + 1):
                # Slice "b" at address and end.
                b_term = b_addr + size
                b_root = b[b_addr:b_term]
                # Find out if slices are equal.
                if a_root == b_root:
                    # Create prefix tree to search.
                    a_pref, b_pref = a[:a_addr], b[:b_addr]
                    p_tree = search(a_pref, b_pref)
                    # Create suffix tree to search.
                    a_suff, b_suff = a[a_term:], b[b_term:]
                    s_tree = search(a_suff, b_suff)
                    # Make completed slice objects.
                    a_slic = Slice(a_pref, a_root, a_suff)
                    b_slic = Slice(b_pref, b_root, b_suff)
                    # Finish the match calculation.
                    value = size + p_tree.value + s_tree.value
                    match = Match(a_slic, b_slic, p_tree, s_tree, value)
                    # Append results to tree lists.
                    nodes.append(match)
                    index.append(value)
        # Return largest matches found.
        if nodes:
            return Tree(nodes, index, max(index))
    # Give caller null tree object.
    return Tree(nodes, index, 0)


参考:如何优化递归算法以免重复自身?

推荐答案

正如〜unutbu所说,请尝试

As ~unutbu said, try the memoized decorator and the following changes:

@memoized
def search(a, b):
    # Initialize startup variables.
    nodes, index = [], []
    a_size, b_size = len(a), len(b)
    # Begin to slice the sequences.
    for size in range(min(a_size, b_size), 0, -1):
        for a_addr in range(a_size - size + 1):
            # Slice "a" at address and end.
            a_term = a_addr + size
            a_root = list(a)[a_addr:a_term] #change to list
            for b_addr in range(b_size - size + 1):
                # Slice "b" at address and end.
                b_term = b_addr + size
                b_root = list(b)[b_addr:b_term] #change to list
                # Find out if slices are equal.
                if a_root == b_root:
                    # Create prefix tree to search.
                    a_pref, b_pref = list(a)[:a_addr], list(b)[:b_addr]
                    p_tree = search(a_pref, b_pref)
                    # Create suffix tree to search.
                    a_suff, b_suff = list(a)[a_term:], list(b)[b_term:]
                    s_tree = search(a_suff, b_suff)
                    # Make completed slice objects.
                    a_slic = Slice(a_pref, a_root, a_suff)
                    b_slic = Slice(b_pref, b_root, b_suff)
                    # Finish the match calculation.
                    value = size + p_tree.value + s_tree.value
                    match = Match(a_slic, b_slic, p_tree, s_tree, value)
                    # Append results to tree lists.
                    nodes.append(match)
                    index.append(value)
        # Return largest matches found.
        if nodes:
            return Tree(nodes, index, max(index))
    # Give caller null tree object.
    return Tree(nodes, index, 0)

为便于记忆,词典是最好的,但不能切成薄片,因此必须将其更改为上面评论中指示的列表.

For memoization, dictionaries are best, but they cannot be sliced, so they have to be changed to lists as indicated in the comments above.

这篇关于备忘录如何应用于此算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆