strcmp for python or how to sort substrings efficiently (without copy) when building a suffix array

Question

Here's a very simple way to build a suffix array from a string in Python:

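content = "foobar baz foo"

def sort_offsets(a, b):
    # Python 2: compare the suffixes starting at offsets a and b
    return cmp(content[a:], content[b:])

# One offset per suffix (this line was missing from the snippet);
# in Python 2, range() returns a list that can be sorted in place
suffix_array = range(len(content))
suffix_array.sort(cmp=sort_offsets)
print suffix_array
# prints: [6, 10, 4, 8, 3, 7, 11, 0, 13, 2, 12, 1, 5, 9]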

However, "content[a:]" makes a copy of the suffix, which becomes very inefficient when content gets large. So I wonder if there's a way to compare the two substrings without having to copy them. I've tried to use the buffer builtin, but it didn't work.

Recommended answer

The buffer function (a Python 2 builtin) does not copy the whole string, but creates an object that only references the source string. Using interjay's suggestion to sort with a key instead of a cmp function, that would be:

suffix_array.sort(key=lambda a: buffer(content, a))
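
For readers on Python 3, where buffer no longer exists and memoryview objects do not support ordering comparisons (so they cannot be used directly as sort keys), a rough copy-free sketch is shown below. It is not part of the original answer: the names build_suffix_array and compare_offsets are made up for illustration, and the comparison walks both suffixes character by character instead of slicing. That avoids copies, but it is slow in pure Python, so treat it as an illustration of the idea rather than a tuned implementation.

```python
from functools import cmp_to_key


def build_suffix_array(content):
    """Return the start offsets of all suffixes of content, in sorted order.

    Hypothetical helper for illustration: suffixes are compared in place,
    without creating slices of content.
    """
    n = len(content)

    def compare_offsets(a, b):
        # strcmp-style comparison of content[a:] and content[b:]
        if a == b:
            return 0
        i, j = a, b
        while i < n and j < n:
            if content[i] != content[j]:
                return -1 if content[i] < content[j] else 1
            i += 1
            j += 1
        # One suffix is a prefix of the other: the shorter suffix
        # (the one with the larger start offset) sorts first.
        return -1 if a > b else 1

    return sorted(range(n), key=cmp_to_key(compare_offsets))


if __name__ == "__main__":
    print(build_suffix_array("foobar baz foo"))
```

The worst case is still quadratic per comparison on highly repetitive input; dedicated suffix array construction algorithms avoid that, but they are beyond the scope of this question.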
