这段python代码可以更有效吗? [英] Can this python code be more efficient?

查看:74
本文介绍了这段python代码可以更有效吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写了一些代码来查找一个字符串中有多少个子字符串是字谜对.查找anagram(anagramSolution)的函数的复杂度为O(N).子字符串函数的复杂度小于N平方.但是,这里的代码就是问题所在.可以更优化吗?

I have written some code to find how many substrings of a string are anagram pairs. The function to find anagram(anagramSolution) is of complexity O(N). The substring function has complexity less than N square. But, this code here is the problem. Can it be more optimized?

for i in range(T):
    x = raw_input()
    alist = get_all_substrings(x)

    for k, j in itertools.combinations(alist,2):
        if(len(k) == len(j)):
            if(anagramSolution(k,j)):
                counter +=1

    counterlist.append(counter)
    counter = 0

alist可以具有数千个项目(子集). 主要问题是循环.对所有项目进行迭代需要花费大量时间.有没有更快或更有效的方法来做到这一点?

The alist can have thousands of items (subsets). The main problem is the loop. It is taking a lot of time to iterate over all the items. Is there any faster or more efficient way to do this?

推荐答案

定义字符串的 anagram类为每个字母在字符串中出现多少次的计数集.例如,'banana'具有字谜类a: 3, b: 1, n: 2.如果两个字符串具有相同的字谜类,则它们是彼此的字谜.我们可以计算出每个字谜类中有多少个子字符串,然后通过为每个带有n个子字谜类的anagram类计算(n choose 2)来计算对数:

Define the anagram class of a string to be the set of counts of how many times each letter appears in the string. For example, 'banana' has anagram class a: 3, b: 1, n: 2. Two strings are anagrams of each other if they have the same anagram class. We can count how many substrings of the string are in each anagram class, then compute the number of pairs by computing (n choose 2) for every anagram class with n substrings:

from collections import Counter

anagram_class_counts = Counter()

for substring in get_all_substrings(x):
    anagram_class_counts[frozenset(Counter(substring).viewitems())] += 1

anagram_pair_count = sum(x*(x-1)/2 for x in anagram_class_counts.viewvalues())

frozenset(Counter(substring).viewitems())构建字符串的anagram类的可哈希表示.

frozenset(Counter(substring).viewitems()) builds a hashable representation of a string's anagram class.

  • Counter进行迭代,并建立一个表示每个项目出现了多少次的映射,所以
  • Counter(substring)构建一个表示字符串的字谜类的映射.
  • viewitems()给出了一组类似字母的集合:计数对,和
  • frozenset会将其变成一个不可变的集合,可以用作字典键.
  • Counter takes an iterable and builds a mapping representing how many times each item appeared, so
  • Counter(substring) builds a mapping representing a string's anagram class.
  • viewitems() gives a set-like collection of letter: count pairs, and
  • frozenset turns that into an immutable set that can be used as a dict key.

这些步骤所花费的时间与子字符串的大小成正比;平均而言,子字符串大约是整个字符串大小的三分之一,因此平均而言,处理每个子字符串需要O(len(x))时间.有O(len(x)**2)个子字符串,因此处理所有子字符串需要O(len(x)**3)时间.

These steps together take time proportional to the size of the substring; on average, substrings are about a third of the size of the whole string, so on average, processing each substring takes O(len(x)) time. There are O(len(x)**2) substrings, so processing all substrings takes O(len(x)**3) time.

如果有x个子字串具有相同的字谜类,则可以用x*(x-1)/2方式将它们配对,因此sum遍历每个字谜类的出现次数并计算对数.这需要花费O(len(x)**2)的时间,因为它必须遍历每个字谜类一次,并且字谜类不能超过子字符串.

If there are x substrings with the same anagram class, they can be paired up in x*(x-1)/2 ways, so the sum goes through the number of occurrences of each anagram class and computes the number of pairs. This takes O(len(x)**2) time, since it has to go through each anagram class once, and there can't be more anagram classes than substrings.

总体而言,该算法花费O(len(x)**3)的时间,虽然不是很好,但是比原始算法要好得多.仍然有优化的空间,例如通过利用子字符串之间的重叠来计算字谜类,或者使用更有效的字谜类表示形式.

Overall, this algorithm takes O(len(x)**3) time, which isn't great, but it's a lot better than the original. There's still room to optimize this, such as by computing anagram classes in a way that takes advantage of the overlap between substrings or by using a more efficient anagram class representation.

这篇关于这段python代码可以更有效吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆