How to improve permutation algorithm efficiency with python


Question



I have a file that contains just over 100,000 words. What I have to do is run through every 5 letter combination of the alphabet to work out the 5 letters that are used by the least number of words.

I've worked out a Python-based program that will eventually get the answer, but at the speed it is going it would probably take around 48 hours, if not longer. Part of the problem is the sheer number of calculations. I haven't worked out how to limit permutations so that only distinct strings are compared - that is 26^5 calculations just on the combinations alone, and then each of these is compared against 100,000 words, which works out at around 10 × 10^11 calculations at a bare minimum.
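(As an aside on the "distinct strings" point: `itertools.combinations` yields each unordered selection of distinct letters exactly once, which shrinks the search space considerably. A quick sanity check of the sizes involved:)

```python
from math import comb, factorial

# 26^5 counts every ordered 5-letter string, repeated letters included
n_strings = 26 ** 5
# ordered arrangements of 5 distinct letters (what permutations() yields)
n_perms = factorial(26) // factorial(26 - 5)
# unordered selections of 5 distinct letters (what combinations() yields)
n_combs = comb(26, 5)

print(n_strings, n_perms, n_combs)  # 11881376 7893600 65780
```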

Is there any way to speed this process up substantially, be that through more efficient algorithms, or multi-threading or something like that?

Any recommendations for books/articles on algorithm efficiency would also be greatly appreciated.


My current program is as follows:

Imports permutation function from itertools module:

from itertools import permutations

Asks if the word contains the forbidden letters:

def avoids(word, letters):
    # return False as soon as the word contains any forbidden letter
    for forbidden in letters:
        for character in word:
            if character == forbidden:
                return False
    return True

Calculates the number of words from the file that do not contain the forbidden characters:

def number_of_words(file, letters):
    # count the words in the file that avoid all of the forbidden letters
    x = 0
    with open(file) as open_file:
        for line in open_file:
            word = line.strip()
            if avoids(word, letters):
                x += 1
    return x

Runs through every five-letter arrangement of the alphabet and finds the combination that excludes the fewest words:

def find_smallest():

    y = 0
    combination = None

    #every combination of letters
    for letters in permutations("abcdefghijklmnopqrstuvwxyz", 5):
        x = number_of_words("words.txt", letters)
        #sets y to the initial value of x
        if y == 0:
            y = x
            combination = letters
            print "Start point, combination: %s, amount: %d" % (letters, y)

        #If current combination is greater than all previous combinations set y to x
        elif x > y:
            y = x
            combination = letters
            print "New highest, combination: %s, amount: %d" % (letters, y)

    print "%s excludes the smallest number of words (%d)" % (combination, y)

Runs the program:

find_smallest()

Solution

This is not a question about increasing permutation efficiency. It is really a question about how to make the algorithm smarter; it is a data structure question.

I have a file that contains just over 100,000 words. What I have to do is run through every 5 letter combination of the alphabet to work out the 5 letters that are used by the least number of words.

Loop through the 26 letters of the alphabet and count the number of words in your list which use each letter:

import string

# `words` is assumed to be the list of words read from your file
alphabet = string.ascii_lowercase
counts = {k: sum(k in word.lower() for word in words) for k in alphabet}

This should be pretty fast, and should give you enough information to trivially pick out the five least popular letters.
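For instance, sorting the letters by their counts hands you the answer directly. The sketch below uses a tiny hypothetical word list in place of the 100,000-word file:

```python
import string

# tiny hypothetical word list standing in for the 100,000-word file
words = ["apple", "banana", "cherry", "date", "fig"]

alphabet = string.ascii_lowercase
counts = {k: sum(k in word.lower() for word in words) for k in alphabet}

# sort letters by how many words use them; the first five are the answer
# (sorted() is stable, so ties are broken alphabetically)
least_used = sorted(alphabet, key=counts.get)[:5]
print(least_used)  # ['j', 'k', 'm', 'o', 'q']
```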

An equivalent approach, probably more efficient but perhaps less clear than the above:

import string
from collections import Counter

counter = Counter({k: 0 for k in string.ascii_lowercase})
counter.update(c for w in words for c in set(w.lower()))
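A usage sketch, again with a placeholder word list: `Counter.most_common()` sorts entries in descending order of count, so the rarest letters sit at the tail of the list.

```python
import string
from collections import Counter

words = ["apple", "banana", "cherry"]  # placeholder word list

# seed every letter with 0 so letters absent from all words still appear
counter = Counter({k: 0 for k in string.ascii_lowercase})
# set(w.lower()) counts each letter at most once per word
counter.update(c for w in words for c in set(w.lower()))

# the five entries with the lowest counts are at the end of most_common()
five_least = [letter for letter, _ in counter.most_common()[-5:]]
```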
