查找分配时间内的排列的所有匹配项 [英] Find all matches of permutations within allotted time

查看:129
本文介绍了查找分配时间内的排列的所有匹配项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个程序,该程序需要9个字符,创建所有可能的排列,并为每个字符获取字典文件,然后创建一组所有可能的单词.我需要做的是将所有排列与单词进行比较,然后返回匹配项.

I'm writing a program that takes 9 characters, creates all possible permutations, and grabs a dictionary files for each character and then creates a set of all possible words. What I need to do is compare all permutations to words and return matches.

import os, itertools

def parsed(choices): 
    mySet = set()
    location = os.getcwd()
    for item in choices: 
        filename = location + "\\dicts\\%s.txt" % (item)
        mySet.update(open(filename).read().splitlines())

    return mySet  

def permutations(input): 
    possibilities = []
    pospos = []   

    for x in range(3,9):
        pospos.append([''.join(i) for i in itertools.permutations(input, x)])


    for pos in pospos: 
        for i in pos: 
            possibilities.append(i)
    return possibilities

有问题的功能是这个:

def return_matches(): 
    matches = []
    words = parsed(['s','m','o','k','e', 'j', 'a', 'c', 'k'])
    pos = permutations(['s','m','o','k','e', 'j', 'a', 'c', 'k'])

    for item in pos:  
        if item in words: 
            matches.append(item)

    return matches

此代码应返回:

matches = ['a', 'om', 'ja', 'jo', ..., 'jacks', 'cokes', 'kecks', 'jokes', 'cakes', 'smoke', 'comes', 'makes', 'cameos']

如果我使此代码正常工作,则需要10到15分钟才能完成.另一方面,每次尝试在指定的时间内执行此操作,只能用5个或更少的字符来完成,否则返回错误的结果.

If I get this code to work properly, it takes 10 - 15 minutes to complete. On the other hand, every attempt at making this execute within allotted time, it can only be done with 5 or less characters or returns the wrong result.

所以我的问题是如何优化此代码以在30秒内返回正确的结果.

So my question is how to optimize this code to return the right result, within 30 seconds time.

修改 http://www.mso.anu.edu.au/~ralph/OPTED/v003 这是我要从中抓取词典文件的网站.

Edit http://www.mso.anu.edu.au/~ralph/OPTED/v003 this is the website I'm scraping the dictionary files from.

推荐答案

在测试所有排列是否有效之前,浪费了RAM和时间将所有排列存储在列表中.相反,请在生成排列时对其进行测试,然后将有效的排列保存到集合中以消除重复.

It wastes RAM and time storing all the permutations in a list before you test if they're valid. Instead, test the permutations as you generate them, and save the valid ones into a set to eliminate duplicates.

由于 itertools.permutations 作品:

Duplicates are possible because of the way itertools.permutations works:

根据元素的位置(而不是元素的位置)将元素视为唯一 价值.因此,如果输入元素是唯一的,则不会重复 每个排列中的值.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation.

您输入的单词"SMOKEJACK"包含2个K,因此每个包含K的排列都会生成两次.

Your input word "SMOKEJACK" contains 2 Ks, so every permutation containing K gets generated twice.

无论如何,这里有一些代码使用 SOWPODS 拼写英语单词列表.

Anyway, here's some code that uses the SOWPODS Scrabble word list for English.

from itertools import permutations

# Get all the words from the SOWPODS file
all_words = set('AI')
fname = 'scrabble_wordlist_sowpods.txt'
with open(fname) as f:
    all_words.update(f.read().splitlines())

print(len(all_words))

choices = 'SMOKEJACK'

# Generate all permutations of `choices` from length 3 to 8 
# and save them in a set to eliminate duplicates.
matches = set()
for n in range(3, 9):
    for t in permutations(choices, n):
        s = ''.join(t)
        if s in all_words:
            matches.add(s)

for i, s in enumerate(sorted(matches)):
    print('{:3} {}'.format(i, s))

输出

216555
  0 ACE
  1 ACES
  2 ACME
  3 ACMES
  4 AESC
  5 AKE
  6 AKES
  7 AMOK
  8 AMOKS
  9 ASK
 10 CAKE
 11 CAKES
 12 CAM
 13 CAME
 14 CAMEO
 15 CAMEOS
 16 CAMES
 17 CAMS
 18 CASE
 19 CASK
 20 CEAS
 21 COKE
 22 COKES
 23 COMA
 24 COMAE
 25 COMAKE
 26 COMAKES
 27 COMAS
 28 COME
 29 COMES
 30 COMS
 31 COS
 32 COSE
 33 COSMEA
 34 EAS
 35 EKKA
 36 EKKAS
 37 EMS
 38 JACK
 39 JACKS
 40 JAK
 41 JAKE
 42 JAKES
 43 JAKS
 44 JAM
 45 JAMES
 46 JAMS
 47 JOCK
 48 JOCKS
 49 JOE
 50 JOES
 51 JOKE
 52 JOKES
 53 KAE
 54 KAES
 55 KAM
 56 KAME
 57 KAMES
 58 KAS
 59 KEA
 60 KEAS
 61 KECK
 62 KECKS
 63 KEKS
 64 KOA
 65 KOAS
 66 KOS
 67 MAC
 68 MACE
 69 MACES
 70 MACK
 71 MACKS
 72 MACS
 73 MAE
 74 MAES
 75 MAK
 76 MAKE
 77 MAKES
 78 MAKO
 79 MAKOS
 80 MAKS
 81 MAS
 82 MASE
 83 MASK
 84 MES
 85 MESA
 86 MOA
 87 MOAS
 88 MOC
 89 MOCK
 90 MOCKS
 91 MOCS
 92 MOE
 93 MOES
 94 MOKE
 95 MOKES
 96 MOS
 97 MOSE
 98 MOSK
 99 OAK
100 OAKS
101 OCA
102 OCAS
103 OES
104 OKA
105 OKAS
106 OKE
107 OKES
108 OMS
109 OSE
110 SAC
111 SACK
112 SAE
113 SAKE
114 SAM
115 SAME
116 SAMEK
117 SCAM
118 SEA
119 SEAM
120 SEC
121 SECO
122 SKA
123 SKEO
124 SMA
125 SMACK
126 SMOCK
127 SMOKE
128 SOAK
129 SOC
130 SOCA
131 SOCK
132 SOJA
133 SOKE
134 SOMA
135 SOME

此代码在我相当古老的32位2GHz机器上(在Linux上运行Python 3.6.0)运行约2.5秒.在Python 2上,速度稍快(因为Python2字符串是ASCII,而不是Unicode).

This code runs in around 2.5 seconds on my rather ancient 32 bit 2GHz machine running Python 3.6.0 on Linux. It's slightly faster on Python 2 (since Python2 strings are ASCII, not Unicode).

这篇关于查找分配时间内的排列的所有匹配项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆