编码挑战:生成字谜。 [英] Coding challenge: generate anagrams.

查看:84
本文介绍了编码挑战:生成字谜。的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

今天的挑战又回到了弦乐。



给定一个随机(或不那么随机)的字符串,生成独特的字谜。



奖励积分:加入你选择的拼写检查服务(本地机器,支持操作系统,远程网络服务),只返回实际单词的字谜。



技巧:它需要是一个快速的解决方案。如果您选择仅返回实际单词,请考虑减少拼写检查的数量。



上周



Richard Deeming获得了上周的解决方案。联系Sean,你可以成为我们新杯子设计的试验品。

解决方案

灵感来自解决方案2但速度要快得多

 计划
{
private static IEnumerable< string> GetWords()
{
// 使用本地版本:
// https://raw.githubusercontent.com/dwyl/english-words/master/words.txt

使用(StreamReader sr = File.OpenText( < span class =code-string> words.txt))
{
string s = 字符串 .Empty;
while ((s = sr.ReadLine())!= null
{
if (s.Length > 0
{
yield return s;
}
}
}
}

静态 Char [] ToCharArray( string 来源)
{
char [] CharArray = Source.ToLower()。ToCharArray ();
Array.Sort(CharArray);
return CharArray;
}

静态 void Main( string [] args)
{
秒表stopWatch = 秒表();
stopWatch.Start();

字典< Char [],List< string>> AllWords = new 字典< Char [],List< string>>( new CharArrayEqualityComparer());
foreach 字符串 GetWords())
{
List< string>词汇表;
Char [] key = ToCharArray(word);
if (!AllWords.TryGetValue(key, out WordList))
{
WordList = new List< string>();
AllWords.Add(key,WordList);
}
WordList.Add(word);
}
// ILookup< UInt16 [],字符串> AllWords = GetWords()。ToLookup(s => ToNumericArray(s),new CharArrayEqualityComparer());

stopWatch.Stop();

Console.WriteLine( string .Format( 在{1} ms内加载{0}字的查找,AllWords.Count,stopWatch.Elapsed.TotalMilliseconds));

string CompareWord;
do
{
Console.Write( string .Format( {0}搜索Word:,Environment.NewLine));
CompareWord = Console.ReadLine();

stopWatch.Reset();
stopWatch.Start();

if (CompareWord!= null
{
IEnumerable< string> Anagrams = AllWords [ToCharArray(CompareWord)];
stopWatch.Stop();
Console.WriteLine( string .Format( 经过的时间(毫秒):{0},stopWatch.Elapsed.TotalMilliseconds));

foreach string word in Anagrams)
{
Console.WriteLine(word);
}
}
} while (CompareWord!= null ) ;
}
}

内部 CharArrayEqualityComparer: EqualityComparer< Char [] >
{
private static readonly EqualityComparer< Char> Comparer = EqualityComparer< Char> .Default;
public 覆盖 bool 等于( char [] x, char [] y)
{
if (x == null return ;
if (y == null return false ;

if (x.Length!= y.Length)
{
返回 false ;
}
for int i = 0 ; i < x.Length; i ++)
{
if (!Comparer.Equals(x [i],y [i]))
{
return ;
}
}
返回 true ;
}

public 覆盖 int GetHashCode( char [] obj)
{
unchecked
{
if (obj == null 返回 0 ;
int hash = 17 ;
foreach char in obj)
{
hash =(((hash<< 5 ) - hash)^ Comparer.GetHashCode(Item));
}
return hash;
}
}
}



结果:

在515,3337ms内查找315379个单词)

搜索词:teaser
已用时间(毫秒):0,0082
个赌注
asteer
复活节
eaters
reseat
saeter
seater
staree
teaser

搜索词:alerts
时间流逝(毫秒):0,0082
提醒
改变
artels
estral
laster
lastre
rastle
ratels
relast
resalt
salter
slater
staler
stelar
talers



<罢工>我怀疑大部分测量的时间都是把文字写到控制台。

改变代码的方式与Graeme的做法相同



我设法通过跳过字符串并使用带有自制EqualityComparer的int数组来进一步削减它



< edit>因为使用查找简化了问题,谁拥有最快的哈希算法thm。

我以为我会看看如何通过创建单词作为字谜来更进一步。使用类似于以下方法的方法可以很容易地创建它们:

  public   static  IEnumerable< IEnumerable< T>> CreateCombinations< T>( IEnumerable< T>元素, int  Ordinal)
{
return Ordinal == 0 new [] { new T [ 0 ]}:
elements.SelectMany((e,i)= >
elements.Skip(i + 1 )。CreateCombinations(Ordinal - 1 )。选择(c = > new [] {e})。Concat(c)));
}

问题只是创建查找表的时间方面。如果创建N个单词的查找需要1秒钟,则需要N-1秒来创建仅包含两个单词的组合的查找,其中三个单词将花费(N-1)*(N-2)秒并且等等。所以我决定让它休息一下,直到我想出一个更快的方式来创建查询表< / edit>



< edit2>只是为了证明查找性能没有区别我使用字典为ILookup创建了dropin替换。因此,负载性能变得更好,因为它不再使用linq。

我还修剪了equalityComparer,因此它不再包含任何乘法。 < / edit2>



< edit2>意识到我可以使用EqualityComparer< char>在我用于字典的EqualityComparer里面。

现在我不需要复制数组< / edit2>


给定一个单词数据库,以及对字符串中的字符进行排序的函数...



SELECT [word] FROM [字典] WHERE排序([word]) = sort([inputstring])



:笑:



< br $>
编辑:



使用SQL Server,我有一个216634字的列表,其字母已预先排序。我还没有添加索引。以下内容在不到一秒的时间内完成。



  DECLARE   @ i   VARCHAR  16 )= '  teaser' 
DECLARE @ s VARCHAR 16 )= dbo.SortLetters( @ i
SELECT [Word] FROM [WordList] WHERE [Letters] = @ s AND [Word]!= @ i ORDER BY [Word]





 ARETES 
EASTER
EATERS
价格
RESEAT
SAETER
SEATER
STEARE







编辑:



从其他人提到的网站上添加words.txt和words3.txt

(并忽略单词 这不是严格的字母表,总共463260个单词...



 ARETES 
ASTEER
EASTER
EASTRE
EATERS
ERASTE
REATES
RESEAT
RESETA
SAETER
SEATER
STAREE
STEARE
TERESA







编辑:



我记得OpenVMS有一个单词列表,所以我从我的一个系统中提取了它并将它添加到我的主列表中(现在总共有五个列表)。



它包含42979个条目,其中2973个不在任何其他列表中(主要是名称,如Boromir),包括:



< pre lang =text> CARDIOVASCULATORY
DJANGOLOGY
DOPPLEGANGER





An d这些:



 ARBRITRARY 
ATACACTIVE
COWRTIERS
DOCTERS
EXPECIALLY





(我没有全部审查。)

可悲的是,它没有包含预告片的新字谜。





编辑:



从主人那里删除重复的单词后列表,我有466224个单词。

我看到有人发布了'警报'的字谜,所以这就是我的数据库:



< pre lang =text> Word HowMany哪个
ALTERS 5 31
ARTELS 4 15
ESTRAL 4 15
LASTER 4 15
LASTRE 2 12
RASTLE 2 12
RATELS 4 15
RELAST 2 12
RESALT 2 12
SALTER 5 31
SLATER 5 31
STALER 4 15
STELAR 4 15
TALERS 4 15
TARSEL 1 2





HowMany列表示有多少输入列表包含字。
哪一列是一个位映射值,表示哪些输入列表包含它。


Python



从命令行中获取一个单词并返回所有单词的字谜。

它的工作方式:

  1. 它使用Python的 itertools 生成所有排列。这可能是最快的方法。
  2. 检查单词是否真实的最快方法应该是检查它们是否存在于加载到内存中的集合中。然后,您不需要读取文件系统,也不需要发送HTTP请求,这可能需要一些时间。我在源代码中包含了一个完整的单词列表,来自我的Ubuntu虚拟机上的/ usr / share / dict / words。但是,该文件大约为900kB,这是非常大的。出于这个原因,我没有在源代码中包含原始单词,但我zlib压缩了它们并对这种压缩进行了base64编码,这是仅333kB(仍然很大,但更可接受 - 速度是一个问题,内存和源长度不是:))。该字符串被放入源代码中,并在程序启动时解压缩并创建一个Python set ,检查潜在的字谜。





这是代码:

CodeProject编码挑战的完整字谜代码·GitHub [ ^ ]

没有巨大Base64字符串的代码:

  import  sys,base64,zlib,itertools 

def main():
input_word = sys.argv [ 1 ]
perms = set([' '。在 itertools.permutations(input_word)中加入(x) x
for
w in perms:
if is_word(w):
print (w)

word_compressed_base64 = 我不打算在这里复制一些巨大的东西;请参阅Gist了解完整代码

word_set =

def prepare_words_set():
global word_set
decoding = base64.standard_b64decode(word_compressed_base64)
decompressed = zlib.decompress(已解码).decode(' utf-8'
word_set = set (decompressed.split(' ;'))

def is_word(w):
返回 w word_set

prepare_words_set()
main()


Today's challenge is back to strings.

Given a random (or not so random) string, generate unique anagrams.

Bonus points: Hook into a spell checking service of your choice (local machine, backed into the OS, remote webservice) to only return anagrams that are actual words.

The trick: it needs to be a fast solution. Think of ways to reduce the number of spell checks needed if you choose to return only actual words.

Last Week

Richard Deeming gets the mug for last week's solution. Contact Sean and you can be a guinea pig for our new mug design.

解决方案

Inspired by Solution 2 but considerably faster

class Program
{
    private static IEnumerable<string> GetWords()
    {
        //using a local version of:
        //https://raw.githubusercontent.com/dwyl/english-words/master/words.txt

        using (StreamReader sr = File.OpenText("words.txt"))
        {
            string s = String.Empty;
            while ((s = sr.ReadLine()) != null)
            {
                if (s.Length > 0)
                {
                    yield return s;
                }
            }
        }
    }

    static Char[] ToCharArray(string Source)
    {
        char[] CharArray = Source.ToLower().ToCharArray();
        Array.Sort(CharArray);
        return CharArray;
    }

    static void Main(string[] args)
    {
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();

        Dictionary<Char[], List<string>> AllWords = new Dictionary<Char[], List<string>>(new CharArrayEqualityComparer());
        foreach (string word in GetWords())
        {
            List<string> WordList;
            Char[] key = ToCharArray(word);
            if (!AllWords.TryGetValue(key, out WordList))
            {
                WordList = new List<string>();
                AllWords.Add(key, WordList);
            }
            WordList.Add(word);
        }
        //ILookup<UInt16[], string> AllWords = GetWords().ToLookup(s => ToNumericArray(s), new CharArrayEqualityComparer());

        stopWatch.Stop();

        Console.WriteLine(string.Format("Lookup for {0} words loaded in {1}ms)", AllWords.Count, stopWatch.Elapsed.TotalMilliseconds));

        string CompareWord;
        do
        {
            Console.Write(string.Format("{0}Search Word: ", Environment.NewLine));
            CompareWord = Console.ReadLine();

            stopWatch.Reset();
            stopWatch.Start();

            if (CompareWord != null)
            {
                IEnumerable<string> Anagrams = AllWords[ToCharArray(CompareWord)];
                stopWatch.Stop();
                Console.WriteLine(string.Format("Time elapsed (ms): {0}", stopWatch.Elapsed.TotalMilliseconds));

                foreach (string word in Anagrams)
                {
                    Console.WriteLine(word);
                }
            }
        } while (CompareWord != null);
    }
}

internal class CharArrayEqualityComparer : EqualityComparer<Char[]>
{
    private static readonly EqualityComparer<Char> Comparer = EqualityComparer<Char>.Default;
    public override bool Equals(char[] x, char[] y)
    {
        if (x == null) return false;
        if (y == null) return false;

        if (x.Length != y.Length)
        {
            return false;
        }
        for (int i = 0; i < x.Length; i++)
        {
            if(!Comparer.Equals(x[i], y[i]))
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode(char[] obj)
    {
        unchecked
        {
            if (obj == null) return 0;
            int hash = 17;
            foreach (char Item in obj)
            {
                hash = (((hash << 5) - hash) ^ Comparer.GetHashCode(Item));
            }
            return hash;
        }
    }
}


And the results:

Lookup for 315379 words loaded in 515,3337ms)

Search Word: teaser
Time elapsed (ms): 0,0082
aretes
asteer
easter
eaters
reseat
saeter
seater
staree
teaser

Search Word: alerts
Time elapsed (ms): 0,0082
alerts
alters
artels
estral
laster
lastre
rastle
ratels
relast
resalt
salter
slater
staler
stelar
talers


I suspect most of the measured time is writing out the words to the console.
Changed the code to measure the same way as Graeme is doing

I managed to trim it a bit further by skipping the strings and using int arrays with a homemade EqualityComparer

<edit>Since using a lookup simplifies the problem to, who's having the fastest hash algorithm.
I thought I'd have a look at how to take it a step further by creating combinations of words as anagrams. They could be created fairly easy by using a method similar to:

public static IEnumerable<IEnumerable<T>> CreateCombinations<T>(this IEnumerable<T> elements, int Ordinal)
{
    return Ordinal == 0 ? new[] { new T[0] } :
      elements.SelectMany((e, i) =>
        elements.Skip(i + 1).CreateCombinations(Ordinal - 1).Select(c => (new[] { e }).Concat(c)));
}

The problem is just the time aspect of creating the lookup table. If it takes 1 second to create a lookup for N words it will take N-1 seconds to create a lookup with a combination of just two words, with three words it will take (N-1)*(N-2) seconds and so on. So I decided I'll give it a rest until I come up with a faster way of creating the lookup table</edit>

<edit2>Just to prove there's no difference in lookup performance I created a dropin replacement for the ILookup using a dictionary. As a result the load performance got quite better since it doesn't use linq any more.
I've also trimmed the equalityComparer so it doesn't contain any multiplications anymore. </edit2>

<edit2>Realized that I can use an EqualityComparer<char> inside the EqualityComparer I'm using for the dictionary.
Now I don't need to copy the array</edit2>


Given a database of words, and a function to sort the characters in a string...

SELECT [word] FROM [dictionary] WHERE sort([word])=sort([inputstring])

:laugh:


Edit:

Using SQL Server, I have a list of 216634 words, with their letters pre-sorted. I have not yet added an index. The following completes in less than a second.

DECLARE @i VARCHAR(16) = 'teaser'
DECLARE @s VARCHAR(16) = dbo.SortLetters ( @i )
SELECT [Word] FROM [WordList] WHERE [Letters] = @s AND [Word] != @i ORDER BY [Word]



ARETES
EASTER
EATERS
REATES
RESEAT
SAETER
SEATER
STEARE




Edit:

After adding words.txt and words3.txt from the site that others have mentioned
(and ignoring the "words" that are not strictly alphabetic) for a total of 463260 words...

ARETES
ASTEER
EASTER
EASTRE
EATERS
ERASTE
REATES
RESEAT
RESETA
SAETER
SEATER
STAREE
STEARE
TERESA




Edit:

I remembered that OpenVMS has a word list, so I extracted that from one of my systems and added it to my master list (now a total of five lists).

It contains 42979 entries, 2973 of which are not on any of the other lists (mostly names, such as Boromir), including these:

CARDIOVASCULATORY
DJANGOLOGY
DOPPLEGANGER



And these:

ARBRITRARY
ATTACTIVE
COWRTIERS
DOCTERS
EXPECIALLY



(I didn't review them all.)
Sadly, it contains no new anagrams of "teaser".


Edit:

After removing duplicate words from my master list, I have 466224 words.
I see that someone posted anagrams of 'alerts', so here's what my database has:

Word   HowMany Which
ALTERS 5       31
ARTELS 4       15    
ESTRAL 4       15
LASTER 4       15
LASTRE 2       12
RASTLE 2       12
RATELS 4       15
RELAST 2       12
RESALT 2       12
SALTER 5       31
SLATER 5       31   
STALER 4       15   
STELAR 4       15   
TALERS 4       15    
TARSEL 1       2



The HowMany column indicates how many of the input lists contain the word.
The Which column is a bit-mapped value indicating which input lists contain it.


Python


Takes a word from the command-line and returns all anagrams that are words.
The way it works:

  1. It uses Python's itertools to generate all permutations. This is likely the fastest way to do so.
  2. The fastest way to check if words are real, should be to check their existence in a set loaded into memory. Then you don't need to read the file system and you don't need to send a HTTP request, which can take some time. I included a full word list in the source, from /usr/share/dict/words on my Ubuntu virtual machine. However, that file is ~900kB, which is pretty big. For this reason I did not include the 'raw' words in the source, but I zlib-compressed them and base64-encoded this compression, which is 'only' 333kB (still big, but more acceptable - and speed was a concern, memory and source length were not :) ). That string is put into the source code, and on the startup of the program, it decompresses it and created a Python set, that potential anagrams are checked against.



This is the code:
Full anagrams code for CodeProject coding challenge · GitHub[^]
Code without the huge Base64 string:

import sys, base64, zlib, itertools

def main():
    input_word = sys.argv[1]
    perms = set([''.join(x) for x in itertools.permutations(input_word)])
    for w in perms:
        if is_word(w):
            print(w)

word_compressed_base64 = "some huge thing that I'm not going to copy here; see Gist for full code"

word_set = None

def prepare_words_set():
    global word_set
    decoded = base64.standard_b64decode(word_compressed_base64)
    decompressed = zlib.decompress(decoded).decode('utf-8')
    word_set = set(decompressed.split(';'))

def is_word(w):
    return w in word_set

prepare_words_set()
main()


这篇关于编码挑战:生成字谜。的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆