Puzzle Solving: Finding All Words Within a Larger Word in PHP

Question

So I have a database of words between 3 and 20 characters long. I want to code something in PHP that finds all of the smaller words that are contained within a larger word. For example, in the word "inward" there are the words "rain", "win", "rid", etc.

At first I thought about adding a field to the Words tables (Words3 through Words20, named for the number of letters in the words), something like "LetterCount". For example, "rally" would be represented as 10000000000200000100000010: 1 instance of the letter A, 0 instances of the letter B, ... 2 instances of the letter L, etc. Then I would go through all the words in each table (or one table if the target length of found words was specified) and compare the LetterCount of each word to the LetterCount of the source word ("inward" in the example above).
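The LetterCount idea can be sketched in a few lines of PHP. This is only an illustration of the comparison described above, not code from the question: the function names `letterCounts`, `letterCountString`, and `fitsInside` are made up for the example, and it assumes plain a-z words. Note that the 26-digit string form is only unambiguous while no letter occurs more than 9 times in a word.

```php
<?php
declare(strict_types=1);

/**
 * Count how often each letter a-z occurs in a word.
 * Returns 26 integers (index 0 = a, ..., index 25 = z).
 * Characters outside a-z are ignored (an assumption of this sketch).
 */
function letterCounts(string $word): array
{
    $counts = array_fill(0, 26, 0);
    foreach (str_split(strtolower($word)) as $ch) {
        if ($ch === '') {
            continue; // older PHP versions yield [''] for an empty word
        }
        $index = ord($ch) - ord('a');
        if ($index >= 0 && $index < 26) {
            $counts[$index]++;
        }
    }
    return $counts;
}

/** Encode the counts as the 26-digit string described in the question. */
function letterCountString(string $word): string
{
    return implode('', letterCounts($word));
}

/** True if $large has at least as many of every letter as $small needs. */
function fitsInside(string $small, string $large): bool
{
    $need = letterCounts($small);
    $have = letterCounts($large);
    for ($i = 0; $i < 26; $i++) {
        if ($need[$i] > $have[$i]) {
            return false;
        }
    }
    return true;
}

echo letterCountString('rally'), "\n";     // 10000000000200000100000010
var_dump(fitsInside('rain', 'inward'));    // bool(true)
var_dump(fitsInside('rally', 'inward'));   // bool(false): "inward" has no l or y
```

The comparison itself is cheap (26 integer comparisons per candidate word); the cost the question worries about comes from having to fetch and test every row.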

But then I started thinking that this would place too much load on both the MySQL database and the PHP script: fetching every word's LetterCount, comparing each digit to that of the source word, and so on.

Is there an easier, perhaps more intuitive way of doing this? I'm open to using stored procedures if it will help with overhead in any way. Just some suggestions would be greatly appreciated. Thanks!

Recommended answer

Here is a simple solution that should be pretty efficient, but it only works up to a certain word size. It will probably break down at around 15-20 characters, depending on whether the word is made up of high-frequency letters (which get small primes) or low-frequency letters (which get large primes):

  1. Assign each letter a prime number according to its frequency, so that e = 2, t = 3, a = 5, etc., using values from a letter-frequency table or some similar source.
  2. Precalculate the value of each word in your word list by multiplying the prime values for the letters in the word, and store it in a bigint column in the table. For instance, tea would have a value of 3*2*5=30. If a word has repeated letters, repeat the factor, so that teat would have a value of 3*2*5*3=90.
  3. When checking if a word, such as rain, is contained inside of another word, such as inward, it's sufficient to check if the value for rain divides the value for inward. In this case, inward = 14213045, rain = 7315, and 14213045 is divisible by 7315, so the word rain is inside the word inward.
  4. A signed bigint column maxes out at 9223372036854775807, which should be fine up to about 15-20 characters (depending on the frequencies of the letters in the word). For instance, I picked a 20-letter word, antiinstitutionalism, which has a value of 6901041299724096525 and would just barely fit inside the bigint column. However, the 14-letter word xylopyrography has a value of 635285791503081662905, which is too big. You might have to handle the really large ones as special cases using an alternate method, but hopefully there are few enough of them that it would still be relatively efficient.
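The four steps above might look roughly like this in PHP. Treat it as a sketch under assumptions: the answer only gives e = 2, t = 3, a = 5, so the rest of the letter ordering in `LETTER_ORDER` is a hypothetical choice, picked so that it reproduces the quoted values for rain (7315) and inward (14213045). The names `wordValue` and `isContained` are also invented for the example. Overflow is reported as `null` so that oversized words can be routed to a fallback method.

```php
<?php
declare(strict_types=1);

// ASSUMPTION: hypothetical frequency ordering (most frequent letter first).
// Only e, t, a are fixed by the answer; the rest is illustrative.
const LETTER_ORDER = 'etainosrldhcumfpgywbvkxjqz';

// The first 26 primes, assigned to LETTER_ORDER position by position.
const PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

/**
 * Product of the primes for each letter of $word (repeated letters repeat
 * the factor). Returns null if the product would exceed PHP_INT_MAX, which
 * on 64-bit PHP equals the signed bigint limit mentioned in step 4.
 */
function wordValue(string $word): ?int
{
    $primeFor = array_combine(str_split(LETTER_ORDER), PRIMES);
    $value = 1;
    foreach (str_split(strtolower($word)) as $ch) {
        if (!isset($primeFor[$ch])) {
            throw new InvalidArgumentException("Unsupported character in word: '$ch'");
        }
        $prime = $primeFor[$ch];
        if ($value > intdiv(PHP_INT_MAX, $prime)) {
            return null; // next multiplication would overflow
        }
        $value *= $prime;
    }
    return $value;
}

/**
 * True/false when both values fit in an integer; null when either word is
 * too large for this method and needs a different check.
 */
function isContained(string $small, string $large): ?bool
{
    $smallValue = wordValue($small);
    $largeValue = wordValue($large);
    if ($smallValue === null || $largeValue === null) {
        return null;
    }
    return $largeValue % $smallValue === 0;
}

var_dump(wordValue('tea'));                 // int(30)
var_dump(wordValue('teat'));                // int(90)
var_dump(wordValue('rain'));                // int(7315)
var_dump(wordValue('inward'));              // int(14213045)
var_dump(isContained('rain', 'inward'));    // bool(true)
var_dump(isContained('rally', 'inward'));   // bool(false)
```

Because prime factorizations are unique, the division test is equivalent to comparing per-letter counts; the product is just a compact way to store those counts in a single column.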

The query would work something like the demo I've prepared here: http://www.sqlfiddle.com/#!2/9bd27/8
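The contents of the linked demo are not reproduced here, so the following is only a guess at the general shape of such a lookup. The table name `words` and the columns `word` and `word_value` are hypothetical, and an in-memory SQLite database (via PDO, which requires the pdo_sqlite extension) stands in for MySQL so the sketch runs without a server. The `%` operator used in the query is also valid in MySQL.

```php
<?php
declare(strict_types=1);

// In-memory SQLite is a stand-in for MySQL in this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: each word with its precomputed prime product.
$pdo->exec('CREATE TABLE words (word TEXT PRIMARY KEY, word_value BIGINT NOT NULL)');

// tea, rain and inward use the values quoted in the answer. win and rid
// ASSUME i = 7, n = 11, r = 19, d = 29, w = 67, which is consistent with
// those quoted values but not stated in the answer.
$rows = [
    'tea'    => 30,
    'rain'   => 7315,
    'win'    => 5159,
    'rid'    => 3857,
    'inward' => 14213045,
];
$insert = $pdo->prepare('INSERT INTO words (word, word_value) VALUES (?, ?)');
foreach ($rows as $word => $value) {
    $insert->bindValue(1, $word, PDO::PARAM_STR);
    $insert->bindValue(2, $value, PDO::PARAM_INT);
    $insert->execute();
}

/** Return every stored word whose value divides $value, except $word itself. */
function wordsInside(PDO $pdo, string $word, int $value): array
{
    $stmt = $pdo->prepare(
        'SELECT word FROM words
          WHERE :value % word_value = 0 AND word <> :word
          ORDER BY word'
    );
    $stmt->bindValue(':value', $value, PDO::PARAM_INT);
    $stmt->bindValue(':word', $word, PDO::PARAM_STR);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(wordsInside($pdo, 'inward', 14213045)); // rain, rid, win
```

A modulo predicate like this generally cannot use an index on `word_value`, so the database still scans the candidate rows; restricting by word length (or by `word_value <= :value`) is one way to narrow the scan.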
