Regex misspellings


Question

I have a regex, generated from a list in a database, that matches the names of building types in a game. The problem is typos: people writing instructions for their team sometimes misspell a building name, and the regex then fails to match it (e.g. "Unversity" instead of "University").

Are there any suggestions for making a regex match words that are misspelled by 1 or 2 letters?

The regex is generated dynamically and runs on a local machine that can handle a much heavier load, so as a last resort I could algorithmically generate variants of each word, some with a letter missing and others with a letter added.

I'm using PHP, but I'd hope that any solution to this issue would not be PHP-specific.

Solution

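Allow me to introduce you to the Levenshtein distance: a measure of the difference between two strings, defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into the other. For example, "University" becomes "Unversity" by deleting one letter, so their distance is 1.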

It's also built into PHP, as the levenshtein() function.
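
Since the question asks for something that isn't PHP-specific, here is a minimal Python sketch of what such a distance function computes. The function name and the row-by-row dynamic-programming layout are illustrative choices, not part of the original answer; in PHP the built-in levenshtein() would take its place.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of `a` handled so far
    # (initially empty) and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i characters of `a` into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("University", "Unversity"))  # prints 1
```

The table is filled one row at a time, so memory use grows with the length of the second string only.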

So, I'd split the input file by non-word characters, and measure the distance between each word and every name in your target list of buildings. If the smallest distance is below some threshold, assume the word is a misspelling of that building.
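
The splitting-and-threshold approach described above might look roughly like the following Python sketch. The building list, the find_buildings name, and the default threshold of 2 (chosen to match the "1 or 2 letters" in the question) are all assumptions made for illustration; the distance helper is repeated so the snippet runs on its own.

```python
import re

# Hypothetical building names; in practice these would come from the database.
BUILDINGS = ["University", "Barracks", "Library"]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def find_buildings(text, buildings=BUILDINGS, max_distance=2):
    """Return (word, building) pairs for words close enough to a building name."""
    matches = []
    for word in re.split(r"\W+", text):  # split on runs of non-word characters
        if not word:
            continue  # re.split can yield empty strings at the edges
        # Compare case-insensitively and keep the closest building name.
        distance, best = min(
            (levenshtein(word.lower(), name.lower()), name) for name in buildings
        )
        if distance <= max_distance:
            matches.append((word, best))
    return matches


print(find_buildings("Build a Unversity and two Baracks"))
# prints [('Unversity', 'University'), ('Baracks', 'Barracks')]
```

A fixed threshold can produce false positives for very short names, so it may be worth scaling the threshold with the length of the building name.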

I think you'd have more luck matching this way than trying to craft regexes for each special case.
