PHP - smart, error tolerating string comparison


Question




I'm looking for a routine, or at least an approach, for error-tolerant string comparison.

Let's say we have the test string Čakánka (yes, it contains Central European characters).

Now, I want to accept any of the following strings as OK:

  • cakanka
  • cákanká
  • ČaKaNKA
  • CAKANKA
  • CAAKNKA
  • CKAANKA
  • cakakNa

The problem is that I often switch letters within a word, and I want to minimize the user's frustration at not being able to type a word correctly (e.g. when in a rush).

So, I know how to do a case-insensitive comparison (just make it lowercase :]) and I can strip the CE characters; I just can't wrap my head around tolerating a few switched characters.

Also, a character often ends up not just one position off (character=>cahracter) but shifted by several places (character=>carahcter), simply because one finger was lazy while typing.

Thank you :]

Solution

I'm not sure this covers everything (especially the accents / special characters, which you might have to deal with first), but for characters that are in the wrong place or missing, the levenshtein function, which calculates the Levenshtein distance between two strings, might help you (quoting the manual):

int levenshtein  ( string $str1  , string $str2  )
int levenshtein  ( string $str1  , string $str2  , int $cost_ins  , int $cost_rep  , int $cost_del  )

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2.
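
As a rough sketch of how this could be put together for the strings in the question: fold case and accents first, then accept the input if its Levenshtein distance to the expected word is small. The helper names normalize_word and roughly_equal, the accent map, and the default limit of 2 are all assumptions made for illustration, not anything prescribed by PHP. levenshtein works on bytes rather than characters, which is one more reason to reduce the strings to plain ASCII before comparing.

```php
<?php
// Hypothetical helper (not part of PHP): lowercase a word and replace the
// accented letters listed in $map with plain ASCII letters.
function normalize_word(string $word): string
{
    // Illustrative, deliberately incomplete Czech/Slovak map; extend as needed.
    static $map = [
        'á' => 'a', 'Á' => 'a', 'ä' => 'a', 'Ä' => 'a',
        'č' => 'c', 'Č' => 'c', 'ď' => 'd', 'Ď' => 'd',
        'é' => 'e', 'É' => 'e', 'ě' => 'e', 'Ě' => 'e',
        'í' => 'i', 'Í' => 'i', 'ĺ' => 'l', 'Ĺ' => 'l', 'ľ' => 'l', 'Ľ' => 'l',
        'ň' => 'n', 'Ň' => 'n', 'ó' => 'o', 'Ó' => 'o', 'ô' => 'o', 'Ô' => 'o',
        'ŕ' => 'r', 'Ŕ' => 'r', 'ř' => 'r', 'Ř' => 'r',
        'š' => 's', 'Š' => 's', 'ť' => 't', 'Ť' => 't',
        'ú' => 'u', 'Ú' => 'u', 'ů' => 'u', 'Ů' => 'u',
        'ý' => 'y', 'Ý' => 'y', 'ž' => 'z', 'Ž' => 'z',
    ];
    // strtr() with an array replaces whole multi-byte sequences, so the mapped
    // letters become ASCII before strtolower() folds the case.
    return strtolower(strtr($word, $map));
}

// Hypothetical helper: true when the normalized strings are at most
// $maxDistance edits apart. The default of 2 is an assumption to tune.
function roughly_equal(string $expected, string $input, int $maxDistance = 2): bool
{
    $distance = levenshtein(normalize_word($expected), normalize_word($input));
    return $distance <= $maxDistance;
}

// Example call with one of the variants from the question.
var_dump(roughly_equal('Čakánka', 'CAAKNKA'));
```

With a limit of 2, each of the seven variants from the question is accepted: the first four normalize to exactly cakanka, and a pair of swapped letters costs 2 under plain Levenshtein (one deletion plus one insertion, or two replacements). The same holds for character=>carahcter. A fixed limit is probably too lenient for very short words, so scaling it with the word length may be worth trying.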


Other possibly useful functions could be soundex, similar_text, or metaphone.
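
To get a feel for what those three functions report, a small comparison helper can be handy. This is only a sketch: compare_words is a made-up name, and the inputs are assumed to be already lowercased and stripped of accents. Keep in mind that soundex and metaphone are built around English pronunciation, so how well they behave on Slovak or Czech words is an open question.

```php
<?php
// Hypothetical helper: collect what similar_text, soundex and metaphone
// say about two (already normalized, ASCII) words.
function compare_words(string $a, string $b): array
{
    $percent = 0.0;
    // similar_text() returns the number of matching characters and writes
    // the similarity percentage into its third, by-reference argument.
    $matching = similar_text($a, $b, $percent);

    return [
        'matching_chars'  => $matching,
        'percent'         => $percent,
        // Both functions produce a phonetic key; equal keys mean the words
        // are considered to sound alike.
        'soundex_match'   => soundex($a) === soundex($b),
        'metaphone_match' => metaphone($a) === metaphone($b),
    ];
}

print_r(compare_words('character', 'cahracter'));
```

One possible use is to treat a high similar_text percentage or a matching phonetic key as a secondary signal next to the Levenshtein distance, rather than as a replacement for it.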

Some of the user notes on the manual pages of those functions, especially the one for levenshtein, might give you some useful ideas too ;-)
