Using Jaro-Winkler, is distance between A and B the same as B and A?


Problem description

I'm using the following class to calculate the Jaro-Winkler distance between two strings. I've noticed that the distance calculated between strings A and B is not always the same as the distance between strings B and A. Is this to be expected?

RAMADI ~ TRADING
0.73492063492063

TRADING ~ RAMADI
0.71825396825397

Demo

Recommended answer

It turns out there is a bug in the PHP versions of the Jaro-Winkler string comparison method found in many places online.

Currently, comparing string A to string B yields a different result from comparing string B to string A whenever a character common to both strings occurs more than once in one of them. This is incorrect: the Jaro-Winkler method should yield the same result whether A is compared to B or B is compared to A.
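The two figures from the question can be reproduced by hand, which makes the effect of the repeated character visible. The sketch below assumes the Jaro formula as it is usually written in these PHP ports (the class itself is not shown here) and assumes the repeated "A" in RAMADI is counted twice in one direction only. The function name and its parameters are illustrative, not taken from the original class.

```php
<?php
// Jaro score computed from match counts. Hypothetical helper for illustration.
function jaroFromCounts(int $len1, int $len2, int $common1, int $common2, float $transpositions): float
{
    return ($common1 / $len1
        + $common2 / $len2
        + ($common1 - $transpositions) / $common1) / 3.0;
}

// RAMADI has 6 characters, TRADING has 7.
// Assumed common characters: "RAADI" (5) from RAMADI's side, "RADI" (4) from TRADING's side.
// Comparing "RAAD" with "RADI" position by position gives 2 mismatches, i.e. 1 transposition.
$ab = jaroFromCounts(6, 7, 5, 4, 1.0); // RAMADI ~ TRADING
$ba = jaroFromCounts(7, 6, 4, 5, 1.0); // TRADING ~ RAMADI

printf("%.14f\n%.14f\n", $ab, $ba);
```

Under these assumptions the two values round to the 0.73492063492063 and 0.71825396825397 reported in the question. The strings share no common prefix, so no Winkler prefix bonus applies to either direction.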

To rectify this, the same character should not be repeated when identifying the common characters. The common characters variable needs to be deduplicated before it is returned.

The code below replaces the common characters string with an array that uses each common character as its key, which avoids duplication. With this change, A compared to B yields the same result as B compared to A.

This is in line with the C# version of the method.

//$commonCharacters='';
# The Common Characters variable must be an array
$commonCharacters = [];
for( $i=0; $i < $str1_len; $i++){
    $noMatch = True;
    // compare if char does match inside given allowedDistance
    // and if it does add it to commonCharacters
    for( $j= max( 0, $i-$allowedDistance ); $noMatch && $j < min( $i + $allowedDistance + 1, $str2_len ); $j++) {
        if( $temp_string2[(int)$j] == $string1[$i] ){ // MJR
            $noMatch = False;
            //$commonCharacters .= $string1[$i];
            # The Common Characters array uses the character as a key to avoid duplication.
            $commonCharacters[$string1[$i]] = $string1[$i];
            $temp_string2[(int)$j] = "\0"; // MJR: consume the matched character (assigning '' to a string offset is a fatal error as of PHP 7.1)
        }
    }
}
//return $commonCharacters;
# When returning, turn the array back to a string, as expected
return implode("", $commonCharacters);
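For reference, here is a minimal self-contained sketch of how the fragment above might sit inside a complete function, together with a symmetry check on the pair from the question. The function names, the signatures, and the matching window of half the shorter string's length are assumptions, since the surrounding class is not shown. The check covers only this one pair.

```php
<?php
// Hypothetical wrapper around the loop shown above.
function commonCharactersDeduplicated(string $string1, string $string2, int $allowedDistance): string
{
    $str1_len = strlen($string1);
    $str2_len = strlen($string2);
    $temp_string2 = $string2;
    $commonCharacters = []; // keyed by character, so repeats collapse
    for ($i = 0; $i < $str1_len; $i++) {
        $from = max(0, $i - $allowedDistance);
        $to = min($i + $allowedDistance + 1, $str2_len);
        for ($j = $from; $j < $to; $j++) {
            if ($temp_string2[$j] === $string1[$i]) {
                $commonCharacters[$string1[$i]] = $string1[$i];
                $temp_string2[$j] = "\0"; // mark this position as used
                break;
            }
        }
    }
    return implode('', $commonCharacters);
}

// Hypothetical Jaro score built on the helper above (no Winkler prefix bonus).
function jaroSketch(string $a, string $b): float
{
    // Assumed window: half the length of the shorter string.
    $window = intdiv(min(strlen($a), strlen($b)), 2);
    $c1 = commonCharactersDeduplicated($a, $b, $window);
    $c2 = commonCharactersDeduplicated($b, $a, $window);
    $n1 = strlen($c1);
    $n2 = strlen($c2);
    if ($n1 === 0 || $n2 === 0) {
        return 0.0;
    }
    // Count positions where the two common-character strings disagree.
    $mismatches = 0;
    for ($i = 0; $i < min($n1, $n2); $i++) {
        if ($c1[$i] !== $c2[$i]) {
            $mismatches++;
        }
    }
    $transpositions = $mismatches / 2.0;
    return ($n1 / strlen($a) + $n2 / strlen($b) + ($n1 - $transpositions) / $n1) / 3.0;
}

$ab = jaroSketch('RAMADI', 'TRADING');
$ba = jaroSketch('TRADING', 'RAMADI');
var_dump(abs($ab - $ba) < 1e-12);
```

For this pair both directions find the common characters "RADI", so the two scores agree.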
