What is the third parameter to Text::JaroWinkler::strcmp95 for?
Question
I am interested in the Jaro-Winkler module written in Perl to compute the distance (or similarity) between two strings:
http://search.cpan.org/~scw/Text-JaroWinkler-0.1/JaroWinkler.pm
The syntax of the function is not clear to me; I could not find any clear documentation of it.
Here is the sample code:
#!/usr/bin/perl
use 5.10.0;
use Text::JaroWinkler qw( strcmp95 );
print strcmp95("it is a dog","i am a dog.",11);
What exactly does the 11 represent? I gather it is a length, but which length? Is it the number of characters I want checked? Is it required to be there?
Recommended answer
See the source for an answer to your question. It contains this line:
$ying = sprintf("%*.*s", -$y_length, $y_length, $ying);
So $y_length is being used to reformat the strings, padding them if necessary and trimming them to an identical length. These equal-length strings are then fed into the actual comparison function. This suggests that Alex is correct and that giving a length of max(length $ying, length $yang) is going to give the best results under most circumstances.
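To see the padding and trimming in isolation, here is a minimal sketch that uses only core Perl and does not load Text::JaroWinkler. The helper name fit_to_length is hypothetical (it is not part of the module); it only reuses the sprintf format from the line quoted above.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Hypothetical helper, not part of Text::JaroWinkler: it applies the same
# sprintf format as the quoted source line. The negative width left-justifies
# and pads with spaces; the precision truncates to at most $len characters.
sub fit_to_length {
    my ($str, $len) = @_;
    return sprintf("%*.*s", -$len, $len, $str);
}

my ($ying, $yang) = ("dog", "doghouse");

print "[", fit_to_length($ying, 5), "]\n";    # [dog  ]   padded to 5
print "[", fit_to_length($yang, 5), "]\n";    # [dogho]   trimmed to 5

# Using the longer of the two lengths means nothing gets trimmed.
my $y_length = max(length $ying, length $yang);
print "[", fit_to_length($ying, $y_length), "]\n";    # [dog     ]
print "[", fit_to_length($yang, $y_length), "]\n";    # [doghouse]
```

With a length shorter than one of the inputs, the tail of that input is dropped before the comparison, which is why the maximum of the two lengths looks like the safe choice.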
Reading the source also reveals that if you fail to supply $y_length, no default is supplied. So you'll be comparing the empty string to the empty string. Those should have a pretty short JW distance.
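The effect of a missing length can be reproduced with the same sprintf format, again without loading the module. This is a sketch under the assumption that the quoted line is what reshapes the inputs; fit_to_length and length_or_default are hypothetical names introduced for illustration, not functions the module provides.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Hypothetical helper reusing the format string from the quoted source line.
sub fit_to_length {
    my ($str, $len) = @_;
    no warnings 'uninitialized';    # $len is deliberately undef below
    return sprintf("%*.*s", -$len, $len, $str);
}

# An undefined length is treated as 0, so the whole string is trimmed away.
my $trimmed = fit_to_length("it is a dog", undef);
print "length without \$y_length: ", length($trimmed), "\n";

# Hypothetical caller-side guard: fall back to the longer input's length.
sub length_or_default {
    my ($ying, $yang, $y_length) = @_;
    return $y_length // max(length $ying, length $yang);
}

print length_or_default("it is a dog", "i am a dog."), "\n";    # 11
```

Both strings in the question happen to be 11 characters long, so the 11 passed there matches what this fallback would pick.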