Unexpected results from Metaphone algorithm

Question

I am using phonetic matching for different words in Java. I used Soundex, but it is too crude. I switched to Metaphone and found that it works better. However, when I tested it rigorously, I found some weird behaviour. I want to ask whether this is how Metaphone works or whether I am using it the wrong way. In the following example it works fine:

import org.apache.commons.codec.language.Metaphone;

Metaphone meta = new Metaphone();
if (meta.isMetaphoneEqual("cricket", "criket")) System.out.println("Match 1");
if (meta.isMetaphoneEqual("cricket", "criketgame")) System.out.println("Match 2");

This prints:

  Match 1
  Match 2

Now "cricket" does sound like "criket", but how come "cricket" and "criketgame" are treated as the same? If someone could explain this, it would be of great help.

Recommended answer

Your usage is slightly incorrect. A quick investigation of the encoded strings and the default maximum code length shows that the default is 4, which truncates the end of the longer "criketgame":

System.out.println(meta.getMaxCodeLen());
System.out.println(meta.encode("cricket"));
System.out.println(meta.encode("criket"));
System.out.println(meta.encode("criketgame"));

Output (note that "criketgame" is truncated from "KRKTKM" to "KRKT", which matches "cricket"):


4
KRKT
KRKT
KRKT
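The effect of the length limit can be looked at in isolation, without the library. The following is a minimal sketch that models the limit as a plain prefix cut on the two codes quoted above; `TruncationDemo`, `truncate` and `equalAfterTruncation` are hypothetical names used only for illustration, and the real encoder applies its limit while encoding rather than afterwards, so treat this as an approximation of the behaviour, not as the library's implementation.

```java
class TruncationDemo {
    // Simplified model (assumption): keep at most maxCodeLen characters of a code.
    static String truncate(String code, int maxCodeLen) {
        return code.length() <= maxCodeLen ? code : code.substring(0, maxCodeLen);
    }

    // Two codes compare equal if they agree once both are cut to maxCodeLen.
    static boolean equalAfterTruncation(String a, String b, int maxCodeLen) {
        return truncate(a, maxCodeLen).equals(truncate(b, maxCodeLen));
    }

    public static void main(String[] args) {
        String cricket = "KRKT";      // code quoted above for "cricket"
        String criketgame = "KRKTKM"; // untruncated code quoted above for "criketgame"

        // With a limit of 4 the trailing "KM" is lost, so the codes collide.
        System.out.println(equalAfterTruncation(cricket, criketgame, 4)); // true
        // With a limit of 8 both codes survive intact and differ.
        System.out.println(equalAfterTruncation(cricket, criketgame, 8)); // false
    }
}
```

Any limit of at least 6 keeps these two particular codes apart; the right value for real input depends on how long the words being compared are expected to be.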


Solution: set the maximum code length to something appropriate for your application and expected input. For example:

meta.setMaxCodeLen(8);
System.out.println(meta.encode("cricket"));
System.out.println(meta.encode("criket"));
System.out.println(meta.encode("criketgame"));

Now outputs:


KRKT
KRKT
KRKTKM

And now your original test gives the expected results:

Metaphone meta = new Metaphone();
meta.setMaxCodeLen(8);
System.out.println(meta.isMetaphoneEqual("cricket","criket"));
System.out.println(meta.isMetaphoneEqual("cricket","criketgame"));

Prints:


true
false

As an aside, you may also want to experiment with DoubleMetaphone, which is an improved version of the algorithm.

Also as an aside, note the documentation regarding thread-safety:

The instance field maxCodeLen is mutable but is not volatile, and accesses are not synchronized. If an instance of the class is shared between threads, the caller needs to ensure that suitable synchronization is used to ensure safe publication of the value between threads, and must not invoke setMaxCodeLen(int) after initial setup.
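One way to follow that advice is to configure the instance completely before any other thread can see it and never touch the setting again. The sketch below shows the pattern with a hypothetical stand-in class (`StandInEncoder`, not the library's `Metaphone`) so that it runs without extra dependencies; it relies on the fact that a `static final` field assigned during class initialisation is safely published to every thread that uses the class.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SharedEncoderDemo {
    // Hypothetical stand-in for an encoder with a mutable, non-volatile setting.
    static final class StandInEncoder {
        private int maxCodeLen = 4;
        void setMaxCodeLen(int maxCodeLen) { this.maxCodeLen = maxCodeLen; }
        int getMaxCodeLen() { return maxCodeLen; }
    }

    // Fully configured before publication; never reconfigured afterwards.
    private static final StandInEncoder SHARED = configured(8);

    private static StandInEncoder configured(int maxCodeLen) {
        StandInEncoder encoder = new StandInEncoder();
        encoder.setMaxCodeLen(maxCodeLen); // the only call to the setter
        return encoder;
    }

    // Read-only access for worker threads.
    static int sharedMaxCodeLen() {
        return SHARED.getMaxCodeLen();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Callable<Integer> task = SharedEncoderDemo::sharedMaxCodeLen;
            System.out.println(pool.submit(task).get()); // 8
        } finally {
            pool.shutdown();
        }
    }
}
```

If different callers need different maximum lengths, giving each thread or each caller its own instance avoids sharing the mutable field at all.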
