Using SOLR to calculate "similarity"/"bitcount" between two ulongs

Posted: 2016/8/7 19:45:51 · Tags: c# solr bit-manipulation solrnet phash

Question

We have a database of images for which I have calculated the pHash using Dr. Neal Krawetz's method, as implemented by David Oftedal.

The part of the sample code that calculates the difference between these longs is shown here:

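ulong hash1 = AverageHash(theImage);
ulong hash2 = AverageHash(theOtherImage);

// Counts the set bits in theNumber, one byte at a time.
// bitCounts is a lookup table indexed by byte value (0-255), defined elsewhere in the sample app.
uint BitCount(ulong theNumber)
{
    uint count = 0;
    for (; theNumber > 0; theNumber >>= 8) {
        count += bitCounts[(theNumber & 0xFF)];
    }
    return count;
}

// XOR leaves a 1 in every bit position where the two hashes differ.
Console.WriteLine("Similarity: " + ((64 - BitCount(hash1 ^ hash2)) * 100.0) / 64.0 + "%");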

The challenge is that I know only one of these hashes, and I want to query SOLR to find the other hashes in order of similarity.

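A few notes:

Using SOLR here (the only other option I have is HBASE)

Want to avoid installing any custom Java into Solr (happy to install an existing plugin)

Happy to do lots of pre-processing in C#

Happy to use multiple fields to store the data as a bit string, a long, etc.

Using SOLRNet as the client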

Edit: some extra information (apologies, I was caught up in the problem and started assuming it was a widely known area). Here is a direct download of the C# console / sample app: http://01101001.net/Imghash.zip

An example output of this console app would be:

004143737f7f7f7f phash-test-001.jpg
0041417f7f7f7f7f phash-test-002.jpg
Similarity: 95.3125%
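
The figure above can be reproduced directly from the two hex strings. The following is a minimal, self-contained sketch; the class name PhashSimilarity and the helper ParseHex are made up for illustration, and the bit count clears one set bit per loop iteration instead of using the sample app's bitCounts lookup table.

```csharp
using System;
using System.Globalization;

public static class PhashSimilarity
{
    // Counts set bits by repeatedly clearing the lowest set bit.
    // This replaces the sample app's lookup table; the result is the same.
    public static int BitCount(ulong value)
    {
        int count = 0;
        while (value != 0)
        {
            value &= value - 1; // clears the lowest set bit
            count++;
        }
        return count;
    }

    // Percentage of the 64 bit positions in which the two hashes agree.
    public static double Similarity(ulong a, ulong b)
    {
        return (64 - BitCount(a ^ b)) * 100.0 / 64.0;
    }

    // Hypothetical helper: parses a 16-digit hex string into a ulong.
    public static ulong ParseHex(string hex)
    {
        return ulong.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        ulong hash1 = ParseHex("004143737f7f7f7f");
        ulong hash2 = ParseHex("0041417f7f7f7f7f");

        Console.WriteLine("Differing bits: " + BitCount(hash1 ^ hash2));
        Console.WriteLine("Similarity: "
            + Similarity(hash1, hash2).ToString(CultureInfo.InvariantCulture) + "%");
    }
}
```

The two hashes differ in 3 of 64 bits, so the similarity is (64 - 3) / 64 = 95.3125%.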

Recommended answer

You can use Solr's Fuzzy Search for this (you have to scroll down a bit on the documentation page to find it):

Solr's standard query parser supports fuzzy searches based on the Levenshtein Distance or Edit Distance algorithm. Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term.

Assume you have a schema like the one below, where the field phash holds the pHash you have calculated:

<fields>
    <!-- ... all your other fields ... -->
    <field name="phash" type="string" indexed="true" stored="true" />
</fields>

You can then run a query like this:

q=phash:004143737f7f7f7f~0.8&
fl=score,phash

This will return all documents whose PHASH has a Levenshtein (edit distance) similarity of at least 80% to the given value. You will not get the 95.3125% from your question but 87.5%, because whole characters are counted as matching or not matching, rather than individual bits.
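
To see where the 87.5% comes from, the character-level comparison can be sketched in C#. This is an illustration only, not Solr's own code: it uses the textbook dynamic-programming Levenshtein distance and assumes the similarity is normalised as 1 - distance / length of the longer string, which is consistent with the value quoted above. The class name EditSimilarity is made up.

```csharp
using System;

public static class EditSimilarity
{
    // Textbook Levenshtein distance: insertions, deletions and
    // substitutions each cost 1. Keeps only two rows of the DP table.
    public static int Levenshtein(string s, string t)
    {
        int[] prev = new int[t.Length + 1];
        int[] curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the rows
        }
        return prev[t.Length];
    }

    // Assumed normalisation: 1 - distance / length of the longer string.
    public static double Similarity(string s, string t)
    {
        int longest = Math.Max(s.Length, t.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Levenshtein(s, t) / longest;
    }

    public static void Main()
    {
        Console.WriteLine(Levenshtein("004143737f7f7f7f", "0041417f7f7f7f7f"));
        Console.WriteLine(Similarity("004143737f7f7f7f", "0041417f7f7f7f7f"));
    }
}
```

The two example hashes differ in 2 of their 16 hex characters, so the distance is 2 and the similarity is 1 - 2/16 = 0.875, even though only 3 of the 64 underlying bits differ.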

When you want to see that value, you can run the following query:

q=phash:004143737f7f7f7f~0.8&
fl=score,phash,strdist("0041417f7f7f7f7f", phash, edit)

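This is a function query (https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-AvailableFunctions) that computes the string distance using the Levenshtein (edit) distance. It will deliver a result similar to: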

+----------------+----------------------------------------+
|phash           |strdist("0041417f7f7f7f7f", phash, edit)|
+----------------+----------------------------------------+
|0041417f7f7f7f7f|1.0                                     |
+----------------+----------------------------------------+
|004143737f7f7f7f|0.875                                   |
+----------------+----------------------------------------+
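
When sending such a query from C#, the q and fl values need URL encoding because they contain colons, quotes, commas and spaces. The sketch below only builds the request URL with Uri.EscapeDataString and does not depend on any client library. The endpoint (host, port and the core name "images") and the class name SolrQueryUrl are assumptions for illustration, and unlike the queries above it uses the same known hash for both the fuzzy term and strdist.

```csharp
using System;
using System.Globalization;

public static class SolrQueryUrl
{
    // Hypothetical endpoint: replace host, port and core name with your own.
    public const string SelectUrl = "http://localhost:8983/solr/images/select";

    // Builds a select URL that fuzzy-matches the phash field against
    // knownHash and also returns the strdist value for each document.
    public static string Build(string knownHash, double minSimilarity)
    {
        string q = "phash:" + knownHash + "~"
            + minSimilarity.ToString(CultureInfo.InvariantCulture);
        string fl = "score,phash,strdist(\"" + knownHash + "\", phash, edit)";

        return SelectUrl
            + "?q=" + Uri.EscapeDataString(q)
            + "&fl=" + Uri.EscapeDataString(fl);
    }

    public static void Main()
    {
        Console.WriteLine(Build("004143737f7f7f7f", 0.8));
    }
}
```

Formatting the threshold with CultureInfo.InvariantCulture avoids a decimal comma (0,8) on machines with a non-English locale.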

When you want to reduce the gap between 95.3125% and 87.5%, you should consider storing the PHASH not as a hexadecimal value but, for instance, as an octal one.
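
A possible way to produce such a value during pre-processing in C# is sketched below. The class and method names are made up for illustration. The binary variant is not part of the answer; it is included only to show the limiting case where every character corresponds to exactly one bit.

```csharp
using System;
using System.Globalization;

public static class PhashEncoding
{
    // 64 bits need 22 octal digits (3 bits per digit, rounded up).
    // Padding keeps every stored hash the same length.
    public static string ToOctal(ulong hash)
    {
        return Convert.ToString(unchecked((long)hash), 8).PadLeft(22, '0');
    }

    // One character per bit: 64 characters of '0' and '1'.
    public static string ToBinary(ulong hash)
    {
        return Convert.ToString(unchecked((long)hash), 2).PadLeft(64, '0');
    }

    // Number of positions at which two equal-length strings differ.
    public static int DifferingChars(string a, string b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Lengths differ.");
        int count = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) count++;
        }
        return count;
    }

    public static void Main()
    {
        ulong h1 = ulong.Parse("004143737f7f7f7f", NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        ulong h2 = ulong.Parse("0041417f7f7f7f7f", NumberStyles.HexNumber, CultureInfo.InvariantCulture);

        Console.WriteLine(ToOctal(h1));
        Console.WriteLine(ToOctal(h2));
        Console.WriteLine(DifferingChars(ToOctal(h1), ToOctal(h2)));
        Console.WriteLine(DifferingChars(ToBinary(h1), ToBinary(h2)));
    }
}
```

For the two example hashes, 2 of the 22 octal digits differ, which gives a character-level similarity of 20/22 (about 90.9%), closer to the bit-level 95.3125% than the 87.5% obtained from hex. In the binary form 3 of 64 characters differ, which matches the bit count exactly.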
