Speed up text comparisons (feature vectors) with spatial MySQL features

Published: 2016/12/21 23:27:40. Tags: mysql, comparison, spatial, similarity


Problem description

 
I have a function that takes two arrays containing the tokens/words of two texts and returns their cosine similarity, which indicates how closely the two texts are related.

The function takes an array $tokensA (0=>house, 1=>bike, 2=>man) and an array $tokensB (0=>bike, 1=>house, 2=>car) and returns the similarity as a floating-point value:
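function cosineSimilarity($tokensA, $tokensB) {
    // $a counts tokens present in both texts; $b and $c count the distinct tokens of each text.
    $a = $b = $c = 0;
    $uniqueTokensA = $uniqueTokensB = array();
    $uniqueMergedTokens = array_unique(array_merge($tokensA, $tokensB));
    // Store the tokens as array keys so that membership can be tested with isset().
    foreach ($tokensA as $token) $uniqueTokensA[$token] = 0;
    foreach ($tokensB as $token) $uniqueTokensB[$token] = 0;
    foreach ($uniqueMergedTokens as $token) {
        $x = isset($uniqueTokensA[$token]) ? 1 : 0;
        $y = isset($uniqueTokensB[$token]) ? 1 : 0;
        $a += $x * $y;
        $b += $x;
        $c += $y;
    }
    // Cosine of the two binary (presence/absence) vectors; 0 if either text has no tokens.
    return $b * $c != 0 ? $a / sqrt($b * $c) : 0;
}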
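
For plain word tokens, the function above reduces to "shared distinct tokens divided by the square root of the product of the distinct token counts". The following is only a sketch to make that concrete with the example arrays; the name cosineSimilarityCompact is hypothetical and not part of the original code.

```php
<?php
// Hypothetical compact variant: computes the same ratio from set sizes.
// Assumes plain word tokens (strings).
function cosineSimilarityCompact(array $tokensA, array $tokensB): float
{
    $uniqueA = array_unique($tokensA);
    $uniqueB = array_unique($tokensB);
    // Number of distinct tokens that occur in both texts.
    $shared = count(array_intersect($uniqueA, $uniqueB));
    $product = count($uniqueA) * count($uniqueB);
    return $product !== 0 ? $shared / sqrt($product) : 0.0;
}

$tokensA = ['house', 'bike', 'man'];
$tokensB = ['bike', 'house', 'car'];
// Two shared tokens, three distinct tokens per text: 2 / sqrt(3 * 3).
echo cosineSimilarityCompact($tokensA, $tokensB), "\n";
```

For the two example texts this gives 2/3, since "house" and "bike" are shared and each text has three distinct tokens.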
If I want to compare 75 texts with each other, I need to make 5,625 (75 × 75) single comparisons to have every text compared with every other.
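
Independent of any spatial feature, the count can be roughly halved: the function as written is symmetric in its two arguments, so each unordered pair only needs one comparison, and comparing a text with itself is unnecessary. A minimal sketch, with hypothetical helper names uniquePairCount and uniquePairs:

```php
<?php
// Number of comparisons when every unordered pair is compared exactly once.
function uniquePairCount(int $n): int
{
    return intdiv($n * ($n - 1), 2);
}

// Yields each index pair [i, j] with i < j exactly once.
function uniquePairs(int $n): Generator
{
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            yield [$i, $j];
        }
    }
}

echo uniquePairCount(75), "\n"; // 75 * 74 / 2 = 2775
```

Each yielded pair would be passed once to cosineSimilarity(), and the result reused for both directions.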

Is it possible to use MySQL's spatial columns to reduce the number of comparisons?

I don't want to talk about my function or about ways to compare texts. Just about reducing the number of comparisons.

MySQL's spatial columns


You create spatial columns with: CREATE TABLE abc (clmnName TYPE)
Possible types are listed here
Here is how I select the data later [e.g. MultiPointFromText() or AsText()]
You insert values like this: INSERT INTO abc (clmnName) VALUES (GeomFromText('POINT(1 1)'))


But how do you use this for my problem?

PS: I'm looking for ways to reduce the number of comparisons with algorithms in this question. Vinko Vrsalovic told me that I should open another question for the spatial features.
Solution
While R-Trees in general can index data with an arbitrary number of dimensions, MySQL's spatial abilities are limited to Geometry types (2 dimensions).

If your vectors are 2-dimensional and you can normalize them, then do the following:


1. Split the circle into twice the number of angles which fit your differences.
2. Find the MBR of vectors with the given cosine difference from the center of each sector.
3. Find all vectors within the MBR.
4. Do the fine filtering for the exact difference.
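
The coarse and fine filtering steps above can be sketched in plain PHP for a single sector center. This is an illustration under assumptions, not the answer's implementation: vectors are taken to be normalized 2-D points, the helper names arcMbr and similarVectors are hypothetical, and in MySQL the bounding-box test would instead be done by a spatial index lookup (for example with MBRContains()).

```php
<?php
// Hypothetical helper: bounding box (MBR) of the unit-circle arc centred at
// $phi (radians) and extending $delta radians to each side.
function arcMbr(float $phi, float $delta): array
{
    $angles = [$phi - $delta, $phi + $delta];
    // Multiples of pi/2 inside the arc are where x or y reaches an extreme.
    $first = (int) ceil(($phi - $delta) / (M_PI / 2));
    $last = (int) floor(($phi + $delta) / (M_PI / 2));
    for ($k = $first; $k <= $last; $k++) {
        $angles[] = $k * M_PI / 2;
    }
    $xs = array_map('cos', $angles);
    $ys = array_map('sin', $angles);
    return ['minX' => min($xs), 'maxX' => max($xs), 'minY' => min($ys), 'maxY' => max($ys)];
}

// Returns the ids of unit vectors whose cosine with the direction $phi is at least $minCosine.
function similarVectors(array $vectors, float $phi, float $minCosine): array
{
    $box = arcMbr($phi, acos($minCosine));
    $cx = cos($phi);
    $cy = sin($phi);
    $matches = [];
    foreach ($vectors as $id => [$x, $y]) {
        // Coarse filter: skip anything outside the bounding box.
        if ($x < $box['minX'] || $x > $box['maxX'] || $y < $box['minY'] || $y > $box['maxY']) {
            continue;
        }
        // Fine filter: exact cosine (dot product of two unit vectors).
        if ($x * $cx + $y * $cy >= $minCosine) {
            $matches[] = $id;
        }
    }
    return $matches;
}
```

Only the vectors that survive the bounding-box test reach the exact comparison, which is where the reduction in comparisons would come from.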


In this case, however, it will be better just to precalculate the angle of the value and index it with a plain B-Tree index.
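
A minimal sketch of that idea, assuming 2-D vectors: the angle is computed once per vector and stored in an ordinary indexed column, and a similarity threshold becomes one or two angle ranges that a B-Tree index can serve (for instance as BETWEEN conditions). The helper names vectorAngle and candidateRanges are hypothetical.

```php
<?php
// Angle of a 2-D vector in [0, 2*pi); this is the value to precalculate and index.
function vectorAngle(float $x, float $y): float
{
    $angle = atan2($y, $x);
    return $angle < 0 ? $angle + 2 * M_PI : $angle;
}

// Angle ranges containing every vector whose cosine similarity with a vector
// at $angle is at least $minCosine. Two ranges are returned when the window
// wraps around 0 / 2*pi.
function candidateRanges(float $angle, float $minCosine): array
{
    $delta = acos($minCosine);
    if ($delta >= M_PI) {
        return [[0.0, 2 * M_PI]];
    }
    $low = $angle - $delta;
    $high = $angle + $delta;
    if ($low < 0) {
        return [[0.0, $high], [$low + 2 * M_PI, 2 * M_PI]];
    }
    if ($high >= 2 * M_PI) {
        return [[$low, 2 * M_PI], [0.0, $high - 2 * M_PI]];
    }
    return [[$low, $high]];
}
```

This works because the cosine similarity of two 2-D vectors is the cosine of the difference between their angles, so a minimum cosine translates directly into a maximum angle difference.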
