Jaccard's distance matrix with tensorflow
Problem description
I would like to compute a distance matrix using the Jaccard distance, and do so as fast as possible. I used to use scikit-learn's pairwise_distances function, but scikit-learn doesn't plan to support GPUs, and there's even a known bug that makes the function slower when run in parallel.
My only constraint is that the resulting distance matrix can then be fed to scikit-learn's DBSCAN clustering algorithm. I was thinking about implementing the computation with TensorFlow but couldn't find a nice and simple way to do it.
PS: I have reasons to precompute the distance matrix instead of letting DBSCAN do it as needed.
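For reference, the CPU baseline described above looks roughly like this: scikit-learn's `pairwise_distances` with `metric="jaccard"` builds the full n × n matrix, and `DBSCAN(metric="precomputed")` consumes it directly. This is a sketch; the toy data and the `eps` and `min_samples` values are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical toy data: one boolean feature vector per sample.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=bool)

# Full n x n Jaccard distance matrix, computed on the CPU.
# (Passing n_jobs > 1 is the parallel mode the question reports as slow.)
D = pairwise_distances(X, metric="jaccard")

# DBSCAN takes the matrix as-is when metric="precomputed".
labels = DBSCAN(eps=0.4, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```

Whatever produces `D` (CPU or GPU), DBSCAN only needs a square array of pairwise distances, which is what makes precomputing it on another device possible.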
Recommended answer
Hey, I was facing the same problem.
Given that the Jaccard similarity is the ratio of true positives (tp) to the sum of true positives, false negatives (fn) and false positives (fp), and that the Jaccard distance is 1 minus that similarity, I came up with this solution:
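import tensorflow as tf

def jaccard_distance(target, prediction):
    # Row-wise Jaccard distance between two binary tensors of the same shape.
    # (tf.mul was renamed tf.multiply in TensorFlow 1.0.)
    tp = tf.reduce_sum(tf.multiply(target, prediction), 1)      # set in both
    fn = tf.reduce_sum(tf.multiply(target, 1 - prediction), 1)  # only in target
    fp = tf.reduce_sum(tf.multiply(1 - target, prediction), 1)  # only in prediction
    return 1 - (tp / (tp + fn + fp))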
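Note that `jaccard_distance` compares the target and prediction tensors row by row and returns one distance per row, not the full n × n matrix the question asks for. A minimal sketch of the full matrix, assuming the samples are the rows of a 0/1 matrix `X` and TensorFlow 2.x eager execution, swaps the element-wise products for a matrix product: `X @ Xᵀ` gives tp for every pair at once, and tp + fn + fp equals |a| + |b| − tp. The function name `pairwise_jaccard_distance` is hypothetical. When a GPU is visible to TensorFlow, `tf.matmul` is placed on it automatically.

```python
import numpy as np
import tensorflow as tf


def pairwise_jaccard_distance(X):
    """Return the n x n Jaccard distance matrix between the rows of a 0/1 matrix X."""
    X = tf.cast(X, tf.float32)
    # tp for every pair (i, j): features present in both row i and row j.
    intersection = tf.matmul(X, X, transpose_b=True)
    # Number of features present in each row.
    counts = tf.reduce_sum(X, axis=1)
    # tp + fn + fp = |a| + |b| - tp, broadcast to an n x n matrix.
    union = counts[:, None] + counts[None, :] - intersection
    # Two all-zero rows give union == 0; treat them as identical (similarity 1),
    # which matches SciPy's convention for the Jaccard distance.
    similarity = tf.where(
        union > 0, tf.math.divide_no_nan(intersection, union), tf.ones_like(union)
    )
    return 1.0 - similarity


# Hypothetical toy data: one row per sample.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
D = pairwise_jaccard_distance(X).numpy()  # plain NumPy array
print(np.round(D, 3))
```

The resulting `D` can be passed to `DBSCAN(metric="precomputed")` like any other distance matrix. For large n, the result alone takes 4·n² bytes in float32, so it may need to be computed in row blocks (a slice of `X` against the full `X`) and assembled on the host.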
Hope this helps!