Fast Hamming distance computation between binary NumPy arrays


Question

I have two NumPy arrays of the same length that contain binary values:

import numpy as np
a=np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0])
b=np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1])

I want to compute the Hamming distance between them as fast as possible, since I have millions of such distance computations to make.

A simple but slow option is this (taken from Wikipedia):

%timeit sum(ch1 != ch2 for ch1, ch2 in zip(a, b))
10000 loops, best of 3: 79 us per loop

I have come up with faster options, inspired by some answers here on Stack Overflow:

%timeit np.sum(np.bitwise_xor(a,b))
100000 loops, best of 3: 6.94 us per loop

%timeit len(np.bitwise_xor(a,b).nonzero()[0])
100000 loops, best of 3: 2.43 us per loop
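
As a sanity check, the three expressions timed above should agree on the result. The following is a minimal sketch using the arrays from the question; the variable names d_loop, d_xor_sum and d_nonzero are introduced here for illustration only. It checks correctness, not speed.

```python
import numpy as np

a = np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0])
b = np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1])

# Pure-Python loop: compares the arrays element by element.
d_loop = sum(ch1 != ch2 for ch1, ch2 in zip(a, b))

# XOR of 0/1 integers is 1 exactly where the arrays differ,
# so summing the XOR gives the number of differing positions.
d_xor_sum = np.sum(np.bitwise_xor(a, b))

# nonzero() returns the indices of the differing positions;
# the length of that index array is the distance.
d_nonzero = len(np.bitwise_xor(a, b).nonzero()[0])

print(d_loop, d_xor_sum, d_nonzero)  # prints: 7 7 7
```

Note that the XOR-and-sum variant relies on the arrays holding only 0 and 1; with other integer values the XOR results would no longer be 0 or 1 and the sum would not be a count.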

I'm wondering whether there are even faster ways to compute this, possibly using Cython?

Recommended answer

There is a ready-made NumPy function that beats len((a != b).nonzero()[0]) ;)

np.count_nonzero(a!=b)
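
For reference, here is a minimal sketch of that call applied to the arrays from the question. The second half goes beyond the answer and is an assumption: if the many pairs to compare can be stacked as rows of two 2-D arrays (the hypothetical A and B below, filled with random bits purely for illustration), the axis argument of np.count_nonzero computes all row-wise distances in one call. No timings were measured for this sketch.

```python
import numpy as np

a = np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0])
b = np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1])

# a != b is a boolean array that is True where the elements differ;
# count_nonzero counts the True entries.
dist = np.count_nonzero(a != b)
print(dist)  # prints: 7

# Assumption (not from the answer): many pairs stored as rows of two
# equally shaped 2-D arrays. A and B are hypothetical random test data.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(1000, 21), dtype=np.uint8)
B = rng.integers(0, 2, size=(1000, 21), dtype=np.uint8)

# One distance per row pair, computed without a Python-level loop.
dists = np.count_nonzero(A != B, axis=1)
print(dists.shape)  # prints: (1000,)
```

Whether batching helps depends on how the pairs arise; when each array must be compared against many others rather than one partner, a different layout would be needed.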
