Weighted distance in sklearn KNN

Problem description

I'm writing a genetic algorithm that searches for feature weights to apply to the Euclidean distance in sklearn's KNN. The goal is to improve the classification rate while also dropping some features from the dataset, which I do by setting their weight to 0. I'm using Python and sklearn's KNeighborsClassifier. This is how I'm using it:

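import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X_train, Y_train (NumPy arrays) and w (weight vector) come from the surrounding program.
def w_dist(x, y, **kwargs):
    # Weighted squared Euclidean distance (square root omitted)
    return np.sum(kwargs["weights"] * (x - y) ** 2)

KNN = KNeighborsClassifier(n_neighbors=1, metric=w_dist, metric_params={"weights": w})
KNN.fit(X_train, Y_train)
# Called without X, kneighbors() does not count a training point as its own neighbor
neighbors = KNN.kneighbors(n_neighbors=1, return_distance=False)
Y_n = Y_train[neighbors].ravel()
tot = 0
for a, b in zip(Y_train, Y_n):
    if a == b:
        tot += 1

# Assumption: "tamaño" (undefined in the original snippet) is the number of features
tamaño = X_train.shape[1]
reduc_rate = (tamaño - np.count_nonzero(w)) / tamaño
class_rate = tot / X_train.shape[0]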

It works really well, but it is very slow. I have profiled my code, and the slowest part is the distance evaluation.

Is there a different way to tell KNN to use weights in the distance? I have to use the Euclidean distance, although I drop the square root.
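For reference, the quantity computed by w_dist can be written out as follows. This is only a restatement of the code above, assuming all weights are non-negative; the symbols n (number of features) and the elementwise product are notation introduced here.

```latex
d_w(x, y) = \sum_{i=1}^{n} w_i \,(x_i - y_i)^2 ,
\qquad
\sqrt{d_w(x, y)} = \bigl\lVert \sqrt{w} \odot (x - y) \bigr\rVert_2 ,
\qquad w_i \ge 0 .
```

Because the square root is strictly increasing on non-negative values, ranking neighbors by d_w or by its square root selects the same nearest neighbor, so dropping the square root does not change the classification.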

Thanks!

Recommended answer

There is indeed another way, and it is built into scikit-learn, so it should be faster. You can use the wminkowski metric with weights. Below is an example with random weights for the features in your training set.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(metric='wminkowski', p=2,
                           metric_params={'w': np.random.random(X_train.shape[1])})
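Two caveats may be worth checking against the documentation for your installed versions. As documented, wminkowski multiplies each difference by its weight before raising it to the power p, so reproducing the weights in w_dist would need sqrt(w) rather than w. Newer scikit-learn releases also appear to have dropped wminkowski in favor of passing w to the minkowski metric.

A version-independent alternative is sketched below. It is a different technique from the one in the answer: each feature is scaled by sqrt(w) so that the default Euclidean metric gives the same neighbors. The names X_demo, y_demo and w_demo are hypothetical stand-ins for the asker's data, and the weights are assumed to be non-negative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_demo = rng.random((60, 4))             # hypothetical stand-in for X_train
y_demo = rng.integers(0, 2, size=60)     # hypothetical stand-in for Y_train
w_demo = np.array([0.7, 0.0, 1.5, 0.2])  # non-negative weights; 0 drops a feature

# Scale each feature by sqrt(w): plain Euclidean distance on the scaled data
# equals the square root of sum(w * (x - y)**2) on the original data.
X_scaled = X_demo * np.sqrt(w_demo)

knn = KNeighborsClassifier(n_neighbors=1)  # default metric is Euclidean
knn.fit(X_scaled, y_demo)
# Called without X, kneighbors() does not count a training point as its own neighbor.
nn_scaled = knn.kneighbors(n_neighbors=1, return_distance=False).ravel()

# Reference: brute-force nearest neighbor under the weighted squared distance.
diff = X_demo[:, None, :] - X_demo[None, :, :]
d = np.sum(w_demo * diff ** 2, axis=2)
np.fill_diagonal(d, np.inf)  # exclude each point itself
nn_ref = d.argmin(axis=1)

same_neighbors = np.array_equal(nn_scaled, nn_ref)
class_rate = np.mean(y_demo[nn_scaled] == y_demo)  # leave-one-out 1-NN accuracy
```

Since the scaling is a single vectorized multiplication per candidate weight vector, the neighbor search itself can stay on scikit-learn's built-in Euclidean code path instead of calling a Python function for every pair of points.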
