How can I use a custom distance metric with KNeighborsRegressor?


Problem description


I'm trying to apply my own custom distance metric function to a kNN regression model. My dataset is a mixture of nominal, ordinal, numeric, and binary fields.

Code:

from sklearn import neighbors

def cus_distance(array1, array2, **kwargs):
    # calculate the distance, return a float
    pass

knn = neighbors.KNeighborsRegressor(weights='distance', metric=cus_distance)

# train_data is a pandas dataframe obj
knn.fit(train_data.ix[:, fields_list], train_data['time_costs'])

The last line raises an exception:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-284-04520b227b8a> in <module>()
----> 1 knn.fit(train_data.ix[:, fields_list], train_data['time_costs'])

/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/base.pyc in fit(self, X, y)
    587             X, y = check_arrays(X, y, sparse_format="csr")
    588         self._y = y
--> 589         return self._fit(X)
    590 
    591 

/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/base.pyc in _fit(self, X)
    214             self._tree = BallTree(X, self.leaf_size,
    215                                   metric=self.effective_metric_,
--> 216                                   **self.effective_metric_kwds_)
    217         elif self._fit_method == 'kd_tree':
    218             self._tree = KDTree(X, self.leaf_size,

/usr/local/lib/python2.7/dist-packages/sklearn/neighbors/ball_tree.so in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:7983)()

/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    318 
    319     """
--> 320     return array(a, dtype, copy=False, order=order)
    321 
    322 def asanyarray(a, dtype=None, order=None):

ValueError: could not convert string to float: Unknown

I know this error is caused by string values in my dataset ('Unknown' is one of them).
This confuses me: as I understand it, the function cus_distance should take care of these str values, and KNeighborsRegressor should just use the return value of my function.

Questions:
* Is this the right way to use a custom distance metric in KNN regression?
* If it is, why am I getting this exception?
* If not, what is the right way?

Solution

The Ball Tree and KD Tree require floating point data, regardless of the metric used. If your data cannot be converted to floating point, then you will get this sort of error.

>>> import numpy as np
>>> data = [1, "Unknown", 2]
>>> np.asarray(data, dtype=float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 1 np.asarray(data, dtype=float)

ValueError: could not convert string to float: Unknown
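
One possible workaround, given that the input must be floating point, is to encode the non-numeric fields as numbers before fitting and let the custom metric interpret those codes. The following is only a minimal sketch under assumptions: the toy rows, the targets, the `code_of` mapping, and the body of `cus_distance` are all hypothetical and stand in for the asker's real data and distance logic.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data: field 0 is numeric, field 1 is nominal (strings).
raw_rows = [
    (1.0, "red"),
    (2.0, "Unknown"),
    (3.0, "blue"),
    (4.0, "red"),
]
y = np.array([10.0, 20.0, 30.0, 40.0])

# Map every category to a float code so the whole matrix is numeric.
categories = sorted({label for _, label in raw_rows})
code_of = {label: float(i) for i, label in enumerate(categories)}
X = np.array([[value, code_of[label]] for value, label in raw_rows])


def cus_distance(a, b):
    """Illustrative mixed-type distance (an assumption, not the asker's).

    Numeric field: absolute difference.
    Nominal field: 0 if the codes match, 1 otherwise, so the integer
    codes are only compared for equality and never treated as magnitudes.
    """
    numeric_part = abs(a[0] - b[0])
    nominal_part = 0.0 if a[1] == b[1] else 1.0
    return numeric_part + nominal_part


knn = KNeighborsRegressor(n_neighbors=2, weights="distance",
                          metric=cus_distance)
knn.fit(X, y)  # X is all floats, so the conversion no longer fails

# Query points must be encoded with the same mapping.
query = np.array([[2.5, code_of["blue"]]])
prediction = knn.predict(query)
print(prediction)
```

The point of the design is that the estimator only ever sees floats, while the metric decides what each column means. Any category that appears at prediction time has to exist in the mapping built from the training data.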
