Cannot understand sklearn's PolynomialFeatures

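Question

I need help with sklearn's PolynomialFeatures. It works fine with one feature, but whenever I add more than one feature, the output array contains some extra values besides the inputs raised to the power of the degree. For example, for this array,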

  X = np.array([[230.1,37.8,69.2]]) 

when I try

  X_poly = poly.fit_transform(X) 

it outputs

  [[1.00000000e+00 2.30100000e+02 3.78000000e+01 6.92000000e+01
    5.29460100e+04 8.69778000e+03 1.59229200e+04 1.42884000e+03
    2.61576000e+03 4.78864000e+03]]

Here, what are 8.69778000e+03, 1.59229200e+04 and 2.61576000e+03?

Answer

If you have features [a, b, c], the default polynomial features (in sklearn the default degree is 2) are [1, a, b, c, a^2, ab, ac, b^2, bc, c^2]. Besides the squares, the output therefore also contains the pairwise products of the features.

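The three values you ask about are these cross terms:

8.69778000e+03 is ab = 230.1 x 37.8 = 8697.78
1.59229200e+04 is ac = 230.1 x 69.2 = 15922.92
2.61576000e+03 is bc = 37.8 x 69.2 = 2615.76 (2615.76 = 2.61576000 x 10^3)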
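To check which product each column corresponds to, the degree-2 terms can be rebuilt by hand. The sketch below uses only the standard library and does not call sklearn. The helper name `degree2_terms` and the labels `a`, `b`, `c` are made up for illustration, and the enumeration order from `itertools.combinations_with_replacement` is assumed to match the column order seen in the output above.

```python
from itertools import combinations_with_replacement
from math import prod


def degree2_terms(values, names):
    """Return (label, value) pairs for every monomial of degree 0, 1 and 2.

    Hypothetical helper for illustration only; it is not part of sklearn.
    """
    terms = []
    for degree in range(3):  # degree 0 (bias), 1 (inputs), 2 (squares and products)
        for combo in combinations_with_replacement(range(len(values)), degree):
            # The empty combination is the constant term, labelled "1".
            label = "*".join(names[i] for i in combo) or "1"
            terms.append((label, prod(values[i] for i in combo)))
    return terms


terms = degree2_terms([230.1, 37.8, 69.2], ["a", "b", "c"])
for label, value in terms:
    print(f"{label:>4} = {value:.2f}")
```

Each printed line pairs a label such as `a*b` or `b*c` with its value, so the numbers can be compared one by one with the array returned by `fit_transform`. In scikit-learn 1.0 and later, `poly.get_feature_names_out()` on the fitted transformer also reports a label for each output column.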

PolynomialFeatures is a simple way to create new features. There is a good reference here. Of course, using PolynomialFeatures also has disadvantages, such as overfitting (see here).

Edit:
We have to be careful when using polynomial features. The number of polynomial features is N(n,d) = C(n+d,d), where n is the number of input features, d is the degree of the polynomial, and C is the binomial coefficient (combination). In our case the number is C(3+2,2) = 5!/((5-2)! 2!) = 10, but when the number of features or the degree is high, there are far too many polynomial features. For example:

N(100,2)=5151
N(100,5)=96560646
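These counts can be reproduced with the standard library. This is a minimal sketch of the formula N(n,d) = C(n+d,d) quoted above; the function name `n_poly_features` is a made-up helper, not an sklearn call.

```python
from math import comb


def n_poly_features(n_features, degree):
    """Count the polynomial features, bias term included: C(n + d, d)."""
    return comb(n_features + degree, degree)


for n, d in [(3, 2), (100, 2), (100, 5)]:
    print(f"N({n},{d}) = {n_poly_features(n, d)}")
```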

So in this case you may need to apply regularization to penalize some of the weights. It is quite possible that the algorithm will start to suffer from the curse of dimensionality (here is also a very nice discussion).
