Cannot understand sklearn's PolynomialFeatures

Problem description

I need help with sklearn's PolynomialFeatures. It works well with one feature, but whenever I add multiple features it also outputs some values in the array besides the values raised to the power of the degree. For example, for this array:

X=np.array([[230.1,37.8,69.2]])

when I run

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

It outputs

[[ 1.00000000e+00 2.30100000e+02 3.78000000e+01 6.92000000e+01
5.29460100e+04 8.69778000e+03 1.59229200e+04 1.42884000e+03
2.61576000e+03 4.78864000e+03]]

Here, what are 8.69778000e+03, 1.59229200e+04 and 2.61576000e+03?

Solution

If you have features [a, b, c], the default polynomial features (in sklearn the default degree is 2) are, in output column order, [1, a, b, c, a^2, ab, ac, b^2, bc, c^2].

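These are the interaction terms: 8.69778000e+03 is ab = 230.1 x 37.8 = 8697.78, 1.59229200e+04 is ac = 230.1 x 69.2 = 15922.92, and 2.61576000e+03 is bc = 37.8 x 69.2 = 2615.76 (2615.76 = 2.61576 x 10^3).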
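The same ten numbers can be rebuilt by hand without scikit-learn. This sketch assumes the column order seen in the question's output: columns grouped by degree (0, 1, 2), and within each degree the feature indices taken as combinations with replacement in lexicographic order. The helper name `degree2_expansion` is hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

def degree2_expansion(features):
    """Hypothetical helper: rebuild a degree-2 expansion in the column order
    seen in the question (degree 0, then 1, then 2; within a degree,
    index tuples (0,0)=a^2, (0,1)=ab, (0,2)=ac, (1,1)=b^2, ...)."""
    out = []
    for degree in range(3):
        for idx in combinations_with_replacement(range(len(features)), degree):
            # The empty tuple (degree 0) gives prod(...) == 1, the bias column.
            out.append(prod(features[i] for i in idx))
    return out

row = degree2_expansion([230.1, 37.8, 69.2])
print(row)
```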

PolynomialFeatures gives you a simple way to create new features. There is a good reference here. Of course, using PolynomialFeatures also has disadvantages, notably overfitting (see here).

Edit:
We have to be careful when using polynomial features. The number of polynomial features (including the constant bias column) is N(n,d) = C(n+d,d), where n is the number of features, d is the degree of the polynomial, and C is the binomial coefficient (combination). In our case the number is C(3+2,2) = 5!/((5-2)!*2!) = 10, but when the number of features or the degree is high, the number of polynomial features grows very quickly. For example:

N(100,2)=5151
N(100,5)=96560646
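The counts above can be checked with the binomial coefficient directly, and cross-checked by listing every monomial of degree 0 through d. A minimal sketch using Python 3.8+ `math.comb`; both function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def n_poly_features(n, d):
    """Output columns of a full degree-d expansion of n features,
    bias included: N(n, d) = C(n + d, d)."""
    return comb(n + d, d)

def n_poly_features_brute(n, d):
    """Same count by enumerating every monomial of degree 0..d."""
    return sum(
        sum(1 for _ in combinations_with_replacement(range(n), k))
        for k in range(d + 1)
    )

print(n_poly_features(3, 2))    # 10, the question's case
print(n_poly_features(100, 2))  # 5151
print(n_poly_features(100, 5))  # 96560646
```

With `PolynomialFeatures(include_bias=False)` the constant column is dropped, so the count is one less.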

So in this case you may need to apply regularization to penalize some of the weights. It is also quite possible that the algorithm will start to suffer from the curse of dimensionality (here is also a very nice discussion).
