Cannot understand sklearn's PolynomialFeatures
Problem description
I need help with sklearn's PolynomialFeatures. It works well with one feature, but whenever I add multiple features, the output array also contains some values besides the features raised to the powers up to the degree. For example, for this array
X = np.array([[230.1, 37.8, 69.2]])
when I run
X_poly = poly.fit_transform(X)
it outputs
[[ 1.00000000e+00 2.30100000e+02 3.78000000e+01 6.92000000e+01
5.29460100e+04 8.69778000e+03 1.59229200e+04 1.42884000e+03
2.61576000e+03 4.78864000e+03]]
Here, what are 8.69778000e+03, 1.59229200e+04 and 2.61576000e+03?
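For anyone who wants to reproduce this, here is a self-contained version of the snippet above. It assumes that `poly` was created as `PolynomialFeatures(degree=2)` (the default degree), which the question does not actually show.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with three features: a = 230.1, b = 37.8, c = 69.2
X = np.array([[230.1, 37.8, 69.2]])

# Assumption: poly uses the defaults (degree=2, include_bias=True)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)
print(X_poly.shape)  # (1, 10)
```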
If you have features [a, b, c], the default polynomial features (in sklearn the default degree is 2) are [1, a, b, c, a^2, ab, ac, b^2, bc, c^2], in that order.
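So 8.69778000e+03 is ab = 230.1 × 37.8 = 8697.78, 1.59229200e+04 is ac = 230.1 × 69.2 = 15922.92, and 2.61576000e+03 is bc = 37.8 × 69.2 = 2615.76 (2615.76 = 2.61576 × 10^3).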
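The arithmetic can be checked directly against the transformer's output. This is only an illustrative check; the column indices 5, 6 and 8 (0-based) follow from the degree-2 ordering [1, a, b, c, a^2, ab, ac, b^2, bc, c^2].

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

a, b, c = 230.1, 37.8, 69.2
X_poly = PolynomialFeatures(degree=2).fit_transform(np.array([[a, b, c]]))

# Cross (interaction) terms: columns 5, 6 and 8 hold a*b, a*c and b*c
cross = {
    "a*b": (X_poly[0, 5], a * b),
    "a*c": (X_poly[0, 6], a * c),
    "b*c": (X_poly[0, 8], b * c),
}
for label, (from_sklearn, by_hand) in cross.items():
    print(label, from_sklearn, by_hand, np.isclose(from_sklearn, by_hand))
```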
PolynomialFeatures is a simple way to create new features. There is a good reference here. Of course, using PolynomialFeatures also has disadvantages (overfitting; see here).
Edit:
We have to be careful when using polynomial features. The number of polynomial features is N(n,d) = C(n+d,d), where n is the number of features, d is the degree of the polynomial, and C is the binomial coefficient (combination). In our case the number is C(3+2,2) = 5!/((5-2)! 2!) = 10, but when the number of features or the degree is high, the number of polynomial features grows very quickly. For example:
N(100,2)=5151
N(100,5)=96560646
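The counts above can be reproduced with Python's `math.comb` (Python 3.8+). The helper name `n_poly_features` is hypothetical, introduced here for illustration; the small case is cross-checked against scikit-learn's fitted `n_output_features_` attribute, while the large cases are only computed by formula since materializing them would not fit in memory.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n: int, d: int) -> int:
    """Hypothetical helper: number of monomials of total degree <= d in n variables (bias included)."""
    return comb(n + d, d)


print(n_poly_features(3, 2))    # 10
print(n_poly_features(100, 2))  # 5151
print(n_poly_features(100, 5))  # 96560646

# Cross-check the small case against scikit-learn (default include_bias=True)
poly = PolynomialFeatures(degree=2).fit(np.zeros((1, 3)))
print(poly.n_output_features_)  # 10
```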
So in this case you may need to apply regularization to penalize some of the weights. It is also quite possible that the algorithm will start to suffer from the curse of dimensionality (here is also a very nice discussion).
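One common way to combine polynomial features with regularization in scikit-learn is a pipeline that ends in an L2-penalized model such as `Ridge`. This is a minimal sketch, not the only option: the synthetic data, the choice of `Ridge` over other regularized models, and `alpha=1.0` are all assumptions for illustration, and in practice `alpha` would be tuned (e.g. by cross-validation).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data for illustration only: 200 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=200)

# Expand to degree-2 features, standardize them, then fit an L2-penalized linear model.
# alpha sets the penalty strength; 1.0 is an arbitrary starting value.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```

Scaling before `Ridge` matters here because the penalty treats all coefficients alike, while raw polynomial columns (like 230.1^2 versus 37.8) can differ by orders of magnitude.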