Fitting data vs. transforming data in scikit-learn
Question
In scikit-learn, all estimators have a fit() method, and depending on whether they are supervised or unsupervised, they also have a predict() or transform() method.
I am writing a transformer for an unsupervised learning task and was wondering whether there is a rule of thumb for which kind of learning logic goes where. The official documentation is not very helpful in this regard:
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
In this context, what is meant by fitting data and by transforming data?
Recommended answer
Fitting finds the internal parameters of a model that will be used to transform data. Transforming applies those parameters to data. You may fit a model to one set of data and then use it to transform a completely different set.
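To make the "fit on one set, transform another" point concrete, here is a minimal sketch using StandardScaler. The toy arrays X_train and X_new are made up for illustration and are not from the original post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative toy data (assumed, not from the original post)
X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[2.0], [4.0]])

scaler = StandardScaler()

# fit() only learns parameters: the per-feature mean and scale of X_train
scaler.fit(X_train)
print(scaler.mean_, scaler.scale_)

# transform() applies those stored parameters to any data, here a different set
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled)
```

Note that X_new is scaled with the mean and scale learned from X_train, not with its own statistics.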
For example, you fit a linear model to data to get a slope and an intercept. Then you use those parameters to transform (i.e., map) new or existing values of x to y.
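The linear-model case can be sketched as follows. The data points are invented for illustration. Since a linear model is a supervised estimator, scikit-learn calls the mapping step predict() rather than transform().

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative toy data lying exactly on y = 2x + 1 (assumed, not from the post)
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])

model = LinearRegression()

# fit() learns the slope (coef_) and the intercept (intercept_)
model.fit(X, y)
print(model.coef_, model.intercept_)

# predict() applies the learned slope and intercept to a new x value
y_new = model.predict(np.array([[3.0]]))
print(y_new)
```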
fit_transform just does both steps on the same data.
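For the custom transformer mentioned in the question, one possible rule of thumb is this: anything learned from the data goes in fit() and is stored on the instance, and transform() only applies what was stored. The sketch below assumes a hypothetical MeanCenterer class, a name made up for this example. It is one way to lay out such a class, not the only one. TransformerMixin supplies fit_transform() from fit() and transform().

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical example transformer: subtracts the per-column mean."""

    def fit(self, X, y=None):
        # Learning logic lives here: compute and store the parameters.
        # The trailing underscore follows scikit-learn's naming convention
        # for attributes learned during fit().
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        return self  # returning self allows chaining: fit(X).transform(X)

    def transform(self, X):
        # No learning here: only apply the stored parameters.
        X = np.asarray(X, dtype=float)
        return X - self.mean_


X = [[1, 2], [2, 4], [1, 3]]

# fit_transform() is inherited from TransformerMixin
centered = MeanCenterer().fit_transform(X)

# Equivalent two-step form
centered_two_step = MeanCenterer().fit(X).transform(X)
print(centered)
```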
A scikit-learn example: you fit a PCA model to data to find the principal components. Then you transform your data to see how it maps onto those components:
from sklearn.decomposition import PCA

X = [[1, 2], [2, 4], [1, 3]]
pca = PCA(n_components=2)

# fit() learns the model used to map data: the principal components of X
pca.fit(X)
print(pca.components_)
# Output as posted (the sign of a component can differ between versions):
# [[ 0.47185791  0.88167459]
#  [-0.88167459  0.47185791]]

# transform() actually maps the data onto those components
print(pca.transform(X))
# [[-1.03896057 -0.17796634]
#  [ 1.19624651 -0.11592512]
#  [-0.15728599  0.29389156]]

# fit_transform() does both "at once" on the same data
print(pca.fit_transform(X))
# [[-1.03896058 -0.1779664 ]
#  [ 1.19624662 -0.11592512]
#  [-0.15728603  0.29389152]]
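As a rough check of what transform() does with the fitted parameters, the sketch below reproduces the PCA mapping by hand from the stored mean_ and components_ attributes. It assumes the default whiten=False setting and compares the results numerically instead of relying on printed values, since component signs can vary between versions.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2], [2, 4], [1, 3]], dtype=float)

pca = PCA(n_components=2)
pca.fit(X)

# With default settings (whiten=False), transform() centers the data with the
# learned mean and projects it onto the learned components.
manual = (X - pca.mean_) @ pca.components_.T
mapped = pca.transform(X)

print(np.allclose(manual, mapped))
```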