How to use scikit-learn inverse_transform with new values


Question

I have a set of data on which I ran scikit-learn's PCA. Before performing PCA, I scaled the data with StandardScaler().

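import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# df_data is the original (unscaled) DataFrame
variance_to_retain = 0.99
np_scaled = StandardScaler().fit_transform(df_data)
# a float n_components keeps enough components to explain that fraction of variance
pca = PCA(n_components=variance_to_retain)
np_pca = pca.fit_transform(np_scaled)

# make dataframe of scaled data
# put column names on scaled data for use later
df_scaled = pd.DataFrame(np_scaled, columns=df_data.columns)
num_components = len(pca.explained_variance_ratio_)
cum_variance_explained = np.cumsum(pca.explained_variance_ratio_)

eigenvalues = pca.explained_variance_
eigenvectors = pca.components_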

I then ran K-Means clustering on the scaled dataset. I can plot the cluster centers just fine in scaled space.

My question is: how do I transform the locations of the centers back into the original data space? I know that StandardScaler.fit_transform() makes the data have zero mean and unit variance. But given new points of shape (num_clusters, num_features), can I use inverse_transform(centers) to map the centers back to the range and offset of the original data?

Thanks, David

Recommended answer

You can take cluster_centers_ from the fitted KMeans and pass it to pca.inverse_transform, then pass that result to the scaler's inverse_transform to get back to the original data space.

Here is an example:

import numpy as np
from sklearn import decomposition
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
y = iris.target

# keep the fitted scaler in a variable so it can be inverted later
scal = StandardScaler()
X_t = scal.fit_transform(X)

# project the scaled data onto 3 principal components
pca = decomposition.PCA(n_components=3)
pca.fit(X_t)
X_t = pca.transform(X_t)

# cluster in PCA space
clf = KMeans(n_clusters=3)
clf.fit(X_t)

# undo the PCA projection first, then the scaling
scal.inverse_transform(pca.inverse_transform(clf.cluster_centers_))
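
The question describes running K-Means on the scaled data rather than on the PCA projection. In that case only the scaler needs to be inverted. The following is a minimal sketch of that variant, assuming the iris data as a stand-in for the asker's dataset; the names scaler, kmeans, centers_original and manual, and the n_init and random_state values, are illustrative choices and not part of the original post.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): iris instead of the asker's DataFrame.
X = datasets.load_iris().data

# Keep a reference to the fitted scaler so it can be inverted later.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Cluster directly in scaled space (no PCA step in this variant).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

# Centers have shape (n_clusters, n_features), which the scaler accepts.
centers_original = scaler.inverse_transform(kmeans.cluster_centers_)

# For the default StandardScaler, inverse_transform is x * scale_ + mean_.
manual = kmeans.cluster_centers_ * scaler.scale_ + scaler.mean_
print(np.allclose(centers_original, manual))
```

The scaler only requires that the number of columns match the number of features it was fitted on, so it works on points that were never part of the training data, such as cluster centers.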

Note that sklearn offers several ways to do the fit/transform. You can call StandardScaler().fit_transform(X) directly, but then the fitted scaler is discarded: you cannot reuse it on new data, and you cannot use it to compute an inverse.

Alternatively, you can create scal = StandardScaler(), then call scal.fit(X) followed by scal.transform(X).

Or you can call scal.fit_transform(X), which combines the fit and transform steps while keeping the fitted scaler in scal.
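
As a small illustration of these options, the sketch below uses a made-up 3x2 array (an assumption, not data from the question) to show that the two-step and one-step forms give the same scaled output, and that a scaler kept in a variable can invert both the training data and new points.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): 3 samples, 2 features on different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

# Two-step form: fit learns mean_ and scale_, transform applies them.
scal_a = StandardScaler()
scal_a.fit(X)
X_a = scal_a.transform(X)

# One-step form on an instance that is kept for later use.
scal_b = StandardScaler()
X_b = scal_b.fit_transform(X)

# Both forms produce the same scaled data.
same = np.allclose(X_a, X_b)

# The kept scaler maps scaled-space points back to original units.
# The scaled-space origin corresponds to the per-feature mean.
new_points = np.array([[0.0, 0.0], [1.0, -1.0]])
restored = scal_b.inverse_transform(new_points)

# Inverting the scaled training data recovers the original values.
roundtrip = np.allclose(scal_b.inverse_transform(X_b), X)
print(same, roundtrip)
```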
