How to scale dataframes consistently with MinMaxScaler() in sklearn


Problem description

I have three data frames that are each scaled individually with MinMaxScaler().

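from sklearn.preprocessing import MinMaxScaler

def scale_dataframe(values_to_be_scaled):
    # Cast to float before scaling
    values = values_to_be_scaled.astype('float64')
    # A new scaler is fitted on every call, so each frame gets its own min/max
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(values)

    return scaled

# df is assumed to be a list of data frames and num_df its length
scaled_values = []
for i in range(0, num_df):
    scaled_values.append(scale_dataframe(df[i].values))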

The problem I am having is that each dataframe gets scaled according to its own individual set of column min and max values. I need all of my dataframes to scale to the same values as if they all shared the same set of column min and max values for the data overall. Is there a way to accomplish this with MinMaxScaler()? One option would be to make one large dataframe, then scale the dataframe before partitioning, but this would not be ideal.

Recommended answer

Check out the excellent docs of sklearn.

As you see, there is support for partial_fit()! This allows online-scaling/minibatch-scaling and you can control the minibatches!

Example:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

a = np.array([[1,2,3]])
b = np.array([[10,20,30]])
c = np.array([[5, 10, 15]])

""" Scale on all datasets together in one batch """
offline_scaler = MinMaxScaler()
offline_scaler.fit(np.vstack((a, b, c)))                # fit on whole data at once
a_offline_scaled = offline_scaler.transform(a)
b_offline_scaled = offline_scaler.transform(b)
c_offline_scaled = offline_scaler.transform(c)
print('Offline scaled')
print(a_offline_scaled)
print(b_offline_scaled)
print(c_offline_scaled)

""" Scale on all datasets together in minibatches """
online_scaler = MinMaxScaler()
online_scaler.partial_fit(a)                            # partial fit 1
online_scaler.partial_fit(b)                            # partial fit 2
online_scaler.partial_fit(c)                            # partial fit 3
a_online_scaled = online_scaler.transform(a)
b_online_scaled = online_scaler.transform(b)
c_online_scaled = online_scaler.transform(c)
print('Online scaled')
print(a_online_scaled)
print(b_online_scaled)
print(c_online_scaled)

Output:

Offline scaled
[[ 0.  0.  0.]]
[[ 1.  1.  1.]]
[[ 0.44444444  0.44444444  0.44444444]]
Online scaled
[[ 0.  0.  0.]]
[[ 1.  1.  1.]]
[[ 0.44444444  0.44444444  0.44444444]]
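
Applied to the setup in the question, the same idea might look like the sketch below. The three small frames and their column names (x, y) are made-up stand-ins for the asker's data, and the two-pass structure is only one possible arrangement: the first loop calls partial_fit() on each frame to accumulate the shared column minima and maxima, and the second loop transforms each frame with those shared statistics.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins for the three data frames in the question.
dfs = [
    pd.DataFrame({"x": [1.0, 2.0], "y": [10.0, 40.0]}),
    pd.DataFrame({"x": [5.0, 9.0], "y": [20.0, 30.0]}),
    pd.DataFrame({"x": [3.0, 4.0], "y": [0.0, 25.0]}),
]

scaler = MinMaxScaler(feature_range=(0, 1))

# First pass: accumulate per-column min and max across all frames.
for df in dfs:
    scaler.partial_fit(df.to_numpy(dtype="float64"))

# Second pass: transform every frame with the shared statistics,
# keeping the original index and column labels.
scaled_dfs = [
    pd.DataFrame(
        scaler.transform(df.to_numpy(dtype="float64")),
        index=df.index,
        columns=df.columns,
    )
    for df in dfs
]

# The fitted scaler exposes the shared column statistics it learned.
print(scaler.data_min_, scaler.data_max_)
for scaled in scaled_dfs:
    print(scaled)
```

Because every frame is transformed by the same fitted scaler, a given raw value maps to the same scaled value no matter which frame it appears in, and the frames never need to be concatenated into one large dataframe.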
