How can I cleanly normalize data and then "unnormalize" it later?
Question
I am using Anaconda with a TensorFlow neural network. Most of my data is stored in pandas.
I am attempting to predict cryptocurrency markets. I am aware that lots of people are probably doing this and that it is most likely not going to be very effective; I'm mostly doing it to familiarize myself with the TensorFlow and Anaconda tools.
I am fairly new to this, so if I am doing something wrong or suboptimally, please let me know.
Here is how I acquire and handle the data:
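- Download datasets from quandl.com into pandas DataFrames
- Select the desired columns from each downloaded dataset
- Concatenate the DataFrames
- Drop all NaNs from the new, merged DataFrame
- Normalize the data using the code df = (df - df.min())/(df.max() - df.min())
- Feed the normalized data into my neural network
- Unnormalize the data (this is the part that I haven't implemented)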
Now, my question is: how can I cleanly normalize and then unnormalize this data? I realize that if I want to unnormalize the data, I'm going to need to store the initial df.min() and df.max() values, but this looks ugly and feels cumbersome.
I am aware that I can normalize data with sklearn.preprocessing.MinMaxScaler, but as far as I know I can't unnormalize data using it.
It might be that I'm doing something fundamentally wrong here, but I'll be very surprised if there isn't a clean way to normalize and unnormalize data with Anaconda or other libraries.
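For reference, the manual approach described in the question (store the per-column minimum and maximum, then invert the formula) can be sketched roughly as follows. The sample values, the column names, and the normalize/unnormalize helpers are hypothetical and chosen only for illustration; this is not the asker's actual code.

```python
import pandas as pd

# Hypothetical example data standing in for the merged Quandl DataFrame.
df = pd.DataFrame({"price": [100.0, 150.0, 200.0], "volume": [10.0, 30.0, 20.0]})

# Store the per-column statistics before anything is overwritten.
col_min = df.min()
col_max = df.max()


def normalize(frame, lo, hi):
    """Scale each column to [0, 1] using the stored minimum and maximum."""
    return (frame - lo) / (hi - lo)


def unnormalize(frame, lo, hi):
    """Invert normalize(): map [0, 1] values back to the original units."""
    return frame * (hi - lo) + lo


normalized = normalize(df, col_min, col_max)
restored = unnormalize(normalized, col_min, col_max)
```

Note that this sketch divides by zero for any column whose values are all identical, so a constant column would need separate handling.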
Recommended answer
The scalers in sklearn.preprocessing have an inverse_transform method designed for exactly this.
For example, to scale and un-scale your DataFrame with MinMaxScaler you could do:
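from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# fit_transform learns each column's min and max, then scales to [0, 1]
scaled = scaler.fit_transform(df)
# inverse_transform uses the stored min and max to restore the original units
unscaled = scaler.inverse_transform(scaled)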
Just bear in mind that, by default, the transform function (and fit_transform and inverse_transform as well) returns a numpy.ndarray, not a pandas.DataFrame, so the column names and index have to be re-attached if you need them.
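As a minimal sketch of how that might look in practice, the example below wraps the returned arrays back into DataFrames and un-scales a set of predictions for a single column. The sample data, the choice of "price" as the predicted column, the made-up prediction values, and the use of a second scaler fitted only on the target are all assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data; "price" plays the role of the column being predicted.
df = pd.DataFrame(
    {"price": [100.0, 150.0, 200.0], "volume": [10.0, 30.0, 20.0]},
    index=pd.date_range("2018-01-01", periods=3),
)

# Scale every column, then re-attach the column names and index.
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df), columns=df.columns, index=df.index
)

# Round trip back to the original units, again as a DataFrame.
unscaled = pd.DataFrame(
    scaler.inverse_transform(scaled), columns=df.columns, index=df.index
)

# inverse_transform expects the same number of columns the scaler was
# fitted on, so a scaler fitted on the target alone is one way to
# un-scale predictions that cover only that column.
target_scaler = MinMaxScaler()
target_scaler.fit(df[["price"]].to_numpy())

predictions = np.array([[0.25], [0.75]])  # made-up network outputs in [0, 1]
predicted_prices = target_scaler.inverse_transform(predictions)
```

Keeping the fitted scaler object around (for example by pickling it alongside the trained model) is what replaces storing df.min() and df.max() by hand.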