ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler


Problem description


This semester I started working with ML. We have only used APIs such as Microsoft's Azure and Amazon's AWS, but we have not gone into depth about how those services work. My good friend, a senior majoring in math, asked me to help him create a stock predictor with TensorFlow based on a .csv file he provided.

I have a few problems. The first one is his .csv file. The file has only dates and closing values, which are not separated, so I had to separate the dates and values manually. I've managed to do that, and now I'm having trouble with MinMaxScaler(). I was told I could pretty much disregard the dates and only use the closing values, normalize them, and make a prediction based on them.
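A minimal sketch of the manual split described above, assuming (hypothetically) that each line of the file is wrapped in one extra character on each side, such as a quote, with a tab between the date and the closing value. That is what the slicing in the posted code implies. The sample text and the `parse_rows` helper are illustrative and not taken from the real file. Skipping blank lines matters because a trailing newline would otherwise produce an empty row that cannot be unpacked into two fields.

```python
def parse_rows(text):
    """Split raw file text into parallel lists of dates and closing values."""
    dates, values = [], []
    lines = text.split("\n")
    for line in lines[1:]:          # skip the header line
        line = line.strip()
        if not line:                # ignore blank lines, e.g. after a trailing newline
            continue
        # Assumed layout: one wrapping character on each side, a tab in between.
        date, value = line[1:-1].split("\t")
        dates.append(date)
        values.append(float(value))
    return dates, values


# Hypothetical sample in the assumed layout (not the real data).
sample = 'header\n"2001-01-02\t1283.27"\n"2001-01-03\t1347.56"\n'
dates, values = parse_rows(sample)
print(dates)
print(values)
```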

I keep getting this error:

ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler()

I honestly have never used scikit-learn or TensorFlow before, and this is my first time working on such a project. All the guides I see on the topic use pandas, but in my case the .csv file is a mess and I don't believe I can use pandas for it.

I'm following this guide:

But unfortunately, due to my lack of experience, some things are not really working for me, and I would appreciate a little more clarity on how I should proceed in my case.

Attached below is my (messy) code:

import pandas as pd
import numpy as np
import tensorflow as tf
import sklearn
from sklearn.model_selection import KFold
from sklearn.preprocessing import scale
from sklearn.preprocessing import MinMaxScaler
import matplotlib
import matplotlib.pyplot as plt
from dateutil.parser import parse
from datetime import datetime, timedelta
from collections import deque

stock_data = []
stock_date = []
stock_value = []
f = open("s&p500closing.csv","r")
data = f.read()
rows = data.split("\n")
rows_noheader = rows[1:len(rows)]

#Separating values from the messy `.csv`, putting each value into its own list and also into a combined list of both
for row in rows_noheader:
    [date, value] = row[1:len(row)-1].split('\t')
    stock_date.append(date)
    stock_value.append((value))
    stock_data.append((date, value))

#Numpy array of all closing values converted to floats and normalized against the maximum
stock_value = np.array(stock_value, dtype=np.float32)
normvalue = [i/max(stock_value) for i in stock_value]

#Number of closing values and days. Since there is one closing value for each, they both match and there are 4528 of them (each)
nclose_and_days = 0
for i in range(len(stock_data)):
    nclose_and_days+=1

train_data = stock_value[:2264]
test_data = stock_value[2264:]

scaler = MinMaxScaler()

train_data = train_data.reshape(-1,1)
test_data = test_data.reshape(-1,1)

# Train the Scaler with training data and smooth data
smoothing_window_size = 1100
for di in range(0,4400,smoothing_window_size):
    #error occurs here
    scaler.fit(train_data[di:di+smoothing_window_size,:])
    train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])

# You normalize the last bit of remaining data
scaler.fit(train_data[di+smoothing_window_size:,:])
train_data[di+smoothing_window_size:,:] = scaler.transform(train_data[di+smoothing_window_size:,:])

# Reshape both train and test data
train_data = train_data.reshape(-1)

# Normalize test data
test_data = scaler.transform(test_data).reshape(-1)

# Now perform exponential moving average smoothing
# So the data will have a smoother curve than the original ragged data
EMA = 0.0
gamma = 0.1
for ti in range(1100):
    EMA = gamma*train_data[ti] + (1-gamma)*EMA
    train_data[ti] = EMA

# Used for visualization and test purposes
all_mid_data = np.concatenate([train_data,test_data],axis=0)

window_size = 100
N = train_data.size
std_avg_predictions = []
std_avg_x = []
mse_errors = []

for pred_idx in range(window_size,N):
    std_avg_predictions.append(np.mean(train_data[pred_idx-window_size:pred_idx]))
    mse_errors.append((std_avg_predictions[-1]-train_data[pred_idx])**2)
    std_avg_x.append(date)

print('MSE error for standard averaging: %.5f'%(0.5*np.mean(mse_errors)))

Solution

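I know this post is old, but since I stumbled on it, others will too. After running into the same problem and googling quite a bit, I found this issue: https://github.com/llSourcell/Make_Money_with_Tensorflow_2.0/issues/7

So it seems that if the dataset you download is too small, it throws that error. In the code above, train_data has only 2264 rows, but the loop steps di through 0, 1100, 2200 and 3300, so the slice train_data[3300:4400] is empty, and that is the (0, 1) array MinMaxScaler rejects. Download a .csv that goes back to 1962 and it'll be big enough ;).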
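To see where the empty array comes from, the window sizes can be counted without any libraries. This sketch only reuses the numbers from the question (2264 training rows, a window of 1100, and a loop running up to 4400). It mimics the slicing and does not call MinMaxScaler itself.

```python
n_train = 2264               # rows in train_data, from stock_value[:2264]
smoothing_window_size = 1100

# Same loop bounds as the question. Slicing a range clamps out-of-bounds
# indices the same way array slicing does, so len() gives each window's row count.
window_lengths = [
    len(range(n_train)[di:di + smoothing_window_size])
    for di in range(0, 4400, smoothing_window_size)
]
print(window_lengths)  # [1100, 1100, 64, 0]
```

The last window has zero rows, so that is the iteration in which scaler.fit would fail.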

Now I just have to find the right parameters for my dataset, since I'm adapting this to another type of prediction. Hope it helps.
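If a bigger file is not an option, one alternative is to derive the loop bounds from the data itself so that no slice can be empty. The following is a sketch under assumptions, not the guide's or the original poster's method: `scale_in_windows` is a hypothetical helper, and the random prices stand in for the real closing values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_in_windows(column, window_size):
    """Min-max scale an (n, 1) array window by window.

    Returns a scaled copy and the scaler fitted on the last window.
    """
    scaled = column.astype(np.float64)  # astype returns a copy
    scaler = MinMaxScaler()
    # Stepping only up to len(column) means every slice holds at least one row.
    for start in range(0, len(column), window_size):
        window = scaled[start:start + window_size, :]
        scaled[start:start + window_size, :] = scaler.fit_transform(window)
    return scaled, scaler


# Synthetic stand-in for the closing prices (not real data).
rng = np.random.default_rng(0)
train_data = rng.uniform(100.0, 200.0, size=2264).reshape(-1, 1)

train_scaled, scaler = scale_in_windows(train_data, window_size=1100)
print(train_scaled.shape, train_scaled.min(), train_scaled.max())
```

Because the loop runs to the end of the array, the leftover rows (64 of them here) are handled as a final, shorter window, and no separate "remaining data" step is needed. The returned scaler is the one fitted on that last window, which mirrors how the question's code later transforms test_data. Whether that is appropriate depends on the data.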
