Input data cannot be a list XGBoost


Problem description

Here is my code.

import pandas as pd
import numpy as np
import json
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.preprocessing import StandardScaler

training_data = pd.read_csv('/Users/aus10/Desktop/MLB_Data/Test_Training_Data/MLB_Training_Data.csv')
df_model = training_data.copy()
scaler = StandardScaler()

features = [['OBS', 'Runs']]
for feature in features:
    df_model[feature] = scaler.fit_transform(df_model[feature])

test_data = pd.read_csv('/Users/aus10/Desktop/MLB_Data/Test_Training_Data/Test_Data.csv')
X = training_data.iloc[:,1]  #independent columns
y = training_data.iloc[:,-1]   #target column 
X = X.values.reshape(-1,1)

results = []

# fit final model
model = XGBRegressor(objective="reg:squarederror", random_state=42)
model.fit(X, y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=4)

y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print('MSE train: %.3f, test: %.3f' % (
    round(mean_squared_error(y_train, y_train_pred),2),
    round(mean_squared_error(y_test, y_test_pred),2)
))

print('R^2 train: %.3f, test: %.3f' % (r2_score(y_train, y_train_pred), r2_score(y_test, y_test_pred)))

# define one new data instance

index = 0
count = 0

while count < len(test_data):
    team = test_data.loc[index].at['Team']
    OBS = test_data.loc[index].at['OBS']

    Xnew = [[ OBS ]]
    # make a prediction
    ynew = model.predict(Xnew)
    # show the inputs and predicted outputs
    results.append(
        {
            'Team': team,
            'Runs': (round(ynew[0],2))
        })
    index += 1
    count += 1
    
sorted_results = sorted(results, key=lambda k: k['Runs'], reverse=True)

df = pd.DataFrame(sorted_results, columns=[
    'Team', 'Runs'])
writer = pd.ExcelWriter('/Users/aus10/Desktop/MLB_Data/ML/Results/Projected_Runs_XGBoost.xlsx', engine='xlsxwriter') # pylint: disable=abstract-class-instantiated
df.to_excel(writer, sheet_name='Sheet1', index=False)
df.style.set_properties(**{'text-align': 'center'})
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.width', 1000)
writer.save()

The error I get is TypeError: Input data cannot be a list.

The data coming from test_data is a CSV with a team name and an OBS value, which is a float, like this: NYY 0.324

Every solution I've seen says to put it in a 2D array like I did, Xnew = [[ OBS ]], but I'm still getting the error.

Is there something else I need to do to the incoming test_data? I tried using values.reshape, but that didn't fix it either.

Recommended answer

You need to convert your Xnew from a nested Python list to a NumPy array:

Xnew = np.array(Xnew).reshape((1,-1))
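
As a minimal sketch of that conversion in isolation: the snippet below uses the OBS value from the question (0.324) and checks the resulting type and shape. The second half is an assumption that goes beyond the answer: it builds one array for several OBS values so that predict could be called once instead of inside a loop. The extra values in obs_values are illustrative only.

```python
import numpy as np

OBS = 0.324  # example value taken from the question

# A nested list is still a Python list, which is what the error complains about.
Xnew = [[OBS]]

# Convert to a NumPy array with one row (sample) and one column (feature).
Xnew = np.array(Xnew).reshape((1, -1))
print(type(Xnew).__name__, Xnew.shape)  # ndarray (1, 1)

# Assumption: several OBS values can be stacked into one column,
# giving an array of shape (n_samples, 1).
obs_values = [0.324, 0.310, 0.298]  # illustrative values only
X_all = np.array(obs_values).reshape(-1, 1)
print(X_all.shape)  # (3, 1)
```

Either array should then be accepted by model.predict, since the model in the question was fitted on a single-column array of the same layout.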
