ML model is failing to impute values
Question
I've tried creating an ML model to make some predictions, but I keep running into a stumbling block. Namely, the code seems to be ignoring the imputation instructions I give it, resulting in the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Here's my code:
import pandas as pd
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from category_encoders import CatBoostEncoder
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
data = pd.read_csv("data.csv",index_col=("Unnamed: 0"))
y = data.Installs
x = data.drop("Installs",axis=1)
strat = ["mean","median","most_frequent","constant"]
num_imp = SimpleImputer(strategy=strat[0])
obj_imp = SimpleImputer(strategy=strat[2])
# Set up the scaler
sc = StandardScaler()
# Set up Encoders
cb = CatBoostEncoder()
oh = OneHotEncoder(sparse=True)
# Set up columns
obj = list(x.select_dtypes(include="object"))
num = list(x.select_dtypes(exclude="object"))
cb_col = [i for i in obj if len(x[i].unique())>30]
oh_col = [i for i in obj if len(x[i].unique())<10]
# First Pipeline
imp = make_pipeline((num_imp))
enc_cb = make_pipeline((obj_imp),(cb))
enc_oh = make_pipeline((obj_imp),(oh))
# Col Transformation
col = make_column_transformer((imp, num),
                              (sc, num),
                              (enc_oh, oh_col),
                              (enc_cb, cb_col))
model = AdaBoostRegressor(random_state=(0))
run = make_pipeline((col),(model))
run.fit(x,y)
And here's a link to the data used in the code for reproduction purposes. Can you tell what's wrong? Thanks for your time.
Recommended answer
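Your numeric scaling branch is probably the culprit: the StandardScaler is applied to columns that have never been imputed. A column transformer applies each of its transformers to the original columns independently and concatenates the results, so (imp, num) and (sc, num) produce two separate copies of the numeric columns, and the scaled copy still contains the NaNs. You probably wanted to chain the imputer and the scaler in one pipeline, something like this: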
# Impute first, then scale, inside a single pipeline for the numeric columns
imp_sc = make_pipeline(num_imp, sc)
# Col Transformation
col = make_column_transformer((imp_sc, num),
                              (enc_oh, oh_col),
                              (enc_cb, cb_col))
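
To see the difference between the two layouts without the original data.csv, here is a minimal sketch on a made-up frame. The `toy` DataFrame and the names `num_cols`, `parallel`, and `chained` are hypothetical and only there for illustration. It assumes a scikit-learn version (0.20 or later) in which StandardScaler ignores NaN when fitting and passes it through when transforming.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the numeric columns of data.csv.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [10.0, 20.0, np.nan, 40.0],
})
num_cols = ["a", "b"]

# Layout from the question: the imputer and the scaler are separate entries,
# so each one receives the raw columns and the outputs are concatenated.
parallel = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="mean")), num_cols),
    (StandardScaler(), num_cols),
)
parallel_out = parallel.fit_transform(toy)

# Layout from the answer: the scaler only ever sees imputed values.
chained = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()), num_cols),
)
chained_out = chained.fit_transform(toy)

# Compare the column counts and whether any NaN survived the preprocessing.
print("parallel:", parallel_out.shape, "has NaN:", bool(np.isnan(parallel_out).any()))
print("chained: ", chained_out.shape, "has NaN:", bool(np.isnan(chained_out).any()))
```

Under that assumption, the first layout should give twice as many numeric columns (an imputed copy plus a scaled copy that still holds NaN), while the chained layout gives one clean copy. Any NaN left in the transformed matrix is what the regressor rejects when it is fitted.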