Creating a Custom Estimator: State Mean Estimator


Question

I'm trying to develop a very simple initial model to predict the amount of fines a nursing home might expect to pay based on its location.

This is my class definition:

#initial model to predict the amount of fines a nursing home might expect to pay based on its location
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin

class GroupMeanEstimator(BaseEstimator, RegressorMixin):
    #defines what a group is by using grouper
    #initialises an empty dictionary for group averages
    def __init__(self, grouper):
        self.grouper = grouper
        self.group_averages = {}

    #Any calculation I require for my predict method goes here
    #Specifically, I want to groupby the group grouper is set by
    #I want to then find out what is the mean penalty by each group
    #X is the data containing the groups
    #Y is fine_totals
    #map each state to its mean fine_tot
    def fit(self, X, y):
        #Use self.group_averages to store the average penalty by group
        Xy = X.join(y) #Joining X&y together
        state_mean_series = Xy.groupby(self.grouper)[y.name].mean() #Creating a series of state:mean penalties
        #populating a dictionary with state:mean key:value pairs
        for row in state_mean_series.iteritems():
            self.group_averages[row[0]] = row[1]
        return self

    #The amount of fine an observation is likely to receive is based on his group mean
    #Want to first populate the list with the number of observations
    #For each observation in the list, what is his group and then set the likely fine to his group mean.
    #Return the list
    def predict(self, X):
        dictionary = self.group_averages
        group = self.grouper
        list_of_predictions = [] #initialising a list to store our return values
        for row in X.itertuples(): #iterating through each row in X
            prediction = dictionary[row.STATE] #Getting the value from group_averages dict using key row.group
            list_of_predictions.append(prediction)
        return list_of_predictions

This works: state_model.predict(data.sample(5))

But it breaks when I try this: state_model.predict(pd.DataFrame([{'STATE': 'AS'}]))
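
The failure is presumably a KeyError: predict indexes the dictionary directly, and a key that was never stored during fit cannot be looked up. A minimal sketch, with a hypothetical dictionary standing in for group_averages (the states and values are illustrative, not from the real data):

```python
# Hypothetical stand-in for self.group_averages after fitting.
group_averages = {"AK": 200.0, "TX": 20.0}

try:
    prediction = group_averages["AS"]  # direct indexing, as in predict
except KeyError as err:
    # The missing key is carried in the exception's arguments.
    prediction = None
    missing_key = err.args[0]
    print(f"No stored mean for state {missing_key!r}")
```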

My model can't handle a state that is missing from its stored group averages, and I'd like help fixing that.

Recommended answer

One thing to look at is the loop in your fit method. On a DataFrame, iteritems iterates over columns rather than rows; on a Series such as state_mean_series it yields (index, value) pairs, but it was removed in pandas 2.0 in favour of items. itertuples gives you row-wise data on any version, so change the loop in your fit method to:

for row in pd.DataFrame(state_mean_series).itertuples(): #row format is [STATE, mean_value]
    self.group_averages[row[0]] = row[1]
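
As an alternative sketch, the grouped means can be turned into a dictionary without an explicit loop, since Series.to_dict() maps each index label to its value. The column names STATE and fine_tot and the sample values below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
X = pd.DataFrame({"STATE": ["AK", "AK", "TX", "TX", "TX"]})
y = pd.Series([100.0, 300.0, 10.0, 20.0, 30.0], name="fine_tot")

# Same grouping as in fit: mean of y within each group of X.
state_mean_series = X.join(y).groupby("STATE")[y.name].mean()

# Series.to_dict() maps each index label (state) to its value (mean fine).
group_averages = state_mean_series.to_dict()
print(group_averages)
```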

Then, in your predict method, add a fail-safe lookup:

prediction = dictionary.get(row.STATE, None) # None is returned when the state (e.g. 'AS') is not in the dictionary; replace it with whatever fallback you want
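
Putting both pieces together, one possible version of the estimator is sketched below. It is not the asker's exact class: it looks up the column named by grouper instead of the hard-coded STATE attribute, stores fitted values in attributes with a trailing underscore (a common scikit-learn convention), and, as an assumption of this sketch, falls back to the overall training mean for unseen groups rather than None. The sample data is hypothetical.

```python
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin


class GroupMeanEstimator(BaseEstimator, RegressorMixin):
    """Predict the training mean of y for each row's group."""

    def __init__(self, grouper):
        self.grouper = grouper

    def fit(self, X, y):
        # Mean of y within each group, stored as {group: mean}.
        self.group_averages_ = y.groupby(X[self.grouper]).mean().to_dict()
        # Assumption: unseen groups fall back to the overall training mean.
        self.global_mean_ = float(y.mean())
        return self

    def predict(self, X):
        # Series.map gives NaN for keys missing from the dict;
        # those gaps are then filled with the fallback value.
        mapped = X[self.grouper].map(self.group_averages_)
        return mapped.fillna(self.global_mean_).to_numpy()


# Hypothetical sample data; column names and values are illustrative only.
X = pd.DataFrame({"STATE": ["AK", "AK", "TX", "TX", "TX"]})
y = pd.Series([100.0, 300.0, 10.0, 20.0, 30.0], name="fine_tot")

state_model = GroupMeanEstimator(grouper="STATE").fit(X, y)
predictions = state_model.predict(pd.DataFrame([{"STATE": "AS"}, {"STATE": "TX"}]))
print(predictions)
```

Whether the overall mean is a sensible default depends on the data; a None or NaN result may be preferable if unseen states should be flagged rather than silently filled.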
