scikit-learn: How to calculate root-mean-square error (RMSE) in percentage?


Question

I have a dataset (available at https://drive.google.com/open?id=0B2Iv8dfU4fTUY2ltNGVkMG05V00) in the following format.

 time     X   Y
0.000543  0  10
0.000575  0  10
0.041324  1  10
0.041331  2  10
0.041336  3  10
0.04134   4  10
  ...
9.987735  55 239
9.987739  56 239
9.987744  57 239
9.987749  58 239
9.987938  59 239

The third column (Y) in my dataset is the true value, which is what I want to predict (estimate). I want to predict the current value of Y from the previous 100 rolling values of X. For this, I have the following Python script, which uses a random forest regression model.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""

@author: deshag
"""

import pandas as pd
import numpy as np
from io import StringIO
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from math import sqrt



df = pd.read_csv('estimated_pred.csv')

for i in range(1,100):
    df['X_t'+str(i)] = df['X'].shift(i)

print(df)

df.dropna(inplace=True)


X=pd.DataFrame({ 'X_%d'%i : df['X'].shift(i) for i in range(100)}).apply(np.nan_to_num, axis=0).values


y = df['Y'].values


# 'squared_error' is the name used since scikit-learn 1.0; older releases call it 'mse'
reg = RandomForestRegressor(criterion='squared_error')
reg.fit(X,y)
modelPred = reg.predict(X)
print(modelPred)

print("Number of predictions:",len(modelPred))

meanSquaredError=mean_squared_error(y, modelPred)
print("MSE:", meanSquaredError)
rootMeanSquaredError = sqrt(meanSquaredError)
print("RMSE:", rootMeanSquaredError)

At the end, I measured the root-mean-square error (RMSE) and got a value of 19.57. From what I have read in the documentation, the RMSE is expressed in the same units as the response. Is there any way to present the RMSE as a percentage? For example, to say that this percentage of the prediction is correct and this much is wrong.

I tried to calculate the mean absolute percentage error (MAPE) with the check_array function from a recent version of sklearn, but it doesn't seem to work the same way as in the previous version when I use it as follows.

import numpy as np
from sklearn.utils import check_array

def calculate_mape(y_true, y_pred):
    y_true, y_pred = check_array(y_true, y_pred)

    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

calculate_mape(y, modelPred)

This raises an error: ValueError: not enough values to unpack (expected 2, got 1). It seems that the check_array function in the recent version returns only a single value, unlike the previous version.

Is there any way to present the RMSE as a percentage, or to calculate MAPE using sklearn for Python?

Recommended answer

Your implementation of calculate_mape is not working because it was written for the check_arrays function (plural), which was removed in sklearn 0.16. check_array (singular) is not what you want: it validates one array and returns that single array, so unpacking its result into two names fails with the ValueError you see.
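A minimal sketch of a MAPE function that does not depend on check_arrays is shown below. It assumes that converting the inputs with np.asarray is enough validation for this use case, and it refuses targets that contain zeros, since the percentage error is undefined there. The sample values are made up for illustration and are not taken from the dataset in the question.

```python
import numpy as np


def calculate_mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    # Convert both inputs separately instead of relying on check_arrays
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    if np.any(y_true == 0):
        # Division by a zero target would give inf or nan
        raise ValueError("MAPE is undefined when y_true contains zeros")
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Made-up values for illustration only
y_true = [10.0, 20.0, 40.0]
y_pred = [11.0, 18.0, 40.0]
mape = calculate_mape(y_true, y_pred)
print("MAPE (%):", mape)
```

With the script from the question, the call would be calculate_mape(y, modelPred). scikit-learn 0.24 and later also provide sklearn.metrics.mean_absolute_percentage_error, which returns a fraction rather than a percentage.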

This StackOverflow answer gives a working implementation.
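For the RMSE half of the question, one common convention (not something scikit-learn defines, and not part of the answer above) is a normalized RMSE: divide the RMSE by the mean or the range of the true values and multiply by 100. The sketch below assumes that convention; the function name rmse_percent and its normalizer parameter are hypothetical, and the sample values are made up.

```python
import numpy as np


def rmse_percent(y_true, y_pred, normalizer="mean"):
    """RMSE divided by the mean or range of y_true, in percent.

    Hypothetical helper: the choice of normalizer is a convention,
    so report which one was used alongside the number.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    if normalizer == "mean":
        scale = np.mean(y_true)
    elif normalizer == "range":
        scale = np.max(y_true) - np.min(y_true)
    else:
        raise ValueError("normalizer must be 'mean' or 'range'")
    if scale == 0:
        raise ValueError("cannot normalize by zero")
    return rmse / scale * 100


# Made-up values for illustration only
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 30.0]
nrmse_mean = rmse_percent(y_true, y_pred, normalizer="mean")
nrmse_range = rmse_percent(y_true, y_pred, normalizer="range")
print("RMSE as % of mean:", nrmse_mean)
print("RMSE as % of range:", nrmse_range)
```

A figure like this expresses the typical error size relative to the scale of Y. It does not mean that a given percentage of the predictions are correct and the rest wrong.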
