Caret: There were missing values in resampled performance measures


Problem description

I am running caret's neural network on the Bike Sharing dataset and I get the following warning message:

In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.

I am not sure what the problem is. Can anyone help, please?

The dataset is from: https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

Here is the code:

library(caret)
library(bestNormalize)

data_hour = read.csv("hour.csv")

# Split dataset
set.seed(3)
split = createDataPartition(data_hour$casual, p=0.80, list=FALSE)    
validation = data_hour[-split,]
dataset = data_hour[split,]
dataset = dataset[,c(-1,-2,-4)]  

# View structure of data
str(dataset)

# 'data.frame': 13905 obs. of  14 variables:
# $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
# $ mnth      : int  1 1 1 1 1 1 1 1 1 1 ...
# $ hr        : int  1 2 3 5 8 10 11 12 14 15 ...
# $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
# $ weekday   : int  6 6 6 6 6 6 6 6 6 6 ...
# $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
# $ weathersit: int  1 1 1 2 1 1 1 1 2 2 ...
# $ temp      : num  0.22 0.22 0.24 0.24 0.24 0.38 0.36 0.42 0.46 0.44 ...
# $ atemp     : num  0.273 0.273 0.288 0.258 0.288 ...
# $ hum       : num  0.8 0.8 0.75 0.75 0.75 0.76 0.81 0.77 0.72 0.77 ...
# $ windspeed : num  0 0 0 0.0896 0 ...
# $ casual    : int  8 5 3 0 1 12 26 29 35 40 ...
# $ registered: int  32 27 10 1 7 24 30 55 71 70 ...
# $ cnt       : int  40 32 13 1 8 36 56 84 106 110 ...

## Transform numeric data to Gaussian
dataset_selected = dataset[,c(-13,-14)]
for (i in 8:12) { dataset_selected[,i] = predict(boxcox(dataset_selected[,i] + 0.1)) }

# View transformed dataset
str(dataset_selected)

#'data.frame':  13905 obs. of  12 variables:
#' $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
#' $ mnth      : int  1 1 1 1 1 1 1 1 1 1 ...
#' $ hr        : int  1 2 3 5 8 10 11 12 14 15 ...
#' $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
#' $ weekday   : int  6 6 6 6 6 6 6 6 6 6 ...
#' $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
#' $ weathersit: int  1 1 1 2 1 1 1 1 2 2 ...
#' $ temp      : num  -1.47 -1.47 -1.35 -1.35 -1.35 ...
#' $ atemp     : num  -1.18 -1.18 -1.09 -1.27 -1.09 ...
#' $ hum       : num  0.899 0.899 0.637 0.637 0.637 ...
#' $ windspeed : num  -1.8 -1.8 -1.8 -0.787 -1.8 ...
#' $ casual    : num  -0.361 -0.588 -0.81 -1.867 -1.208 ...


# Train data with Neural Network model from caret
control = trainControl(method = 'repeatedcv', number = 10, repeats =3)
metric = 'RMSE'
set.seed(3)
fit = train(casual ~., data = dataset_selected, method = 'nnet', metric = metric, trControl = control, trace = FALSE)

Thanks for your help!

Recommended answer

phiver's comment is spot on; however, I would still like to provide a more verbose answer on this concrete example.

To investigate what is going on in more detail, add the argument savePredictions = "all" to trainControl:

control = trainControl(method = 'repeatedcv',
                       number = 10,
                       repeats = 3,
                       returnResamp = "all",
                       savePredictions = "all")

metric = 'RMSE'
set.seed(3)
fit = train(casual ~.,
            data = dataset_selected,
            method = 'nnet',
            metric = metric,
            trControl = control,
            trace = FALSE,
            form = "traditional")

Now, when running:

fit$results
#output
  size decay      RMSE  Rsquared       MAE      RMSESD RsquaredSD       MAESD
1    1 0e+00 0.9999205       NaN 0.8213177 0.009655872         NA 0.007919575
2    1 1e-04 0.9479487 0.1850270 0.7657225 0.074211541 0.20380571 0.079640883
3    1 1e-01 0.8801701 0.3516646 0.6937938 0.074484860 0.20787440 0.077960642
4    3 0e+00 0.9999205       NaN 0.8213177 0.009655872         NA 0.007919575
5    3 1e-04 0.9272942 0.2482794 0.7434689 0.091409600 0.24363651 0.098854133
6    3 1e-01 0.7943899 0.6193242 0.5944279 0.011560524 0.03299137 0.013002708
7    5 0e+00 0.9999205       NaN 0.8213177 0.009655872         NA 0.007919575
8    5 1e-04 0.8811411 0.3621494 0.6941335 0.092169810 0.22980560 0.098987058
9    5 1e-01 0.7896507 0.6431808 0.5870894 0.009947324 0.01063359 0.009121535

We notice that the problem occurs when decay = 0: those are the only rows where Rsquared is NaN.
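The affected rows can also be picked out programmatically rather than by eye. This is a minimal sketch: the data frame `results` is a hypothetical hand-made copy of four columns of the fit$results output above, so the snippet runs without refitting the model.

```r
# Hypothetical copy of four columns of fit$results from the output above,
# so that the example runs without caret or the fitted model.
results <- data.frame(
  size     = c(1, 1, 1, 3, 3, 3, 5, 5, 5),
  decay    = c(0, 1e-04, 1e-01, 0, 1e-04, 1e-01, 0, 1e-04, 1e-01),
  RMSE     = c(0.9999205, 0.9479487, 0.8801701, 0.9999205, 0.9272942,
               0.7943899, 0.9999205, 0.8811411, 0.7896507),
  Rsquared = c(NaN, 0.1850270, 0.3516646, NaN, 0.2482794,
               0.6193242, NaN, 0.3621494, 0.6431808)
)

# is.na() is TRUE for both NA and NaN, so it catches either kind of gap.
affected <- results[is.na(results$Rsquared), c("size", "decay")]
print(affected)

# Every affected tuning combination shares the same decay value.
unique(affected$decay)
```

With the real model, the same subsetting would be applied to fit$results directly.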

Let's filter the saved predictions for decay = 0:

library(tidyverse)
fit$pred %>%
  filter(decay == 0) -> for_r2

var(for_r2$pred)
#output 
0

We can observe that all of the predictions when decay == 0 are the same (they have zero variance). The model exclusively predicts 0:

unique(for_r2$pred)
#output 
0
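A constant prediction of 0 would also account for the RMSE of about 1 in the decay = 0 rows. This is a rough sketch, assuming the response was standardized to mean 0 and standard deviation 1 by the transformation step (my understanding is that bestNormalize's boxcox standardizes by default, but treat that as an assumption). The vector `obs` is simulated and purely illustrative.

```r
set.seed(3)

# Simulated stand-in for the transformed response: mean 0, sd 1.
obs <- as.numeric(scale(rnorm(1000)))

# A model that always predicts 0, like the decay = 0 fits.
pred <- rep(0, length(obs))

# RMSE of the constant prediction against the standardized response.
rmse <- sqrt(mean((obs - pred)^2))
rmse  # just below 1: sqrt((n - 1) / n) for a standardized vector of length n
```

In other words, an RMSE this close to 1 on a standardized response suggests the model has learned nothing beyond a constant.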

So when the summary function tries to compute R squared:

caret::R2(for_r2$obs, for_r2$pred)
#output
[1] NA
Warning message:
In cor(obs, pred, use = ifelse(na.rm, "complete.obs", "everything")) :
  the standard deviation is zero
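The NA follows from how R squared is obtained here: as the warning shows, the calculation goes through cor(), and a correlation with a constant vector is undefined because its standard deviation is zero. Here is a small base R sketch with made-up numbers (the vectors `obs` and `pred` are illustrative, not taken from the dataset):

```r
obs  <- c(-0.36, -0.59, -0.81, -1.87, -1.21)  # made-up observed values
pred <- rep(0, length(obs))                    # constant predictions

sd(pred)  # zero spread in the predictions

# cor() divides by the standard deviations, so the result is NA
# (R would also emit the "standard deviation is zero" warning).
r  <- suppressWarnings(cor(obs, pred))
r2 <- r^2
r2

# RMSE does not involve a correlation, so it is still defined.
rmse <- sqrt(mean((obs - pred)^2))
rmse
```

That is why the warning reports missing values in the resampled performance measures while RMSE and MAE are still available for the same rows.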
