preProc = c("center", "scale") meaning in caret's package (R) and min-max normalization
Problem description
I am wondering how preProc can be used within the train() function of caret. I am running a neural network in the train() function using neuralnet. The code comes from this question.
This is actually the code:
nn <- train(medv ~ .,
            data = df,
            method = "neuralnet",
            tuneGrid = grid,
            metric = "RMSE",
            preProc = c("center", "scale", "nzv"), #good idea to do this with neural nets - your error is due to non scaled data
            trControl = trainControl(
              method = "cv",
              number = 5,
              verboseIter = TRUE)
            )
The original data is not scaled, so scaling it before running the neural network is recommended.
However, the preProc argument contains three elements: center, scale, and nzv. I have trouble interpreting these values, as I do not know why they are present. Furthermore, I would like to scale/normalize my data using min-max. This would be the function:
# Column-wise maxima and minima of the data to be scaled
maxs = apply(df, 2, max)
mins = apply(df, 2, min)
scaled = as.data.frame(scale(df, center = mins, scale = maxs - mins))
Is it possible to normalize my data using min-max scaling within preProc?
And if so, how could I undo the scaling when predicting?
Of the three options in c("center", "scale", "nzv"), the first two center and scale the data. From the vignette:
method = "center" subtracts the mean of the predictor's data (again from the data in x) from the predictor values while method = "scale" divides by the standard deviation.
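As a rough illustration of what those two steps compute, the sketch below applies them by hand in base R and compares the result with scale(). The matrix toy and its column names are made up for this example, and caret itself is not used here.

```r
# Toy predictors (hypothetical values, for illustration only)
toy <- cbind(a = c(1, 2, 3, 4, 5), b = c(10, 20, 40, 80, 160))

# "center": subtract each column's mean
centered <- sweep(toy, 2, colMeans(toy), "-")

# "scale": divide each column by its standard deviation
col_sd <- apply(toy, 2, sd)
standardized <- sweep(centered, 2, col_sd, "/")

# Each column now has mean 0 and standard deviation 1
colMeans(standardized)
apply(standardized, 2, sd)

# Base R's scale() performs the same two steps by default
all.equal(standardized, scale(toy), check.attributes = FALSE)
```

The means and standard deviations are learned from the training data, so the same values would have to be reused when transforming new samples.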
nzv excludes variables that have near-zero variance, meaning they are almost constant and most likely not useful for prediction. To do min-max scaling, there is the "range" option:
The "range" transformation scales the data to be within ‘rangeBounds’. If new samples have values larger or smaller than those in the training set, values will be outside of this range.
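To make that behaviour concrete, here is a minimal base R sketch of min-max scaling to [0, 1], assuming the default bounds. The helper names fit_range and apply_range are hypothetical and are not part of caret; they only mimic the idea of learning the minima and maxima on the training set and reusing them on new samples.

```r
# Learn per-column minima and maxima from the training data
fit_range <- function(train) {
  rbind(min = apply(train, 2, min), max = apply(train, 2, max))
}

# Rescale data with the stored training ranges: (x - min) / (max - min)
apply_range <- function(newdata, ranges) {
  span <- ranges["max", ] - ranges["min", ]
  sweep(sweep(newdata, 2, ranges["min", ], "-"), 2, span, "/")
}

# Made-up training and new samples
train_x <- cbind(a = c(1, 3, 5), b = c(10, 20, 30))
new_x   <- cbind(a = c(0, 6),    b = c(15, 40))

ranges <- fit_range(train_x)
apply_range(train_x, ranges)  # every training column spans exactly 0 to 1
apply_range(new_x, ranges)    # values beyond the training range leave [0, 1]
```

This is the same arithmetic as scale(x, center = mins, scale = maxs - mins), with the minima and maxima taken from the training data only.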
We try the "range" option below:
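library(mlbench)
data(BostonHousing)
library(caret)

# Sample 400 rows for training; the remaining rows are used as test data later
idx = sample(nrow(BostonHousing), 400)
df = BostonHousing[idx, ]
df$chas = as.numeric(df$chas)  # chas is a factor; convert it to numeric

# Learn the min-max ("range") transformation from the training data
pre_mdl = preProcess(df, method = "range")

# G is the tuning grid passed to tuneGrid; it is not defined in this snippet
nn <- train(medv ~ ., data = predict(pre_mdl, df),
            method = "neuralnet", tuneGrid = G,
            metric = "RMSE",
            trControl = trainControl(
              method = "cv", number = 5, verboseIter = TRUE))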
nn$preProcess
Created from 400 samples and 13 variables
Pre-processing:
- ignored (0)
- re-scaling to [0, 1] (13)
summary(nn$finalModel$data)
crim zn indus chas
Min. :0.000000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.000821 1st Qu.:0.0000 1st Qu.:0.1646 1st Qu.:0.0000
Median :0.002454 Median :0.0000 Median :0.2969 Median :0.0000
Mean :0.042130 Mean :0.1309 Mean :0.3804 Mean :0.0625
3rd Qu.:0.039150 3rd Qu.:0.2000 3rd Qu.:0.6466 3rd Qu.:0.0000
Max. :1.000000 Max. :1.0000 Max. :1.0000 Max. :1.0000
nox rm age dis
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:0.1276 1st Qu.:0.4470 1st Qu.:0.4032 1st Qu.:0.08522
Median :0.2819 Median :0.5076 Median :0.7503 Median :0.20133
Mean :0.3363 Mean :0.5232 Mean :0.6647 Mean :0.25146
3rd Qu.:0.4918 3rd Qu.:0.5880 3rd Qu.:0.9361 3rd Qu.:0.38622
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
rad tax ptratio b
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.1304 1st Qu.:0.1770 1st Qu.:0.5106 1st Qu.:0.9475
Median :0.1739 Median :0.2729 Median :0.6862 Median :0.9861
Mean :0.3676 Mean :0.4171 Mean :0.6243 Mean :0.8987
3rd Qu.:1.0000 3rd Qu.:0.9141 3rd Qu.:0.8085 3rd Qu.:0.9983
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
lstat .outcome
Min. :0.0000 Min. :0.0000
1st Qu.:0.1492 1st Qu.:0.2683
Median :0.2705 Median :0.3644
Mean :0.3069 Mean :0.3902
3rd Qu.:0.4220 3rd Qu.:0.4450
Max. :1.0000 Max. :1.0000
I am not sure what you mean by "undo the scaling when predicting". Maybe you meant translating the predictions back to the original scale:
test = BostonHousing[-idx,]
test$chas = as.numeric(test$chas)
test_medv = test$medv
test = predict(pre_mdl,test)
The ranges (row 1 holds the minima, row 2 the maxima) are stored in the preProcess object, under:
pre_mdl$ranges
crim zn indus chas nox rm age dis rad tax ptratio b
[1,] 0.00632 0 0.46 1 0.385 3.561 2.9 1.1691 1 187 12.6 0.32
[2,] 88.97620 100 27.74 2 0.871 8.780 100.0 12.1265 24 711 22.0 396.90
lstat medv
[1,] 1.73 5
[2,] 36.98 50
So we write a wrapper:
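# Map scaled values back to the original scale of one column
convert_response = function(value, mdl, method, column){
  bounds = mdl[[method]][, column]   # c(min, max) for that column
  value * diff(bounds) + min(bounds) # inverse of (x - min) / (max - min)
}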
plot(test_medv,convert_response(predict(nn,test),pre_mdl,"ranges","medv"),
ylab="predicted")
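As a quick sanity check on the back-transformation, the base R sketch below scales a made-up response to [0, 1] and then maps it back with the same formula the wrapper uses. The vector y and the name bounds are illustrative assumptions, and no model is fitted here.

```r
# Hypothetical response values on the original scale
y <- c(5, 12.5, 21.2, 36, 50)

# Stored bounds, analogous to one column of pre_mdl$ranges: c(min, max)
bounds <- range(y)

# Forward transform: what "range" scaling does to the column
y_scaled <- (y - min(bounds)) / diff(bounds)

# Back-transform: the same arithmetic as convert_response()
y_back <- y_scaled * diff(bounds) + min(bounds)

all.equal(y, y_back)  # the round trip recovers the original values
```

The back-transformation is only needed here because medv was scaled together with the predictors. If the preProcess object were built from the predictor columns alone, the predictions would presumably already be on the original scale.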