How to optimize parameters using genetic algorithms
Question
I'd like to optimize three parameters (gamma, cost and epsilon) in eps-regression (SVR) using a genetic algorithm (GA) in R. Here's what I've done.
library(e1071)
data(Ozone, package="mlbench")
a<-na.omit(Ozone)
index<-sample(1:nrow(a), trunc(nrow(a)/3))
trainset<-a[index,]
testset<-a[-index,]
model<-svm(V4 ~ .,data=trainset, cost=0.1, gamma=0.1, epsilon=0.1, type="eps-regression", kernel="radial")
error<-model$residuals
rmse <- function(error) { # root mean squared error
  sqrt(mean(error^2))
}
rmse(error)
Here, I set cost, gamma and epsilon to 0.1 each, but I don't think these are the best values. So I'd like to use a genetic algorithm to optimize these parameters.
GA <- ga(type = "real-valued", fitness = rmse,
min = c(0.1,3), max = c(0.1,3),
popSize = 50, maxiter = 100)
Here, I used RMSE as the fitness function, but I think the fitness function has to include the parameters that are to be optimized. In SVR, however, the objective function is too complicated to write out in R code; I spent a long time looking for it, to no avail. Could someone who knows both SVR and GA, or who has experience optimizing SVR parameters with a GA, please help me?
Recommended answer
In such an application, one passes the parameters whose values are to be optimized (in your case, cost, gamma and epsilon) as arguments of the fitness function, which then fits the model, evaluates it, and returns a measure of model performance as the fitness. The explicit form of the SVR objective function is therefore not needed.
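To make the idea concrete before the full SVR version, here is a minimal sketch on a toy problem. The functions `toy_loss` and `toy_fitness` are made up for illustration: `toy_loss` stands in for "fit the model and measure its error", and it has a known minimum at (1, 2). Because the GA package maximizes fitness, the loss is negated. As far as I know, recent versions of GA name the bounds `lower` and `upper`, while the `min`/`max` spelling used elsewhere in this thread is the older, deprecated form.

```r
library(GA)

# Hypothetical stand-in for "fit a model with parameters x and return its error".
# The minimum (loss = 0) is at x = (1, 2).
toy_loss <- function(x) sqrt((x[1] - 1)^2 + (x[2] - 2)^2)

# ga() maximizes, so the fitness is the negated loss
toy_fitness <- function(x) -toy_loss(x)

set.seed(1)
toy <- ga(type = "real-valued", fitness = toy_fitness,
          lower = c(-5, -5), upper = c(5, 5),
          popSize = 30, maxiter = 100, monitor = FALSE)

toy@solution       # best parameter vector found, expected near (1, 2)
-toy@fitnessValue  # loss at the best solution, expected near 0
```

The GA only ever sees a vector `x` going in and a number coming out, which is why the internal form of the model's objective does not matter.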
In the implementation below, I use 5-fold cross-validation to estimate the RMSE for a given set of parameters. Since the GA package maximizes the fitness function, I define the fitness of a parameter vector as minus the average RMSE over the cross-validation folds. Hence, the maximum fitness that can be attained is zero.
Here it is:
library(e1071)
library(GA)
data(Ozone, package="mlbench")
Data <- na.omit(Ozone)
# Setup the data for cross-validation
K <- 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
  train_data = Data[fold_inds != i, , drop = FALSE],
  test_data = Data[fold_inds == i, , drop = FALSE]))
# Given the values of parameters 'cost', 'gamma' and 'epsilon', return the rmse of the model over the test data
evalParams <- function(train_data, test_data, cost, gamma, epsilon) {
  # Train
  model <- svm(V4 ~ ., data = train_data, cost = cost, gamma = gamma, epsilon = epsilon, type = "eps-regression", kernel = "radial")
  # Test: root of the mean squared prediction error on the held-out fold
  rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
  return (rmse)
}
# Fitness function (to be maximized)
# Parameter vector x is: (cost, gamma, epsilon)
fitnessFunc <- function(x, Lst_CV_Data) {
  # Retrieve the SVM parameters
  cost_val <- x[1]
  gamma_val <- x[2]
  epsilon_val <- x[3]
  # Use cross-validation to estimate the RMSE for each split of the dataset
  rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
    evalParams(train_data, test_data, cost_val, gamma_val, epsilon_val)))
  # As fitness measure, return minus the average rmse (over the cross-validation folds),
  # so that by maximizing fitness we are minimizing the rmse
  return (-mean(rmse_vals))
}
# Range of the parameter values to be tested
# Parameters are: (cost, gamma, epsilon)
theta_min <- c(cost = 1e-4, gamma = 1e-3, epsilon = 1e-2)
theta_max <- c(cost = 10, gamma = 2, epsilon = 2)
# Run the genetic algorithm
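# The cross-validation data is forwarded to fitnessFunc through ga()'s '...' argument
results <- ga(type = "real-valued", fitness = fitnessFunc, Lst_CV_Data = lst_CV_data,
              names = names(theta_min),
              min = theta_min, max = theta_max,
              popSize = 50, maxiter = 10)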
summary(results)
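This produces results like the following (for the range of parameter values that I specified, which may require fine-tuning based on the data). The numbers come from a single run and will vary with the random fold assignment and the GA's random search. The fitness value of -14.66 shown here appears to be on the squared-error scale, which corresponds to an RMSE of roughly 3.8.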
GA results:
Iterations = 100
Fitness function value = -14.66315
Solution =
         cost      gamma    epsilon
[1,] 2.643109 0.07910103 0.09864132
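Once the search has finished, the selected parameters are typically used to refit a final model. The following is a minimal sketch of that step, not part of the original answer. To keep it self-contained, `best` simply copies the values reported above; after your own run you would read them from the `solution` slot of the GA result instead, as noted in the comment. The names `best`, `final_model` and `in_sample_rmse` are illustrative.

```r
library(e1071)

data(Ozone, package = "mlbench")
Data <- na.omit(Ozone)

# Parameter values copied from the output above. After your own run they
# would come from the GA result, e.g. best <- results@solution[1, ]
best <- c(cost = 2.643109, gamma = 0.07910103, epsilon = 0.09864132)

# Refit on all available rows with the selected parameters
final_model <- svm(V4 ~ ., data = Data,
                   cost = best[["cost"]], gamma = best[["gamma"]],
                   epsilon = best[["epsilon"]],
                   type = "eps-regression", kernel = "radial")

# In-sample RMSE: optimistic, since the model has seen these rows
in_sample_rmse <- sqrt(mean((predict(final_model, newdata = Data) - Data$V4)^2))
in_sample_rmse
```

The in-sample figure will usually look better than the cross-validated one; the cross-validated RMSE from the GA run remains the more honest estimate of how the model performs on new data.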