How to optimize parameters using genetic algorithms


Question

I'd like to optimize three parameters (gamma, cost and epsilon) of eps-regression (SVR) using a genetic algorithm (GA) in R. Here's what I've done.

library(e1071)
data(Ozone, package="mlbench")
a<-na.omit(Ozone)
index<-sample(1:nrow(a), trunc(nrow(a)/3))
trainset<-a[index,]
testset<-a[-index,]
model<-svm(V4 ~ .,data=trainset, cost=0.1, gamma=0.1, epsilon=0.1, type="eps-regression", kernel="radial")
error<-model$residuals
rmse <- function(error) # root mean squared error
{
  sqrt(mean(error^2))
}
rmse(error)

Here, I set cost, gamma and epsilon to 0.1 each, but I don't think these are the best values. So I'd like to use a genetic algorithm to optimize these parameters.

GA <- ga(type = "real-valued", fitness = rmse,
         min = c(0.1,3), max = c(0.1,3),
         popSize = 50, maxiter = 100)

Here, I used RMSE as the fitness function, but I think the fitness function has to include the parameters that are to be optimized. However, in SVR the objective function is too complicated to write out in R code; I tried for a long time to find it, but to no avail. If anyone knows both SVR and GA, or has experience optimizing SVR parameters with a GA, please help me.

Recommended answer

In such an application, one passes the parameters whose values are to be optimized (in your case, cost, gamma and epsilon) as arguments of the fitness function, which then runs the model fitting and evaluation and uses a measure of model performance as the measure of fitness. Therefore, the explicit form of the SVR objective function is not directly relevant.
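To make the idea concrete, here is a minimal base-R sketch of a fitness function that receives the candidate parameter vector, fits a model, and returns a performance score. It is not the SVR setup from the question: ridge regression on the built-in `mtcars` data stands in for `svm()`, and the names `fit_ridge` and `fitness_sketch` are assumptions made up for this illustration.

```r
# Toy stand-in for the SVR case: ridge regression with one tunable penalty.
# fit_ridge() and fitness_sketch() are hypothetical names for illustration.
set.seed(1)
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))
y <- mtcars$mpg
train <- sample(nrow(X), 22)  # row indices used for fitting

# Fit ridge regression in closed form for a given penalty 'lambda'
fit_ridge <- function(X, y, lambda) {
  Xc <- cbind(1, X)                       # add an intercept column
  P  <- diag(c(0, rep(lambda, ncol(X))))  # do not penalize the intercept
  solve(t(Xc) %*% Xc + P, t(Xc) %*% y)
}

# Fitness: takes the candidate parameter vector x, fits the model,
# evaluates it on held-out rows, and returns minus the RMSE
# (so that a larger value means a better candidate)
fitness_sketch <- function(x) {
  beta <- fit_ridge(X[train, , drop = FALSE], y[train], lambda = x[1])
  pred <- cbind(1, X[-train, , drop = FALSE]) %*% beta
  -sqrt(mean((pred - y[-train])^2))
}

# An optimizer only ever calls the function like this, once per candidate
fitness_sketch(c(0.1))
fitness_sketch(c(10))
```

The optimizer never sees the model's internal objective. It only proposes a vector `x` and reads back one number, which is why the same pattern carries over to SVR with three parameters.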

In the implementation below, I used 5-fold cross-validation to estimate the RMSE for a given set of parameters. Since package GA maximizes the fitness function, I have defined the fitness for a given set of parameter values as minus the average RMSE over the cross-validation folds. Hence, the maximum fitness that can be attained is zero.

Here it is:

library(e1071)
library(GA)

data(Ozone, package="mlbench")
Data <- na.omit(Ozone)

# Setup the data for cross-validation
K = 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
    train_data = Data[fold_inds != i, , drop = FALSE], 
    test_data = Data[fold_inds == i, , drop = FALSE]))

# Given the values of parameters 'cost', 'gamma' and 'epsilon', return the rmse of the model over the test data
evalParams <- function(train_data, test_data, cost, gamma, epsilon) {
    # Train
    model <- svm(V4 ~ ., data = train_data, cost = cost, gamma = gamma, epsilon = epsilon, type = "eps-regression", kernel = "radial")
    # Test
    rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
    return (rmse)
}

# Fitness function (to be maximized)
# Parameter vector x is: (cost, gamma, epsilon)
fitnessFunc <- function(x, Lst_CV_Data) {
    # Retrieve the SVM parameters
    cost_val <- x[1]
    gamma_val <- x[2]
    epsilon_val <- x[3]

    # Use cross-validation to estimate the RMSE for each split of the dataset
    rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data, 
        evalParams(train_data, test_data, cost_val, gamma_val, epsilon_val)))

    # As fitness measure, return minus the average rmse (over the cross-validation folds), 
    # so that by maximizing fitness we are minimizing the rmse
    return (-mean(rmse_vals))
}

# Range of the parameter values to be tested
# Parameters are: (cost, gamma, epsilon)
theta_min <- c(cost = 1e-4, gamma = 1e-3, epsilon = 1e-2)
theta_max <- c(cost = 10, gamma = 2, epsilon = 2)

# Run the genetic algorithm
results <- ga(type = "real-valued", fitness = fitnessFunc, Lst_CV_Data = lst_CV_data, # extra arguments are forwarded to fitnessFunc
    names = names(theta_min), 
    min = theta_min, max = theta_max,
    popSize = 50, maxiter = 100) # 100 iterations, as in the run reported below

summary(results)

This produces the following results (for the range of parameter values that I specified, which may require fine-tuning based on the data). Exact numbers vary between runs, because both the fold assignment and the GA itself are random:

GA results: 
Iterations             = 100 
Fitness function value = -14.66315 
Solution               = 
         cost      gamma    epsilon
[1,] 2.643109 0.07910103 0.09864132
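The cross-validation step and the sign convention can also be checked in isolation, without the GA. The sketch below is base R only and uses `lm()` on the built-in `mtcars` data as a stand-in for `svm()` on the Ozone data; the variable names are illustrative. One assumption differs from the code above: fold labels are drawn as a shuffled, balanced sequence rather than sampled with replacement, so every fold is guaranteed to be non-empty and of nearly equal size.

```r
# Sketch: K-fold cross-validated RMSE and the sign flip for a maximizer.
# lm() on mtcars is only a stand-in for svm() on the Ozone data.
set.seed(42)
K <- 5

# Balanced fold labels: every fold gets at least floor(n / K) rows
fold_inds <- sample(rep(1:K, length.out = nrow(mtcars)))

# One RMSE per fold: fit on the other folds, evaluate on the held-out fold
cv_rmse <- sapply(1:K, function(i) {
  train_data <- mtcars[fold_inds != i, , drop = FALSE]
  test_data  <- mtcars[fold_inds == i, , drop = FALSE]
  model <- lm(mpg ~ wt + hp, data = train_data)
  sqrt(mean((predict(model, newdata = test_data) - test_data$mpg)^2))
})

# A maximizing optimizer sees minus the average RMSE, which is at most zero
fitness_value <- -mean(cv_rmse)
cv_rmse
fitness_value
```

Because the RMSE is never negative, `fitness_value` is bounded above by zero, and a value closer to zero means a lower average prediction error.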
