Optimization with positive integer parameters

Problem description

I need to solve a problem that entails comparing two matrices with the same number of columns. One of them is manipulated until the best match is obtained. The way I score the differences between the two matrices is quite convoluted, and I still have to finalize it. What I'm really interested in at the moment is finding a search/optimization algorithm that works with positive integers only. I've created a simple example with a simple function to maximise. Let's say I have a dataset D.

 D <- data.frame(rbind(c(1,1,1),
                       c(1,1,0),c(1,1,0),c(1,1,0),c(1,0,0),
                       c(0,0,0),c(1,0,0),c(1,0,0),c(1,1,0),
                       c(1,0,0),c(1,1,1),c(1,1,0),c(1,0,0),
                       c(1,0,0),c(1,0,1)))

I want to find which rearrangement of the columns of Dx gives me the lowest absolute difference.

Dx<-data.frame(rbind(c(1,1,0),c(1,0,0),c(0,0,0),c(1,1,0)))

So I could go through all the possible permutations using the function below:

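    library(combinat)

    # One row per permutation of the column indices 1:3
    SPACE <- t(as.data.frame(list(permn(1:3))))

    # Score a column order x: sum of absolute differences between the
    # re-ordered Dx and each row of D (returns 0 if x repeats a column)
    f <- function(x) {
      if (anyDuplicated(x) > 0) return(0)
      Dist <- numeric(nrow(D))
      for (i in 1:nrow(D)) {
        Dist[i] <- sum(abs(Dx[, x] - t(D[i, ])))
      }
      sum(Dist)
    }

    apply(SPACE, 1, f)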

and get the right result. However, this has two disadvantages for the data I'm actually using:

  A. I have to specify SPACE, i.e. all the possible column orders, and
  B. apply goes through every possible permutation and calculates my error score.

Both A and B become computationally difficult as the number of columns in my matrix increases. I think even keeping all the possible permutations of the numbers 1 to 14 in one R session is impossible on most computers.
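To put rough numbers on that claim, here is a back-of-the-envelope sketch in base R. It assumes the permutations are stored as a plain integer matrix at 4 bytes per entry, which is an assumption made for illustration; the transposed data frame used for SPACE above would likely need more.

```r
# The number of orderings of n columns is n!
n <- 14
n_perms <- factorial(n)

# Assumed storage: one 4-byte integer per entry, n entries per permutation
bytes <- n_perms * n * 4
terabytes <- bytes / 1e12

cat(format(n_perms, big.mark = ","), "permutations, roughly",
    round(terabytes, 1), "TB as an integer matrix\n")
```

Under that assumption the table of permutations alone runs to several terabytes, before a single score has been computed.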

An optimization algorithm I found is grid search. This starts to address A: I don't have to specify SPACE (i.e. all the possible permutations), so it's one step in the right direction, as I want to look at much larger datasets.

library(NMOF)
gridSearch(f, rep(list(seq(1,ncol(D))),ncol(D)))

But obviously this does not address B, as it still evaluates every point on the grid. What if my dataset were very large, say 15 columns or even more?
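A quick count suggests the grid actually makes B worse. The grid passed to gridSearch above has n levels for each of n parameters, so it contains n^n points, of which only n! are valid column orders (f returns 0 for the rest). A small base-R sketch of that arithmetic, with the values of n chosen arbitrarily for illustration:

```r
# Grid points vs. valid permutations for n columns
n <- c(3, 5, 10, 15)
grid_points  <- n^n            # every combination of column indices
permutations <- factorial(n)   # combinations with no repeated index

data.frame(n, grid_points, permutations,
           valid_fraction = permutations / grid_points)
```

The fraction of grid points that are valid permutations shrinks quickly as n grows, so almost all of the grid-search effort would go into combinations that f rejects.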

Keeping in mind that my parameters can only be positive integers (i.e. they are column numbers), is there an R algorithm that would allow me to find the best column order (or at least a good approximation) within a reasonable amount of time (e.g. 1-2 days) when I'm dealing with much larger datasets? This may look like a silly example, but it emulates the problem I'm trying to solve very well. I've tried optim() with method="SANN", but got nowhere. Unfortunately, I have very little experience, so do let me know if you think this is an unworkable problem. To start with an easier dataset (few rows but lots of columns), do you think it's possible to find the best column order, as shown above, for D2 by using some kind of clever optimization?
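One possible reason optim() with method="SANN" goes nowhere here is that its default proposal step is a Gaussian perturbation of continuous parameters, which does not produce valid column orders. According to the optim() documentation, for method "SANN" the gr argument is instead used as a function that generates a new candidate point, which is what makes it usable for combinatorial problems. The following is a minimal sketch under that reading, not a tuned solution: swap_two is a hypothetical helper name, the seed and control values are arbitrary, and f is negated because optim() minimises while the example is framed as a maximisation.

```r
# Data and objective from the question (3-column version)
D <- data.frame(rbind(c(1,1,1),
                      c(1,1,0), c(1,1,0), c(1,1,0), c(1,0,0),
                      c(0,0,0), c(1,0,0), c(1,0,0), c(1,1,0),
                      c(1,0,0), c(1,1,1), c(1,1,0), c(1,0,0),
                      c(1,0,0), c(1,0,1)))
Dx <- data.frame(rbind(c(1,1,0), c(1,0,0), c(0,0,0), c(1,1,0)))

f <- function(x) {
  if (anyDuplicated(x) > 0) return(0)
  Dist <- numeric(nrow(D))
  for (i in 1:nrow(D)) Dist[i] <- sum(abs(Dx[, x] - t(D[i, ])))
  sum(Dist)
}

# Hypothetical proposal move: swap two randomly chosen positions,
# so every candidate remains a permutation of the column indices
swap_two <- function(p, ...) {
  idx <- sample(length(p), 2)
  p[idx] <- p[rev(idx)]
  p
}

set.seed(42)  # arbitrary seed, only for reproducibility
res <- optim(par = as.numeric(seq_len(ncol(Dx))),
             fn = function(p) -f(p),   # negate: optim() minimises
             gr = swap_two,            # for SANN: generates the next candidate
             method = "SANN",
             control = list(maxit = 300, temp = 10))

res$par      # best column order found
-res$value   # its score under f
```

For the 15-column D2 case the same call should apply once D and Dx have been rebuilt with cbind(); maxit and temp would need experimentation, and as with any annealing run there is no guarantee that the returned order is optimal.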

# D2: repeat the columns five times to get a 15-column version
D <- cbind(D, D, D, D, D)
ncol(D)
Dx <- cbind(Dx, Dx, Dx, Dx, Dx)

# Examples
f(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
f(c(13,2,4,3,5,6,7,8,9,10,11,12,1,14,15))

My main interest is in understanding how to use optimization algorithms that work with a series of unique positive integers (basically ranks) in the search process, rather than in solving this particular problem. I've used a simple example so that it's easy to replicate, but the two datasets I'm comparing often differ in number of rows and in other aspects that I'm not detailing here. The distance function I'm building handles this well, so my main problem at the moment is understanding how to apply an optimization algorithm (e.g. the genetic algorithm suggested below) to the function f above using D2.

Recommended answer

If your objective function f must truly be treated as a black box, then we'll need to resort to approximate approaches such as a genetic algorithm. Here is a solution using the gaoptim package, which maximizes f(p) over all permutations p of the columns of Dx:

library(gaoptim)
myGA = GAPerm(f, ncol(Dx), popSize=10)
myGA$evolve(10)
myGA
# Results for 10 Generations:
# Mean Fitness:
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#    95.0   107.4   115.6   112.4   118.3   120.6 
# 
# Best Fitness:
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#     125     125     125     125     125     125
# 
# Best individual:
# [1] 3 1 2
# 
# Best fitness value:
# [1] 125

In this case it found the best possible solution, with objective value 125, though in general there are no guarantees about the quality of the solution that will be returned by a genetic algorithm.
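On an instance this small, the genetic algorithm's answer can be cross-checked by exhaustive enumeration. The sketch below does that in base R. all_perms is a hypothetical helper written for illustration (the question used combinat's permn for the same purpose), and the approach is only feasible for a handful of columns.

```r
# Data and objective from the question (3-column version)
D <- data.frame(rbind(c(1,1,1),
                      c(1,1,0), c(1,1,0), c(1,1,0), c(1,0,0),
                      c(0,0,0), c(1,0,0), c(1,0,0), c(1,1,0),
                      c(1,0,0), c(1,1,1), c(1,1,0), c(1,0,0),
                      c(1,0,0), c(1,0,1)))
Dx <- data.frame(rbind(c(1,1,0), c(1,0,0), c(0,0,0), c(1,1,0)))

f <- function(x) {
  if (anyDuplicated(x) > 0) return(0)
  Dist <- numeric(nrow(D))
  for (i in 1:nrow(D)) Dist[i] <- sum(abs(Dx[, x] - t(D[i, ])))
  sum(Dist)
}

# Hypothetical helper: all permutations of v, one per row
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- NULL
  for (i in seq_along(v)) {
    # Fix v[i] in first place, then permute the remaining elements
    out <- rbind(out, cbind(v[i], all_perms(v[-i])))
  }
  out
}

perms  <- all_perms(seq_len(ncol(Dx)))
scores <- apply(perms, 1, f)

best <- perms[which.max(scores), ]
best          # column order with the highest score
max(scores)   # compare with the GA's "Best fitness value"
```

If f behaves as in the output above, the enumeration should agree with the genetic algorithm's best individual and fitness; a mismatch on a small case like this would be a sign that the search settings need revisiting before scaling up.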
