如何用scipy中的离散变量值最小化一个函数 [英] how to minimize a function with discrete variable values in scipy

查看:159
本文介绍了如何用scipy中的离散变量值最小化一个函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试优化具有多个输入变量(在24到30之间)的目标函数.这些变量是三个不同统计变量的样本,目标函数值是t检验概率值.误差函数表示所需的t检验概率与实际t检验概率之间的误差(差的平方和).对于所有三个t检验,我只能接受误差小于1e-8的解决方案.

I'm trying to optimize a target function that has multiple input variables (between 24 and 30). These variables are samples of three different statistical variables, and target function values are t-test probability values. An error function represents the error (sum of squares of differences) between the desired and the actual t-test probabilities. I can only accept solutions where the error is less than 1e-8, for all of the three t-tests.

我正在使用scipy.optimize.fmin,效果很好.在许多解决方案中,目标函数为零.

I was using scipy.optimize.fmin and it worked great. There are many solutions where the target function became zero.

问题是我需要找到一个解决方案,其中变量在0到10.0之间,并且是整数或不超过一个数字的小数部分.有效值的示例是0 10 3 5.5 6.8.无效值的示例:-3 2.23 300.16666667.

The problem is that I need to find a solution where the variables are between 0 and 10.0, and are whole numbers or don't have more than one digit fractional part. Examples of valid values are 0 10 3 5.5 6.8. Examples of invalid values: -3 2.23 30or 0.16666667.

我偶然知道至少有一种解决方案,因为目标值来自实际的测量数据.原始数据丢失了,我的任务是找到它们.但是我不知道如何.不能选择使用试验/错误,因为每个变量大约有100个可能的值,并且在给定变量数量的情况下,可能的案例数量将是100 ** 30,这太多了.使用fmin很好,但是不适用于谨慎的值.

I happen to know that there is at least one solution, because the target values are coming from actual measured data. The original data was lost, and my task is to find them. But I don't know how. Using trial/error is not an option, because there are about 100 possible values for each variable, and given the number of variables, the number of possible cases would be 100**30 which is too much. Using fmin is great, however it does not work with discreet values.

有没有办法解决这个问题?如果我需要运行多个小时才能找到解决方案,这不是问题.但是我需要在几天内找到大约10个目标值的解决方案,而且我没有新的想法.

Is there a way to solve this? It is not a problem if I need to run a program for many hours to find a solution. But I need to find solutions for about 10 target values within a few days, and I'm out of new ideas.

以下是MWE的示例:

import math
import numpy
import scipy.optimize
import scipy.stats
import sys

def log(s):
    sys.stdout.write(str(s))
    sys.stdout.flush()

# List of target T values: TAB, TCA, TCB
TARGETS = numpy.array([
    [0.05456834,   0.01510358,    0.15223353   ],  # task 1 to solve
    [0.15891875,   0.0083665,     0.00040262   ],  # task 2 to solve
])
MAX_ERR = 1e-10 # Maximum error in T values
NMIN,NMAX = 8,10 # Number of samples for T probes. Inclusive.

def fsq(x, t, n):
    """Returns the differences between the target and the actual values."""
    a,b,c = x[0:n],x[n:2*n],x[2*n:3*n]
    results = numpy.array([
        scipy.stats.ttest_rel(a,b)[1], # ab
        scipy.stats.ttest_rel(c,a)[1], # ca
        scipy.stats.ttest_rel(c,b)[1]  # cb
    ])
    # Sum of squares of diffs
    return (results - t)

def f(x, t, n):
    """This is the target function that needs to be minimized."""
    return (fsq(x,t,n)**2).sum()

def main():
    for tidx,t in enumerate(TARGETS):
        print "============================================="
        print "Target %d/%d"%(tidx+1,len(TARGETS))
        for n in range(NMIN,NMAX+1):
            log(" => n=%s "%n)
            successful = False
            tries = 0
            factor = 0.1
            while not successful:
                x0 = numpy.random.random(3*n) * factor
                x = scipy.optimize.fmin(f,x0, [t,n], xtol=MAX_ERR, ftol=MAX_ERR )
                diffs = fsq(x,t,n)
                successful = (numpy.abs(diffs)<MAX_ERR).all()
                if successful:
                    log(" OK, error=[%s,%s,%s]\n"%(diffs[0],diffs[1],diffs[2]))
                    print " SOLUTION FOUND "
                    print x
                else:
                    tries += 1
                    log(" FAILED, tries=%d\n"%tries)
                    print diffs
                    factor += 0.1
                    if tries>5:
                        print "!!!!!!!!!!!! GIVING UP !!!!!!!!!!!"
                        break
if __name__ == "__main__":
    main()

推荐答案

您要尝试做的事情(如果我理解您的设置)被称为整数编程,这很困难. http://en.wikipedia.org/wiki/Integer_programming .我意识到您不是在寻找整数解,但是如果将所有输入乘以10,然后将目标函数除以100,则会得到一个等效的问题,其中输入都是整数.关键是,您的输入是离散的.

What you are attempting to do (if I understood your setup) is called integer programming and it is NP-hard; http://en.wikipedia.org/wiki/Integer_programming. I realize that you aren't looking for integer solutions, but if you multiply all your inputs by 10 and divide your target function by 100, you obtain an equivalent problem where the inputs are all integers. The point is, your inputs are discrete.

您要使用的目标函数是一个凸二次函数,并且有良好的约束优化算法可以对区间[0,10]中的实值输入快速求解.由此,您可以尝试四舍五入或检查附近所有可接受的点,但是其中有2 ^ n个,其中n是输入数.即使您这样做,也不能保证最佳解决方案是这些要点之一.

The target function you are working with is a convex, quadratic function, and there are good constrained optimization algorithms that will solve it quickly for real-valued inputs in the interval [0, 10]. From this you can try rounding or checking all acceptable points nearby, but there are 2^n of them, where n is the number of inputs. Even if you do this, the optimal solution is not guaranteed to be one of these points.

对于整数编程问题,有一些近似算法,您可能会发现有时其中的一种算法可以很好地使您达到最佳点.在我引用的Wikipedia文章中,您可以尝试一些方法,但是我不知道您会为解决这个问题而高兴.

There are approximation algorithms for integer programming problems and you may find that sometimes one of them works well enough to get you to an optimal point. There are a list of things you might try in the Wikipedia article I cited, but I don't know that you will be happy trying to solve this problem.

这篇关于如何用scipy中的离散变量值最小化一个函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆