Algorithm for computing the plausibility of a function / Monte Carlo Method


Question


I am writing a program that attempts to duplicate the algorithm discussed at the beginning of this article,

http://www-stat.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf

F is a function from char to char. Assume that Pl(f) is a 'plausibility' measure of that function. The algorithm is:

Starting with a preliminary guess at the function, say f, and then a new function f* --

  • Compute Pl(f).
  • Change to f* by making a random transposition of the values f assigns to two symbols.
  • Compute Pl(f*); if this is larger than Pl(f), accept f*.
  • If not, flip a Pl(f*)/Pl(f) coin; if it comes up heads, accept f*.
  • If the coin toss comes up tails, stay at f.
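
One way to read the accept/reject logic in steps 3-5 is as a single comparison against a uniform random draw, where the coin comes up heads with probability Pl(f*)/Pl(f) (which is at most 1 in that branch). The following is only a sketch of that interpretation, not code from the paper; Pl, f, fStar, and _random are placeholder names:

    // Sketch of the accept/reject step; assumes Pl() returns a positive
    // plausibility and _random is a shared System.Random instance.
    decimal plCurrent   = Pl(f);        // Pl(f)
    decimal plCandidate = Pl(fStar);    // Pl(f*)

    bool accept;
    if (plCandidate > plCurrent)
    {
        accept = true;                                   // improvement: always accept
    }
    else
    {
        // coin with heads probability Pl(f*)/Pl(f), which is <= 1 in this branch
        double heads = (double)(plCandidate / plCurrent);
        accept = _random.NextDouble() < heads;
    }

    if (accept)
    {
        f = fStar;                                       // move to the candidate
    }
    // otherwise do nothing: stay at f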

I am implementing this using the following code. I'm using C#, but I have tried to simplify it somewhat for everyone. If there is a better forum for this, please let me know.

 var current_f = Initial();       // current accepted function f
 var current_Pl_f = InitialPl();  // current plausibility of accepted function f

 for (int i = 0; i < 10000; i++)
 {
     var candidate_f = Transpose(current_f);       // create a candidate function

     var candidate_Pl_f = ComputePl(candidate_f);  // compute its plausibility

     if (candidate_Pl_f > current_Pl_f)            // candidate Pl has improved
     {
         current_f = candidate_f;                  // accept the candidate
         current_Pl_f = candidate_Pl_f;
     }
     else                                          // otherwise flip a coin
     {
         int flip = Flip();

         if (flip == 1)                            // heads
         {
             current_f = candidate_f;              // accept it anyway
             current_Pl_f = candidate_Pl_f;
         }
         else if (flip == 0)                       // tails
         {
             // what to do here ?
         }
     }
 }

My question is basically whether this looks like the optimal approach to implementing that algorithm. It seems like I may be getting stuck in some local maxima / local minima despite implementing this method.

EDIT - Here is essentially what's behind the Transpose() method. I use a dictionary / hash table of type <char, char> that the candidate function(s) use to look up any given char -> char transformation. So the Transpose method simply swaps two values in the dictionary that dictates the behavior of the function.

    private Dictionary<char, char> Transpose(Dictionary<char, char> map, params int[] indices)
    {
        foreach (var index in indices)
        {
            char target_val = map.ElementAt(index).Value;   // get the value at the index

            char target_key = map.ElementAt(index).Key;     // get the key at the index

            int _rand = _random.Next(map.Count);   // get a random key (char) to swap with

            char rand_key = map.ElementAt(_rand).Key;

            char source_val = map[rand_key]; // the value that currently is used by the source of the swap

            map[target_key] = source_val; // make the swap

            map[rand_key] = target_val;
        }

        return map; 
    }
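
For reference, a hypothetical call site for a single random transposition might look like this; copying the dictionary first is my own addition, so that the currently accepted map is left untouched while the candidate is evaluated:

    // Hypothetical usage: transpose one randomly chosen entry of a copy of the
    // currently accepted map, so current_f itself is not modified in place.
    var candidate_map = new Dictionary<char, char>(current_f);   // shallow copy
    var candidate_f = Transpose(candidate_map, _random.Next(candidate_map.Count));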

Keep in mind that a candidate function that uses the underlying dictionary is basically just:

public char GetChar(char c, Dictionary<char, char> theMap)
{
     return theMap[c];   // look up the mapped (decrypted) character
}
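
Since ComputePl() below expects a Func<char, char>, the dictionary-backed function can be handed to it as a small lambda. A sketch, where the variable names (and the use of the encrypted text and matrix from below) are assumptions:

    // Adapt the dictionary to the Func<char, char> shape expected by ComputePl().
    Dictionary<char, char> theMap = current_f;              // the currently accepted map
    Func<char, char> candidate = c => GetChar(c, theMap);   // or simply: c => theMap[c]

    decimal pl = ComputePl(candidate, encrypted, _matrix);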

And this is the function that computes Pl(f):

    public decimal ComputePl(Func<char, char> candidate, string encrypted, decimal[][] _matrix)
    {
        // Start from the multiplicative identity so a legitimate zero frequency
        // is not mistaken for "product not yet initialized".
        decimal product = 1m;

        // Multiply the transition frequencies of every adjacent pair of decrypted characters.
        for (int i = 0; i + 1 < encrypted.Length; i++)
        {
            char a = candidate(encrypted[i]);
            char b = candidate(encrypted[i + 1]);

            int _a = GetIndex(_alphabet, a); // _alphabet is just a string/char[] of all avl chars
            int _b = GetIndex(_alphabet, b);

            product *= _matrix[_a][_b];
        }

        return product;
    }

Solution

Tentatively, codereview.stackexchange.com may be a "better forum for this".
Nevertheless, I'll take a quick stab at it:

  • At a glance, the snippet shown is a correct implementation of the algorithm.
  • Whether the algorithm will "get stuck" in local minima is an issue pertaining to the algorithm, not to the implementation. (See discussion below.)
  • Your quest for an "optimal approach" seems to be directed at tweaks in the algorithm (deviations from the original algorithm) rather than at tweaks in the implementation (to make it faster and/or to eradicate some possible flaws). For considerations regarding the algorithm, see the discussion below; for the implementation, consider the following:
    • ensure the Flip() method is fair
    • ensure that ComputePl() is correct: errors often arise from issues with arithmetic precision in the value function (see the sketch after this list)
    • ensure fairness (equiprobability) with the Transpose() method
    • performance improvements would likely come from optimizations in the ComputePl() method (not shown) rather than in the main loop
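
On the arithmetic-precision point: multiplying many frequencies smaller than 1 quickly drives the product toward zero, even with decimal. A common remedy, not part of the code in the question, is to sum log-frequencies instead of multiplying raw frequencies. A sketch of such a log-space variant, reusing the question's GetIndex/_alphabet helpers and assuming a double[][] frequency matrix:

    // Log-space variant of the plausibility: sums log-frequencies instead of
    // multiplying raw frequencies, avoiding underflow on long texts. It returns a
    // log-plausibility, so "larger is better" still holds, but the acceptance coin
    // must then use Math.Exp(logPlCandidate - logPlCurrent) as its heads probability.
    public double ComputeLogPl(Func<char, char> candidate, string encrypted, double[][] matrix)
    {
        double logProduct = 0.0;

        for (int i = 0; i + 1 < encrypted.Length; i++)
        {
            char a = candidate(encrypted[i]);
            char b = candidate(encrypted[i + 1]);

            double freq = matrix[GetIndex(_alphabet, a)][GetIndex(_alphabet, b)];

            // Smoothing assumed: a zero frequency would make the log undefined,
            // so a small floor keeps unseen transitions merely very unlikely.
            logProduct += Math.Log(Math.Max(freq, 1e-10));
        }

        return logProduct;
    }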

Discussion regarding the algorithm per se and its applicability to different problems.
In a nutshell, the algorithm is a guided stochastic search, whereby the [huge] solution space is sampled with two random devices: the Transpose() method, which modifies (very slightly each time) the current candidate function, and the Flip() method, which decides whether a [locally] suboptimal solution should survive. The search is guided by an objective function, ComputePl(), which is itself based on a matrix of first-order transitions in some reference corpus.

In this context, local minima "tar pits" can be avoided by augmenting the probability of selecting "suboptimal" functions: rather than a fair 50-50 Flip(), maybe try a probability of retaining "suboptimal" solutions of 66% or even 75%. This approach typically extends the number of generations necessary to converge toward the optimal solution but, as said, may avoid getting stuck in local minima.
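
For example, a biased Flip() with a configurable probability of keeping a suboptimal candidate could be sketched as follows (the 0.66 default is just the figure mentioned above, and _random is assumed to be a shared Random instance):

    // Returns 1 ("heads": keep the suboptimal candidate) with probability pKeep,
    // otherwise 0 ("tails": stay at the current function).
    private int Flip(double pKeep = 0.66)
    {
        return _random.NextDouble() < pKeep ? 1 : 0;
    }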

Another way of improving the applicability of the algorithm is to ensure a better evaluation of the plausibility of candidate functions. Likely explanations for the relative success and generality of the algorithm are:

  • The distribution of first-order transitions in the English language is not very uniform (and hence models rather well the plausibility of a given function, rewarding it for its matches and punishing it for its mismatches). This kind of multi-dimensional statistic is more resilient to departures from the reference corpus than, say, the "order 0" distribution of characters in a given language/corpus.
  • The distributions of first-order transitions, while language-specific, are generally similar across different languages and/or in the context of jargon vocabularies [based on said languages]. Things do break down in the case of shorthand jargons, where letters such as vowels are typically omitted.

Therefore, to improve the applicability of the algorithm to a given problem, ensure that the transition matrix used matches the language and domain of the underlying text as closely as possible.
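
As an illustration of that last point, a first-order transition matrix could be estimated from a reference text in the target language and domain roughly as follows. This is a sketch under my own assumptions: _alphabet and GetIndex are the question's helpers, GetIndex is assumed to return -1 for characters outside the alphabet, and row normalization is one of several reasonable choices:

    // Build a row-normalized first-order transition matrix from a reference corpus.
    public decimal[][] BuildTransitionMatrix(string referenceText)
    {
        int n = _alphabet.Length;
        var counts = new decimal[n][];
        for (int r = 0; r < n; r++) counts[r] = new decimal[n];

        // Count adjacent character pairs in the reference text.
        for (int i = 0; i + 1 < referenceText.Length; i++)
        {
            int a = GetIndex(_alphabet, referenceText[i]);
            int b = GetIndex(_alphabet, referenceText[i + 1]);
            if (a >= 0 && b >= 0)
            {
                counts[a][b] += 1m;
            }
        }

        // Normalize each row so it approximates P(next char | current char).
        for (int r = 0; r < n; r++)
        {
            decimal rowTotal = 0m;
            for (int c = 0; c < n; c++) rowTotal += counts[r][c];
            if (rowTotal > 0m)
            {
                for (int c = 0; c < n; c++) counts[r][c] /= rowTotal;
            }
        }

        return counts;
    }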
