C++ 中的快速百分位 [英] Fast percentile in C++

查看:180
本文介绍了C++ 中的快速百分位的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的程序为风险价值指标计算蒙特卡罗模拟.为了尽可能简化,我有:

My program calculates a Monte Carlo simulation for the value-at-risk metric. To simplify as much as possible, I have:

1/ simulated daily cashflows
2/ to get a sample of a possible 1-year cashflow, 
   I need to draw 365 random daily cashflows and sum them

因此,每日现金流量是一个经验给定的分配函数,需要进行 365 次采样.为此,我

Hence, the daily cashflows are an empirically given distrobution function to be sampled 365 times. For this, I

 1/ sort the daily cashflows into an array called *this->distro*
 2/ calculate 365 percentiles corresponding to random probabilities

我需要对年度现金流进行模拟,例如 10K 次,才能获得大量模拟年度现金流.准备好每日现金流量的分布函数,我做的抽样就像...

I need to do this simulation of a yearly cashflow, say, 10K times to get a population of simulated yearly cashflows to work with. Having the distribution function of daily cashflows prepared, I do the sampling like...

for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( register unsigned int idxDay = 0; idxDay < 365; idxDay ++ )
    {
        prob = (FLT_TYPE)fastrand();         // prob [0,1]
        dIdx = prob * dMaxDistroIndex;       // scale prob to distro function size
                                             // to get an index into distro array
        _floor = ((FLT_TYPE)(long)dIdx);     // fast version of floor
        _ceil  = _floor + 1.0f;              // 'fast' ceil:)
        iIdx1  = (unsigned int)( _floor );
        iIdx2  = iIdx1 + 1;

        // interpolation per se
        generatedVal += this->distro[iIdx1]*(_ceil - dIdx  );
        generatedVal += this->distro[iIdx2]*(dIdx  - _floor);
    }
    this->yearlyCashflows[idxSim] = generatedVal ;
}

for 循环内的代码进行线性插值.如果说 1000 美元对应于 prob=0.01,10000 美元对应于 prob=0.1,那么如果我没有 p=0.05 的经验数,我想通过插值获得 5000 美元.

The code inside of both for cycles does linear interpolation. If, say USD 1000 corresponds to prob=0.01, USD 10000 corresponds to prob=0.1 then if I don't have an empipirical number for p=0.05 I want to get USD 5000 by interpolation.

问题:这段代码运行正确,尽管分析器说程序在插值本身上花费了大约 60% 的运行时间.所以我的问题是,我怎样才能更快地完成这项任务?VTune 针对特定线路报告的示例运行时间如下:

The question: this code runs correctly, though the profiler says that the program spends cca 60% of its runtime on the interpolation per se. So my question is, how can I make this task faster? Sample runtimes reported by VTune for specific lines are as follows:

prob = (FLT_TYPE)fastrand();         //  0.727s
dIdx = prob * dMaxDistroIndex;       //  1.435s
_floor = ((FLT_TYPE)(long)dIdx);     //  0.718s
_ceil  = _floor + 1.0f;              //    -

iIdx1  = (unsigned int)( _floor );   // 4.949s
iIdx2  = iIdx1 + 1;                  //    -

// interpolation per se
generatedVal += this->distro[iIdx1]*(_ceil - dIdx  );  //    -
generatedVal += this->distro[iIdx2]*(dIdx  - _floor);  // 12.704s

破折号表示分析器不报告这些行的运行时.

Dashes mean the profiler reports no runtimes for those lines.

任何提示将不胜感激.丹尼尔

Any hint will be greatly appreciated. Daniel

c.fogelklou 和 MSalters 都指出了很大的改进.符合 c.fogelklou 所说的最好的代码是

Both c.fogelklou and MSalters have pointed out great enhancements. The best code in line with what c.fogelklou said is

converter = distroDimension / (FLT_TYPE)(RAND_MAX + 1)
for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( register unsigned int idxDay = 0; idxDay < 365; idxDay ++ )
    {
        dIdx  = (FLT_TYPE)fastrand() * converter;
        iIdx1 = (unsigned long)dIdx);
        _floor = (FLT_TYPE)iIdx1;
        generatedVal += this->distro[iIdx1] + this->diffs[iIdx1] *(dIdx  - _floor);
    }
}

我所拥有的最好的 MSalter 产品线是

and the best I have along MSalter's lines is

normalizer = 1.0/(FLT_TYPE)(RAND_MAX + 1);
for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( register unsigned int idxDay = 0; idxDay < 365; idxDay ++ )
    {
        dIdx  = (FLT_TYPE)fastrand()* normalizer ;
        iIdx1 = fastrand() % _g.xDayCount;
        generatedVal += this->distro[iIdx1];
        generatedVal += this->diffs[iIdx1]*dIdx;
    }
}

第二个代码大约是.快 30%.现在,在 95 秒的总运行时间中,最后一行消耗了 68 秒.最后一行只消耗 3.2 秒,因此双倍乘法一定是魔鬼.我想到了 SSE - 将最后三个操作数保存到一个数组中,然后执行 this->diffs[i]*dIdx[i] 的向量乘法并将其添加到 this->distro[i] 但这段代码运行了 50%慢点.因此,我想我撞墙了.

The second code is approx. 30 percent faster. Now, of 95s of total runtime, the last line consumes 68s. The last but one line consumes only 3.2s hence the double*double multiplication must be the devil. I thought of SSE - saving the last three operands into an array and then carry out a vector multiplication of this->diffs[i]*dIdx[i] and add this to this->distro[i] but this code ran 50 percent slower. Hence, I think I hit the wall.

非常感谢大家.D.

推荐答案

这是一个小优化的提议,消除了对 ceil、两次强制转换和一次乘法的需要.如果您在定点处理器上运行,这将解释为什么 float 和 int 之间的乘法和转换需要这么长时间.在这种情况下,如果 CPU 支持,请尝试使用定点优化或在编译器中打开浮点!

This is a proposal for a small optimization, removing the need for ceil, two casts, and one of the multiplies. If you are running on a fixed point processor, that would explain why the multiplies and casts between float and int are taking so long. In that case, try using fixed point optimizations or turning on floating point in your compiler if the CPU supports it!

for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( register unsigned int idxDay = 0; idxDay < 365; idxDay ++ )
    {
        prob = (FLT_TYPE)fastrand();         // prob [0,1]
        dIdx = prob * dMaxDistroIndex;       // scale prob to distro function size
                                             // to get an index into distro array
        iIdx1  = (long)dIdx;
        _floor = (FLT_TYPE)iIdx1;     // fast version of floor
        iIdx2  = iIdx1 + 1;

        // interpolation per se
        {
           const FLT_TYPE diff = this->distro[iIdx2] - this->distro[iIdx1];
           const FLT_TYPE interp = this->distro[iIdx1] + diff * (dIdx - _floor);
           generatedVal += interp;
        }
    }
    this->yearlyCashflows[idxSim] = generatedVal ;
}

这篇关于C++ 中的快速百分位的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆