如何优化Langton的蚂蚁sim? [英] How to optimise this Langton's ant sim?

查看:77
本文介绍了如何优化Langton的蚂蚁sim?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写Langton的ant sim(用于规则字符串RLR),并试图对其进行优化以提高速度.这是目前的相关代码:

I'm writing a Langton's ant sim (for rulestring RLR) and am trying to optimise it for speed. Here's the pertinent code as it stands:

#define AREA_X 65536
#define AREA_Y 65536
#define TURN_LEFT 3
#define TURN_RIGHT 1
int main()
{
  uint_fast8_t* state;
  uint_fast64_t ant=((AREA_Y/2)*AREA_X) + (AREA_X/2);
  uint_fast8_t ant_orientation=0;
  uint_fast8_t two_pow_five=32;
  uint32_t two_pow_thirty_two=0;/*not fast, relying on exact width for overflow*/
  uint_fast8_t change_orientation[4]={0, TURN_RIGHT, TURN_LEFT, TURN_RIGHT};
  int_fast64_t* move_ant={AREA_X, 1, -AREA_X, -1};
  ... initialise empty state
  while(1)
  {
    while(two_pow_five--)/*removing this by doing 32 steps per inner loop, ~16% longer*/
    {
      while(--two_pow_thirty_two)
      {
        /*one iteration*/
        /* 54 seconds for init + 2^32 steps
        ant_orientation = ( ant_orientation + (117>>((++state[ant])*2 )) )&3;
        state[ant] = (36 >> (state[ant] *2) ) & 3;
        ant+=move_ant[ant_orientation];
        */

        /* 47 seconds for init + 2^32 steps
        ant_orientation = ( ant_orientation + ((state[ant])==1?3:1) )&3;
        state[ant] += (state[ant]==2)?-2:1;
        ant+=move_ant[ant_orientation];
        */

        /* 46 seconds for init + 2^32 steps
        ant_orientation = ( ant_orientation + ((state[ant])==1?3:1) )&3;
        if(state[ant]==2)
        {
          --state[ant];
          --state[ant];
        }
        else
          ++state[ant];
        ant+=move_ant[ant_orientation];
        */

        /* 44 seconds for init + 2^32 steps
        ant_orientation = ( ant_orientation + ((++state[ant])==2?3:1) )&3;
        if(state[ant]==3)state[ant]=0;
        ant+=move_ant[ant_orientation];
        */

        // 37 seconds for init + 2^32 steps
        // handle every situation with nested switches and constants
        switch(ant_orientation)
        {
          case 0:
            switch(state[ant])
            {
              case 0:
                ant_orientation=1;
                state[ant]=1;
                ++ant;
                break;
              case 1:
                ant_orientation=3;
                state[ant]=2;
                --ant;
                break;
              case 2:
                ant_orientation=1;
                state[ant]=0;
                ++ant;
                break;
            }
            break;
          case 1:
            switch(state[ant])
            {
              ...
            }
            break;
          case 2:
            switch(state[ant])
            {
              ...
            }
            break;
          case 3:
            switch(state[ant])
            {
              ...
            }
            break;
        }

      }
    }
    two_pow_five=32;
    ... dump to file every 2^37 steps
  }
  return 0;
}

我有两个问题:

  1. 我已经通过试错测试试图尽可能地优化c,是否有我没有利用的技巧?尽管我可能会在某些时候尝试使用汇编语言,但请尝试使用c而不是汇编语言.

  1. I've tried to optimise as best as I can with c by trial and error testing, are there any tricks I haven't taken advantage of? Please try to talk in c not assembly, although I'll probably try assembly at some point.

是否有更好的方法对问题进行建模以提高速度?

Is there a better way to model the problem to increase speed?

更多信息:可移植性无关紧要.我在64位Linux上,使用gcc,i5-2500k和16 GB的ram.原来的状态数组使用4GiB,程序可以可行地使用ram的12GiB. sizeof(uint_fast8_t)= 1.故意没有边界检查,很容易从转储中手动发现损坏.

More info: Portability doesn't matter. I'm on 64 bit linux, using gcc, an i5-2500k and 16 GB of ram. The state array as it stands uses 4GiB, the program could feasibly use 12GiB of ram. sizeof(uint_fast8_t)=1. Bounds checks are intentionally not present, corruption is easy to spot manually from the dumps.

也许是出于反感,在switch语句上进行堆放而不是消除它们已经产生了迄今为止最好的效率.

edit: Perhaps counter-inuitively, piling on the switch statements instead of eliminating them has yielded the best efficiency so far.

编辑:我对问题进行了重新建模,并提出了比每次迭代仅一步之遥的方法.以前,每个状态元素使用两位,并在Langton的蚂蚁网格中描述了一个单元.新方法使用所有8位,并描述了网格的2x2部分.每次迭代都通过查找预先计算的步数,当前状态+方向的新方向和新状态的值来完成可变数量的步骤.假设一切均等,则平均每次迭代要执行2个步骤.作为奖励,它使用内存的1/4来模拟同一区域:

edit: I've re-modelled the problem and come up with something quicker than a single step per iteration. Before, each state element used two bits and described a single cell in the Langton's ant grid. The new way uses all 8 bits, and describes a 2x2 section of the grid. Every iteration a variable number of steps are done, by looking up pre-computed values of step count, new orientation and new state for the current state+orientation. Assuming everything is equally likely it averages to 2 steps taken per iteration. As a bonus it uses 1/4 of the memory to model the same area:

while(--iteration)
{
        // roughly 31 seconds per 2^32 steps
        table_offset=(state[ant]*24)+(ant_orientation*3);
        it+=twoxtwo_table[table_offset+0];
        state[ant]=twoxtwo_table[table_offset+2];
        ant+=move_ant2x2[(ant_orientation=twoxtwo_table[table_offset+1])];
}

还没有尝试对其进行优化,接下来要尝试的是消除偏移等式和具有嵌套开关和常量的查找,就像以前一样(但使用648个内部案例而不是12个内部案例).

Haven't tried optimising it yet, the next thing to try is eliminating the offset equation and lookups with nested switches and constants like before (but with 648 inner cases instead of 12).

推荐答案

或者,您可以使用单个无符号字节常量作为人工寄存器,而不是分支:

Or, you can use a single unsigned byte constant as an artificial register instead of branching:

value:   1  3  1  1
bits:   01 11 01 01 ---->101 decimal value for an unsigned byte
index    3  2  1  0  ---> get first 2 bits to get  "1"  (no shift)
                      --> get second 2 bits to get "1"  (shifting for 2 times)
                      --> get third 2 bits to get  "3"  (shifting for 4 times)
                      --> get last 2 bits to get   "1"  (shifting for 6 times)

 Then "AND" the result with  binary(11) or decimal(3) to get your value.

 (101>>( (++state[ant])*2  ) ) & 3 would give you the turnright or turnleft

 Example:
 ++state[ant]= 0:  ( 101>>( (0)*2 ) )&3  --> 101 & 3 = 1
 ++state[ant]= 1:  ( 101>>( (1)*2 ) )&3  --> 101>>2 & 3 = 1
 ++state[ant]= 2:  ( 101>>( (2)*2 ) )&3  --> 101>>4 & 3 = 3  -->turn left
 ++state[ant]= 3:  ( 101>>( (3)*2 ) )&3  --> 101>>6 & 3 = 1

 Maximum six-shifting + one-multiplication + one-"and" may be better.
 Dont forget constant can be auto-promoted so you may add some suffixes or something else.

由于对%4模数使用无符号整数",因此可以使用与"运算.

  state[ant]=state[ant]&3; instead of state[ant]=state[ant]%4;

对于不熟练的编译器,这应该提高速度.

For unskilled compilers, this should increase speed.

最困难的部分:模3

  C = A % B is equivalent to C = A – B * (A / B)
  We need  state[ant]%3
  Result = state[ant] - 3 * (state[ant]/3)

  state[ant]/3 is always <=1 for your valid direction states.
  Only when state[ant]  is 3 then state[ant]/3 is 1, other values give 0.
  When multiplied by 3, that part is 0 or 3 (only 3 when state[ant] is 3 otherwise 0)
  Result = state[ant] - (0 or 3)

  Lets look at all possibilities:

  state[ant]=0:  0 - 0  ---> 0   ----> 00100100 shifted by 0 times &3 --> 00000000
  state[ant]=1:  1 - 0  ---> 1   ----> 00100100 shifted by 2 times &3 --> 00000001
  state[ant]=2:  2 - 0  ---> 2   ----> 00100100 shifted by 4 times &3 --> 00000010
  state[ant]=3:  3 - 3  ---> 0   ----> 00100100 shifted by 6 times &3 --> 00000000

  00100100 is 36 in decimal.

  (36 >> (state[ant] *2) ) & 3 will give you state[ant]%3 for your valid states (0,1,2,3)

  Example: 

  state[ant]=0: 36 >> 0  --> 36 ----> 36& 3 ----> 0  satisfies 0%3
  state[ant]=1: 36 >> 2  --> 9 -----> 9 & 3 ----> 1  satisfies 1%3
  state[ant]=2: 36 >> 4  --> 2 -----> 2 & 3 ----> 2 satisfies  2%3
  state[ant]=3: 36 >> 6  --> 0 -----> 0 & 3 ----> 0 satisfies 3%3

这篇关于如何优化Langton的蚂蚁sim?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆