AVX/SSE version of xorshift128+

Question

I am trying to make the fastest possible high-quality RNG. After reading http://xorshift.di.unimi.it/, I think xorshift128+ looks like a good option. The C code is:

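#include <stdint.h>
uint64_t s[ 2 ]; // generator state; must be seeded so that it is not all zero

uint64_t next(void) { 
    uint64_t s1 = s[ 0 ];
    const uint64_t s0 = s[ 1 ];
    s[ 0 ] = s0; // the old s[ 1 ] becomes the new s[ 0 ]
    s1 ^= s1 << 23; // a
    // the new s[ 1 ] mixes both words; the output is the sum of the new and old s[ 1 ]
    return ( s[ 1 ] = ( s1 ^ s0 ^ ( s1 >> 17 ) ^ ( s0 >> 26 ) ) ) + s0; // b, c
}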

Sadly, I am not an SSE/AVX expert, but my CPU supports the SSE4.1 / SSE4.2 / AVX / F16C / FMA3 / XOP instructions. How could I use these to speed up this code (assuming I want to generate billions of such random numbers), and what is the expected limit to this speedup in practice?

Recommended answer

XorShift is indeed a good choice. It is so good, so fast, and requires so little state that I'm surprised to see so little adoption. It should be the standard generator on all platforms. I implemented it myself 8 years ago, and even then it could generate 800 MB/s of random bytes.

You cannot use vector instructions to speed up generating a single random number. There is too little instruction-level parallelism in those few instructions.

But you can easily speed up generating N numbers where N is the vector size of your target instruction set. Just run N generators in parallel. Keep state for N generators and generate N numbers at the same time.
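As a rough illustration of running N generators in parallel, here is a minimal sketch for N = 2 using SSE2 intrinsics, with a plain loop as a fallback when SSE2 is not available. The names `xs128p_x2`, `xs128p_x2_seed`, `xs128p_x2_next` and `xs128p_scalar` are made up for this example, and it assumes each lane is given its own distinct, non-zero seed; how to choose well-separated seeds is outside the scope of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

#define LANES 2 /* two 64-bit lanes fit in one 128-bit SSE register */

/* Hypothetical layout: lane i is an independent xorshift128+ generator.
   x[i] plays the role of s[0] and y[i] the role of s[1] in the scalar code. */
typedef struct {
    uint64_t x[LANES];
    uint64_t y[LANES];
} xs128p_x2;

/* Scalar reference step, same arithmetic as next() in the question. */
uint64_t xs128p_scalar(uint64_t s[2]) {
    uint64_t s1 = s[0];
    const uint64_t s0 = s[1];
    s[0] = s0;
    s1 ^= s1 << 23;
    s[1] = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);
    return s[1] + s0;
}

/* seeds holds (s[0], s[1]) for lane 0, then for lane 1. */
void xs128p_x2_seed(xs128p_x2 *g, const uint64_t seeds[2 * LANES]) {
    for (int i = 0; i < LANES; i++) {
        g->x[i] = seeds[2 * i];
        g->y[i] = seeds[2 * i + 1];
        assert((g->x[i] | g->y[i]) != 0); /* an all-zero state never changes */
    }
}

/* Advance both lanes by one step and write their outputs to out[0..1]. */
void xs128p_x2_next(xs128p_x2 *g, uint64_t out[LANES]) {
#if defined(__SSE2__)
    __m128i s1 = _mm_loadu_si128((const __m128i *)g->x);
    const __m128i s0 = _mm_loadu_si128((const __m128i *)g->y);
    _mm_storeu_si128((__m128i *)g->x, s0);          /* s[0] = s0 */
    s1 = _mm_xor_si128(s1, _mm_slli_epi64(s1, 23)); /* s1 ^= s1 << 23 */
    s1 = _mm_xor_si128(_mm_xor_si128(s1, s0),
                       _mm_xor_si128(_mm_srli_epi64(s1, 17),
                                     _mm_srli_epi64(s0, 26)));
    _mm_storeu_si128((__m128i *)g->y, s1);          /* s[1] = new value */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(s1, s0));
#else
    /* Portable fallback: step each lane with the scalar code. */
    for (int i = 0; i < LANES; i++) {
        uint64_t s[2] = { g->x[i], g->y[i] };
        out[i] = xs128p_scalar(s);
        g->x[i] = s[0];
        g->y[i] = s[1];
    }
#endif
}
```

Each call produces two numbers for roughly the instruction count of one scalar step. With 256-bit integer instructions (AVX2) the same pattern would presumably extend to four lanes; that is not shown here.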

If client code demands numbers one at a time, you could keep a buffer of N (or more) numbers. If the buffer is empty, you fill it using vector instructions. If the buffer is not empty, you just return the next number.
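The buffering idea could look roughly like the sketch below. It is written in portable C so that it stands on its own: the inner lane loop is the part that a vector implementation would replace (or that a compiler may auto-vectorize). The names `buffered_rng`, `buffered_rng_init`, `buffered_rng_refill` and `buffered_rng_next`, and the buffer length of 8, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 2   /* number of generators run side by side (assumed) */
#define BUF_LEN 8 /* assumed buffer size; must be a multiple of LANES */

typedef struct {
    uint64_t x[LANES];     /* s[0] of each lane */
    uint64_t y[LANES];     /* s[1] of each lane */
    uint64_t buf[BUF_LEN]; /* numbers generated ahead of time */
    size_t pos;            /* next unread index; BUF_LEN means "empty" */
} buffered_rng;

/* seeds holds (s[0], s[1]) for lane 0, then lane 1, and so on. */
void buffered_rng_init(buffered_rng *r, const uint64_t seeds[2 * LANES]) {
    for (int i = 0; i < LANES; i++) {
        r->x[i] = seeds[2 * i];
        r->y[i] = seeds[2 * i + 1];
        assert((r->x[i] | r->y[i]) != 0); /* an all-zero state never changes */
    }
    r->pos = BUF_LEN; /* start empty so the first request triggers a fill */
}

/* Fill the whole buffer, LANES numbers per step. */
void buffered_rng_refill(buffered_rng *r) {
    for (size_t k = 0; k < BUF_LEN; k += LANES) {
        /* Lane loop: a vector version would do this in one set of instructions. */
        for (int i = 0; i < LANES; i++) {
            uint64_t s1 = r->x[i];
            const uint64_t s0 = r->y[i];
            r->x[i] = s0;
            s1 ^= s1 << 23;
            r->y[i] = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);
            r->buf[k + i] = r->y[i] + s0;
        }
    }
    r->pos = 0;
}

/* One number at a time for the client; refills only when the buffer runs out. */
uint64_t buffered_rng_next(buffered_rng *r) {
    if (r->pos == BUF_LEN)
        buffered_rng_refill(r);
    return r->buf[r->pos++];
}
```

Note that the numbers returned to the client interleave the lanes (lane 0, lane 1, lane 0, ...), so the sequence differs from what a single scalar generator with one seed would produce. A larger buffer spreads the refill cost over more calls, at the price of more state per generator.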
