What are the exhaustion characteristics of RDRAND on Ivy Bridge?

Question

After reviewing the Intel Digital Random Number Generator (DRNG) Software Implementation Guide, I have a few questions about what happens to the internal state of the generator when RDRAND is invoked. Unfortunately the answers don't seem to be in the guide.

  1. According to the guide, inside the DRNG there are four 128-bit buffers that serve random bits for RDRAND to drain. RDRAND itself will provide either 16, 32, or 64 bits of random data depending on the width of the destination register:

    rdrand ax   ; put 16 random bits in ax
    rdrand eax  ; put 32 random bits in eax
    rdrand rax  ; put 64 random bits in rax
    

    Will the use of larger destination registers empty those 128-bit buffers more quickly? For example, if I need only 2 bits of randomness, should I go through the trouble of using a 16 bit register over a 64 bit register? Will that make any difference on the throughput of the DRNG? I'd like to avoid consuming more randomness than is necessary.

  2. The guide says the carry flag will be set after RDRAND executes:

    CF = 1   Destination register valid. Non-zero random value
             available at time of execution. Result placed in register.
    CF = 0   Destination register all zeros. Random value not available
             at time of execution. May be retried.
    

    What does "not available" mean? Can random data be unavailable because RDRAND invocations exhausted those 128-bit buffers too quickly? Or does unavailable mean the DRNG is failing its health checks and cannot generate any new data? Basically, I'm trying to understand if CF=0 can occur just because the buffers happen to be (transiently) empty when RDRAND is invoked.

Note: I have reviewed the answers to this question on throughput and latency of RDRAND, but I'm seeking different information.

Thanks!

Answer

Part 1. Does it make a difference pulling 16, 32 or 64 bits?

No.

On Ivy Bridge, the CPU cores pull 64 bits over the internal communication links to the DRNG, regardless of the size of the destination register. So if you read 32 bits, it pulls 64 bits and throws away the top half. If you read 16 bits, it pulls 64 and throws away the top 3/4.

This is not described in the instruction documentation because it may not continue to be true in future products. A chip might be designed which stashes and uses the unused parts of the 64-bit word. However, there isn't a significant performance imperative to do this today.
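If wasting the unused bits matters to the caller, the stashing described above can be done in software instead. The following is only a minimal sketch: `bit_pool`, `bit_pool_take`, `word64_source_fn` and the `counting_source` stand-in are hypothetical names, and the 64-bit source is passed in as a callback so the sketch does not depend on RDRAND being present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fetches one 64-bit random word; returns false if none is available. */
typedef bool (*word64_source_fn)(uint64_t *out, void *ctx);

struct bit_pool {
    uint64_t bits;   /* unused random bits, right-aligned */
    unsigned avail;  /* number of low bits in `bits` not yet handed out */
};

/* Hand out nbits (1..32) from the pool, pulling one new 64-bit word
 * only when the pool cannot satisfy the request. */
bool bit_pool_take(struct bit_pool *p, unsigned nbits,
                   word64_source_fn src, void *ctx, uint32_t *out)
{
    assert(nbits >= 1 && nbits <= 32);
    if (p->avail < nbits) {
        uint64_t w;
        if (!src(&w, ctx)) {
            return false;          /* caller decides how to report this */
        }
        p->bits = w;               /* leftover bits are dropped for simplicity */
        p->avail = 64;
    }
    *out = (uint32_t)(p->bits & ((UINT64_C(1) << nbits) - 1));
    p->bits >>= nbits;
    p->avail -= nbits;
    return true;
}

/* Hypothetical stand-in source: always returns `word`, counts its calls. */
struct counting_source { uint64_t word; unsigned calls; };

bool counting_fetch(uint64_t *out, void *ctx)
{
    struct counting_source *s = (struct counting_source *)ctx;
    s->calls++;
    *out = s->word;
    return true;
}
```

With a pool like this, a caller needing 2 bits at a time would trigger one 64-bit pull per 32 requests rather than one pull per request.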

For the highest throughput, the most effective strategy is to pull from parallel threads. This is because there is parallelism in the on-chip bus hierarchy. Most of the instruction's time is transit time across the buses. Performing that transit in parallel yields a linear increase in throughput with the number of threads, up to the maximum of 800 MBytes/s. The second most effective strategy is to use 64-bit RdRands, because they get more data per instruction.
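As a rough illustration of why the 64-bit form gets more data per instruction: assume the 800 MBytes/s ceiling is reached with 64-bit destinations, and that a narrower destination costs the same 64-bit pull while keeping fewer bits. That scaling is an inference from the description above, not a measured figure, and `useful_mbytes_per_s` is a hypothetical helper.

```c
#include <assert.h>

/* Useful MBytes/s if every RDRAND costs one 64-bit pull but only
 * dest_bits of it are kept (assumed scaling, not a measurement). */
unsigned useful_mbytes_per_s(unsigned bus_ceiling_mbytes, unsigned dest_bits)
{
    assert(dest_bits == 16 || dest_bits == 32 || dest_bits == 64);
    return bus_ceiling_mbytes * dest_bits / 64;
}
```

Under those assumptions, `useful_mbytes_per_s(800, 16)` gives 200 and `useful_mbytes_per_s(800, 32)` gives 400, against 800 for the 64-bit form.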

Part 2. What does CF=0 mean really?

It means 'random data not available'. This is because the details of why it can't get a number are not available to the CPU core without it going off and reading more registers, which it isn't going to do because there is nothing it can do with the information.

If you sucked the output buffer of the DRNG dry, you would get an underflow (CF=0) but you could expect the next RdRand to succeed, because the DRNG is fast.

If the DRNG failed (e.g. a transistor popped in the entropy source and it no longer was random) then the online health tests would detect this and shut down the DRNG. Then all your RdRand invocations would yield CF=0.

However, on Ivy Bridge you will not be able to underflow the buffer. The DRNG is a little faster than the bus to which it is attached. The effect of pulling more data per unit time (with parallel threads) will be to increase the execution time of each individual RdRand, as contention on the bus causes the instructions to wait in line at the DRNG's local bus. You can never pull so fast that the DRNG will underflow. You will asymptotically approach 800 MBytes/s.

This also is not described in the documentation because it may not continue to be true in future products. We can envisage products where the buses are faster and the cores faster and the DRNG would be able to be underflowed. These things are not known yet, so we can't make claims about them.

What will remain true is that the basic loop given in the Software Implementation Guide (try up to 10 times, then report a failure up the stack) will continue to work in future products, because we've made the claim that it will, and so we will engineer all future products to meet it.
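The basic loop can be sketched in C as below. This is a sketch under assumptions rather than the guide's own listing: `rand64_retry`, `rand64_step_fn`, `hw_rdrand64_step` and the `flaky_source` stand-in are hypothetical names. The hardware step uses the compiler intrinsic `_rdrand64_step` from `<immintrin.h>` and is only compiled when the compiler reports RDRAND support on x86-64 (for example with `-mrdrnd` on GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__RDRND__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* One attempt at fetching 64 random bits; true corresponds to CF=1. */
typedef bool (*rand64_step_fn)(uint64_t *out, void *ctx);

enum { RDRAND_RETRIES = 10 };  /* retry budget quoted in the answer */

/* Try up to RDRAND_RETRIES times. A false return means the caller
 * should report the failure up the stack. */
bool rand64_retry(rand64_step_fn step, void *ctx, uint64_t *out)
{
    assert(step != NULL && out != NULL);
    for (int i = 0; i < RDRAND_RETRIES; i++) {
        if (step(out, ctx)) {
            return true;
        }
    }
    *out = 0;  /* do not leave a stale value behind on failure */
    return false;
}

#if defined(__RDRND__) && defined(__x86_64__)
/* Hardware step: wraps the RDRAND intrinsic, which returns 1 when CF=1. */
bool hw_rdrand64_step(uint64_t *out, void *ctx)
{
    unsigned long long v = 0;
    int ok = _rdrand64_step(&v);
    (void)ctx;
    *out = (uint64_t)v;
    return ok == 1;
}
#endif

/* Hypothetical stand-in for testing: fails `fail_first` times, then succeeds. */
struct flaky_source { int fail_first; int calls; };

bool flaky_step(uint64_t *out, void *ctx)
{
    struct flaky_source *s = (struct flaky_source *)ctx;
    s->calls++;
    if (s->calls <= s->fail_first) {
        return false;
    }
    *out = UINT64_C(0x0123456789abcdef);
    return true;
}
```

On a machine with RDRAND enabled at compile time, the call would be `rand64_retry(hw_rdrand64_step, NULL, &value)`. A transient underflow is absorbed by the retries, while a DRNG that has been shut down exhausts the budget and surfaces as a `false` return.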

So no, on Ivy Bridge CF=0 cannot occur merely because "the buffers happen to be (transiently) empty when RDRAND is invoked", but it might occur on future silicon, so design your software to cope.
