How slow (how many cycles) is calculating a square root?

Question

How slow (how many cycles) is calculating a square root? This came up in a molecular dynamics course, where efficiency is important and taking unnecessary square roots had a noticeable impact on the running time of the algorithms.
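To make "unnecessary square roots" concrete, here is a minimal sketch of a cutoff test of the kind that shows up in molecular dynamics codes. It is not taken from the course; the `Vec3` type and the `count_pairs_within` function are hypothetical names chosen for illustration. Because distances are non-negative, `r < cutoff` is equivalent to `r*r < cutoff*cutoff`, so the comparison can be done on squared distances with no square root at all.

```c
#include <stddef.h>

/* Hypothetical particle position type, for illustration only. */
typedef struct { double x, y, z; } Vec3;

/* Count particle pairs closer than `cutoff` without calling sqrt.
   For non-negative values, r < cutoff  <=>  r*r < cutoff*cutoff. */
size_t count_pairs_within(const Vec3 *p, size_t n, double cutoff)
{
    const double cutoff2 = cutoff * cutoff;  /* squared once, outside the loops */
    size_t count = 0;

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            double dx = p[i].x - p[j].x;
            double dy = p[i].y - p[j].y;
            double dz = p[i].z - p[j].z;
            double r2 = dx * dx + dy * dy + dz * dz;  /* squared distance */
            if (r2 < cutoff2) {
                ++count;
            }
        }
    }
    return count;
}
```

If the actual distance is needed later (for example to evaluate a force), the square root can be taken only for the pairs that pass the cutoff test, which is usually a small fraction of all pairs.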

Recommended answer

From Agner Fog's Instruction Tables:

On Core2 65nm, FSQRT takes 9 to 69 clock cycles (with almost equal reciprocal throughput), depending on the value and the precision bits. For comparison, FDIV takes 9 to 38 cycles (again with almost equal reciprocal throughput), FMUL takes 5 (reciprocal throughput 2), and FADD takes 3 (reciprocal throughput 1). SSE performance is about equal, but looks faster because it can't do 80-bit math. SSE does, however, have a super fast approximate reciprocal and approximate reciprocal square root.
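As a rough illustration of the approximate reciprocal square root mentioned above, the sketch below uses the SSE intrinsic `_mm_rsqrt_ss` (the RSQRTSS instruction) and then applies one Newton-Raphson step to improve the estimate. The function name `fast_rsqrt` is hypothetical, the input is assumed to be a positive, finite `float`, and the refinement step is a common companion technique rather than something the answer itself prescribes. On targets where the compiler does not define `__SSE__`, the sketch falls back to the ordinary `1.0f / sqrtf(x)`.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Approximate 1/sqrt(x) for positive, finite x (assumption: no checks
   are made for zero, negative, infinite, or NaN inputs). */
float fast_rsqrt(float x)
{
#if defined(__SSE__)
    /* RSQRTSS gives a low-precision estimate (roughly 12 bits). */
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    /* Portable fallback: already full float precision. */
    float y = 1.0f / sqrtf(x);
#endif
    /* One Newton-Raphson step for f(y) = 1/(y*y) - x:
       y_next = y * (1.5 - 0.5 * x * y * y). */
    return y * (1.5f - 0.5f * x * y * y);
}
```

Multiplying the result by `x` gives an approximation of `sqrt(x)` itself. Whether this beats a plain square root instruction depends on the processor and on how much accuracy the calculation needs, so it is worth measuring before adopting it.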

On Core2 45nm, division and square root got faster: FSQRT takes 6 to 20 clock cycles and FDIV takes 6 to 21, while FADD and FMUL haven't changed. Once again, SSE performance is about the same.

You can get the documents with this information from his website.
