铸造双浮动的开销? [英] Overhead of casting double to float?

查看:114
本文介绍了铸造双浮动的开销?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我有存储数据的兆字节双打是需要通过网络发送...现在我不需要precision的双重优惠,所以我想在发送前将这些转换为浮动他们在网络上。什么是简单地做的开销:

So I have megabytes of data stored as doubles that need to be sent over a network... now I don't need the precision that a double offers, so I want to convert these to a float before sending them over the network. What is the overhead of simply doing:

float myFloat = (float)myDouble;

我会做此操作几百万次每隔几秒钟,不想慢下来东西。谢谢

I'll be doing this operation several million times every few seconds and don't want to slow anything down. Thanks

编辑:我的平台是64位与Visual Studio 2008

My platform is x64 with Visual Studio 2008.

编辑2:我有超过他们的存储方式无法控制

EDIT 2: I have no control over how they are stored.

推荐答案

正如迈克尔·伯尔说,虽然开销很大程度上取决于你的平台上,开销肯定比通过网络发送它们所需的时间更短。

As Michael Burr said, while the overhead strongly depends on your platform, the overhead is definitely less than the time needed to send them over the wire.


的粗略估计:

在一个出色的千兆线速800MBit / s的有效载荷,25M-花车/秒。

800MBit/s payload on a excellent Gigabit wire, 25M-floats/second.

在2GHz的单核心,这给你一个惊人的 80 时钟为每个值的周期转换为收支平衡 - anythign少了,你会节省时间。这应该是绰绰有余的所有架构:)

On a 2GHz single core, that gives you a whopping 80 clock cycles for each value converted to break even - anythign less, and you will save time. That should be more than enough on all architectures :)

一个简单的加载存储周期(禁止所有缓存的延迟)应低于每值5个周期。随着指令交织,SIMD扩展和/或并行在多个内核上,你很可能会做多次转换的一个周期。

A simple load-store cycle (barring all caching delays) should be below 5 cycles per value. With instruction interleaving, SIMD extensions and/or parallelizing on multiple cores, you are likely to do multiple conversions in a single cycle.

此外,接收机将具有处理仅一半的数据高兴。请记住,内存访问时间是非线性的。

Also, the receiver will be happy having to handle only half the data. Remember that memory access time is nonlinear.


针对转换争论将是唯一的事情是,如果转让应该有最小的CPU负载:一个现代建筑可以从磁盘/存储的数据传送到总线无需CPU干预。然而,上面的数字我会说,没有在实践中关系。

The only thing arguing against the conversion would be is if the transfer should have minimal CPU load: a modern architecture could transfer the data from disk/memory to bus without CPU intervention. However, with above numbers I'd say that doesn't matter in practice.



我查了一些数字,387协处理器的确会采取大约70个周期负载存储周期。在最初的奔腾,你是下降到3个周期不使用任何并行。

[edit]
I checked some numbers, the 387 coprocessor would indeed have taken around 70 cycles for a load-store cycle. On the initial pentium, you are down to 3 cycles without any parallelization.

所以,除非你在386运行一个千兆网络...

So, unless you run a gigabit network on a 386...

这篇关于铸造双浮动的开销?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆