Must go faster!


Question


I am trying to figure out if I can use SSE to help execute arithmetic
operations faster. I have 900 values that must each be scaled with a divide
and a multiply. This happens repeatedly. Any examples I can be pointed to
would be greatly appreciated. I realize I could do just one multiply
(instead of a multiply and a divide), but I still want to do 900 (or as many
as I can) at once.

Any ideas would be appreciated.

Bill

Recommended answer




Hi,

If you post the algorithm, people may be able to help optimise it. We do a
lot of intensive maths and use vector libraries (from Apple and Intel) to
take care of the low-level stuff, like multiplication and division. We use
the Intel Integrated Performance Primitives library, and the performance is
incredible compared with hand-coded loops (especially with the Intel
libraries). That said, it may be possible to squeeze out some performance
simply by optimising the algorithm (as you say, combining the multiplication
and division).

Steve



Ok, here we go. A little mix of pseudocode and real code.

For 900 blocks
    read signed integer values - there are 4
    scale values; scaled value = read value / 32768 * 360
    store value as double
next block

The read value is a 16-bit signed int.
The stored scaled value is of type double.
I realize I could just do: (double)round((double)read value / 91.02222)

But if I could do a vector, I could go fast. Maybe do 900 at a time. I'm
just not up on single instruction multiple data stuff.

Just an example, please.
Thanks,
Bill
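For illustration, the loop described above could be vectorised with SSE2 intrinsics roughly as follows. This is a minimal sketch and not code from the thread: the function name `scale_to_degrees`, the flat input array and the element count `n` are assumptions. It also assumes the goal is the unrounded value `read value * 360 / 32768`. SSE2 holds only two doubles per register, so each iteration converts four 16-bit inputs and multiplies them in two pairs; it cannot process all 900 values in one instruction. A plain C loop handles any leftover elements and serves as the fallback where SSE2 is unavailable.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: out[i] = in[i] * 360 / 32768 for i in [0, n). */
void scale_to_degrees(const int16_t *in, double *out, size_t n)
{
    const double k = 360.0 / 32768.0; /* one multiply replaces divide + multiply */
    size_t i = 0;

#ifdef HAVE_SSE2
    const __m128d vk = _mm_set1_pd(k);
    for (; i + 4 <= n; i += 4) {
        /* Load four 16-bit values into the low 64 bits of a register. */
        __m128i v = _mm_loadl_epi64((const __m128i *)(in + i));
        /* Sign-extend to 32 bits: duplicate each word, then shift right
           arithmetically by 16. */
        __m128i w = _mm_srai_epi32(_mm_unpacklo_epi16(v, v), 16);
        /* Convert the low and the high pair of int32 values to doubles. */
        __m128d lo = _mm_cvtepi32_pd(w);
        __m128d hi = _mm_cvtepi32_pd(_mm_unpackhi_epi64(w, w));
        /* Scale and store with unaligned stores. */
        _mm_storeu_pd(out + i, _mm_mul_pd(lo, vk));
        _mm_storeu_pd(out + i + 2, _mm_mul_pd(hi, vk));
    }
#endif

    /* Scalar tail, and the whole loop when SSE2 is not available. */
    for (; i < n; ++i)
        out[i] = in[i] * k;
}
```

Whether this beats what the compiler generates for the plain loop depends on the compiler and target, so it is worth measuring both before committing to intrinsics.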




Multiplication should be faster than division. Thus, instead of dividing by
91.0222, you can multiply by 0.010986328125.

/Fredrik
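The constant in the reply above is exactly 360 / 32768, whereas 91.0222 is a rounded form of 32768 / 360. A small sketch of the two scalar forms follows; the names `DEG_PER_COUNT`, `to_degrees_two_ops` and `to_degrees_one_mul` are illustrative and do not come from the thread. Because 32768 is a power of two, both forms give identical results for every 16-bit input, so the single multiply loses no accuracy here.

```c
#include <stdint.h>

/* 360 / 32768 = 0.010986328125, exactly representable as a double. */
static const double DEG_PER_COUNT = 360.0 / 32768.0;

/* Original formulation: one divide and one multiply per value. */
double to_degrees_two_ops(int16_t raw)
{
    return (double)raw / 32768.0 * 360.0;
}

/* Suggested formulation: a single multiply by the precomputed constant. */
double to_degrees_one_mul(int16_t raw)
{
    return (double)raw * DEG_PER_COUNT;
}
```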







