Must go faster!
Question
I am trying to figure out whether I can use SSE to execute arithmetic
operations faster. I have 900 values that must each be scaled with a divide
and a multiply. This happens repeatedly. Any examples I can be pointed to
would be greatly appreciated. I realize I could do just one multiply
(instead of a multiply and a divide), but I still want to do 900 (or as many
as I can) at once.
Any ideas would be appreciated.
Bill
Recommended answer
Hi,
If you post the algorithm, people may be able to help optimise it. We do a
lot of intensive maths and use vector libraries (from Apple and Intel) to
take care of the low-level stuff, like multiplication and division. We use
the Intel Integrated Performance Primitives library and the performance is
incredible (especially on Intel libraries) over doing hand-coded loops. That
said, it may be possible to squeeze some performance out simply by
optimising the algorithm (as you say, combining the multiplication and
division).
Steve
OK, here we go. A little mix of pseudocode and real code.
For 900 blocks
    read signed integer values - there are 4
    scale values; scaled value = read value / 32768 * 360
    store value as double
next block
The read value is a 16-bit signed int.
The stored scaled value is of type double.
I realize I could just do: (double)round((double)read value/91.02222)
But if I could do a vector, I could go fast. Maybe do 900 at a time. I'm
just not up on single instruction multiple data stuff.
Just an example, please.
Thanks,
Bill
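For reference, the loop described above might look roughly like this in plain C. This is only a sketch: the function name scale_scalar and the flat input/output arrays are assumptions, not something from the post, and it follows the block formula (read value / 32768 * 360) rather than the one-line variant with round().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scales n 16-bit signed readings to degrees.
 * Assumes all blocks are laid out back to back in one flat array. */
void scale_scalar(const int16_t *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* scaled value = read value / 32768 * 360, stored as double */
        out[i] = (double)in[i] / 32768.0 * 360.0;
    }
}
```

A scalar version like this is a useful baseline to time and to check any vectorized version against.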
Multiplication should be faster than division. Thus, instead of dividing by
91.0222, you can multiply by 0.010986328125 (that is, 360 / 32768).
/Fredrik
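Since an example was asked for, here is one way the multiply could be done with SSE2 intrinsics. Treat it as a sketch under assumptions, not a tuned implementation: the function name scale_sse2 and the flat array layout are made up for illustration, and it assumes an x86 compiler that provides <emmintrin.h>. SSE2 registers hold two doubles, so this handles four readings per iteration (two multiplies of two doubles each), not 900 at once.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

#define DEG_PER_COUNT (360.0 / 32768.0)  /* = 0.010986328125 */

/* Hypothetical helper: scales n 16-bit signed readings to degrees. */
void scale_sse2(const int16_t *in, double *out, size_t n)
{
    const __m128d scale = _mm_set1_pd(DEG_PER_COUNT);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Load four 16-bit values (64 bits) into the low half. */
        __m128i v16 = _mm_loadl_epi64((const __m128i *)(in + i));
        /* Sign-extend to 32 bits: duplicate each word, then shift right
         * arithmetically by 16. */
        __m128i v32 = _mm_srai_epi32(_mm_unpacklo_epi16(v16, v16), 16);
        /* Convert the low two and the high two int32 lanes to double. */
        __m128d lo = _mm_cvtepi32_pd(v32);
        __m128d hi = _mm_cvtepi32_pd(_mm_unpackhi_epi64(v32, v32));
        /* One multiply per pair instead of a divide and a multiply. */
        _mm_storeu_pd(out + i,     _mm_mul_pd(lo, scale));
        _mm_storeu_pd(out + i + 2, _mm_mul_pd(hi, scale));
    }

    /* Scalar tail for any leftover values. */
    for (; i < n; ++i)
        out[i] = in[i] * DEG_PER_COUNT;
}
```

Because 360 / 32768 is exactly representable as a double, multiplying by it gives the same result as dividing by 32768 and then multiplying by 360. Whether this beats what the compiler generates for a plain loop is something to measure rather than assume.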