Complex Mul and Div using SSE Instructions

Views: 49 Published: 2021/8/27 19:45:46 Tags: x86 sse simd complex-numbers


Question

Is it beneficial to perform complex-number multiplication and division with SSE instructions? I know that addition and subtraction perform better when using SSE. Can someone tell me how I can use SSE to perform complex multiplication to get better performance?

Recommended answer

Just for completeness, the Intel® 64 and IA-32 Architectures Optimization Reference Manual, which can be downloaded from Intel's website, contains assembly for complex multiply (Example 6-9) and complex divide (Example 6-10).

For example, here is the multiply code:

// Multiplication of (ak + i bk) * (ck + i dk)
// a + i b can be stored as a data structure
// Lane contents in the comments are listed from the highest lane to the lowest.
movsldup xmm0, src1        ; load real parts into the destination: a1, a1, a0, a0
movaps   xmm1, src2        ; load the 2nd pair of complex values: d1, c1, d0, c0
mulps    xmm0, xmm1        ; temporary results: a1d1, a1c1, a0d0, a0c0
shufps   xmm1, xmm1, 0xb1  ; swap real and imaginary parts (immediate is hex B1, not the value b1): c1, d1, c0, d0
movshdup xmm2, src1        ; load imaginary parts into the destination: b1, b1, b0, b0
mulps    xmm2, xmm1        ; temporary results: b1c1, b1d1, b0c0, b0d0
addsubps xmm0, xmm2        ; b1c1+a1d1, a1c1-b1d1, b0c0+a0d0, a0c0-b0d0
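
Read lane by lane, the sequence above evaluates the usual product identity for two packed complex values at once. A worked form, with k = 0, 1 indexing the two values and the final register written from the highest lane to the lowest, as in the comments:

```latex
\begin{aligned}
(a_k + i\,b_k)(c_k + i\,d_k) &= (a_k c_k - b_k d_k) + i\,(a_k d_k + b_k c_k), \qquad k = 0, 1 \\
\mathrm{xmm0} &= \bigl[\, a_1 d_1 + b_1 c_1,\;\; a_1 c_1 - b_1 d_1,\;\; a_0 d_0 + b_0 c_0,\;\; a_0 c_0 - b_0 d_0 \,\bigr]
\end{aligned}
```

addsubps subtracts in the even lanes and adds in the odd lanes, which is what yields the minus sign in the real part and the plus sign in the imaginary part with no separate sign flip.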

The assembly maps directly to GCC's x86 intrinsics (built-in functions): just prefix each instruction name with __builtin_ia32_.
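
As a rough illustration of that mapping, here is a minimal C sketch that uses the portable `_mm_*` intrinsics from `<pmmintrin.h>` rather than the raw `__builtin_ia32_` names. The function names `cmul2`, `cdiv2` and `self_check`, and the macro `SSE3_FN`, are made up for this example, and the code assumes GCC or Clang. The division routine is not the manual's Example 6-10: it is a naive sketch of (a + ib)/(c + id) = ((ac + bd) + i(bc - ad))/(c^2 + d^2), with no scaling against overflow or underflow. Lane comments are listed from the lowest lane to the highest, i.e. in memory order.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <pmmintrin.h> /* SSE3 intrinsics; also pulls in the SSE/SSE2 headers */
/* Assumption: GCC or Clang. Enables SSE3 per function, so -msse3 is not needed. */
#define SSE3_FN __attribute__((target("sse3")))

/* out[k] = x[k] * y[k] for two complex floats stored as re0, im0, re1, im1. */
SSE3_FN static void cmul2(const float *x, const float *y, float *out)
{
    __m128 vx = _mm_loadu_ps(x);               /* a0, b0, a1, b1 */
    __m128 vy = _mm_loadu_ps(y);               /* c0, d0, c1, d1 */
    __m128 re = _mm_moveldup_ps(vx);           /* a0, a0, a1, a1 */
    __m128 im = _mm_movehdup_ps(vx);           /* b0, b0, b1, b1 */
    __m128 ys = _mm_shuffle_ps(vy, vy, 0xB1);  /* d0, c0, d1, c1 */
    __m128 t0 = _mm_mul_ps(re, vy);            /* a0c0, a0d0, a1c1, a1d1 */
    __m128 t1 = _mm_mul_ps(im, ys);            /* b0d0, b0c0, b1d1, b1c1 */
    _mm_storeu_ps(out, _mm_addsub_ps(t0, t1)); /* a0c0-b0d0, a0d0+b0c0, ... */
}

/* out[k] = x[k] / y[k], same layout. Naive: no overflow/underflow scaling. */
SSE3_FN static void cdiv2(const float *x, const float *y, float *out)
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    __m128 re = _mm_moveldup_ps(vx);
    __m128 im = _mm_movehdup_ps(vx);
    __m128 ys = _mm_shuffle_ps(vy, vy, 0xB1);
    __m128 t0 = _mm_mul_ps(re, ys);            /* a0d0, a0c0, ... */
    __m128 t1 = _mm_mul_ps(im, vy);            /* b0c0, b0d0, ... */
    __m128 n  = _mm_addsub_ps(t1, t0);         /* b0c0-a0d0, b0d0+a0c0, ... */
    __m128 sq = _mm_mul_ps(vy, vy);            /* c0^2, d0^2, ... */
    __m128 den = _mm_add_ps(sq, _mm_shuffle_ps(sq, sq, 0xB1)); /* c^2+d^2 in both lanes */
    n = _mm_shuffle_ps(n, n, 0xB1);            /* real part first, then imaginary */
    _mm_storeu_ps(out, _mm_div_ps(n, den));
}
#else
/* Scalar fallback with the same interface, for non-x86 targets. */
static void cmul2(const float *x, const float *y, float *out)
{
    for (int k = 0; k < 4; k += 2) {
        float a = x[k], b = x[k + 1], c = y[k], d = y[k + 1];
        out[k] = a * c - b * d;
        out[k + 1] = a * d + b * c;
    }
}

static void cdiv2(const float *x, const float *y, float *out)
{
    for (int k = 0; k < 4; k += 2) {
        float a = x[k], b = x[k + 1], c = y[k], d = y[k + 1];
        float den = c * c + d * d;
        out[k] = (a * c + b * d) / den;
        out[k + 1] = (b * c - a * d) / den;
    }
}
#endif

static int near(float u, float v)
{
    float e = u - v;
    return (e < 0 ? -e : e) < 1e-5f;
}

/* Multiplies two pairs, divides the products back, and checks both results. */
static int self_check(void)
{
    const float x[4] = {1, 2, 3, 4};      /* 1+2i, 3+4i */
    const float y[4] = {5, 6, 7, 8};      /* 5+6i, 7+8i */
    const float want[4] = {-7, 16, -11, 52}; /* (1+2i)(5+6i), (3+4i)(7+8i) */
    float p[4], q[4];

    cmul2(x, y, p);
    cdiv2(p, y, q);                       /* dividing the product by y gives x back */
    for (int k = 0; k < 4; ++k) {
        assert(near(p[k], want[k]));
        assert(near(q[k], x[k]));
    }
    return 1;
}
```

To try it, call `self_check()` from a `main` function and build with GCC or Clang. Unaligned loads and stores are used so the arrays need no special alignment; with 16-byte-aligned data, `_mm_load_ps` and `_mm_store_ps` correspond more closely to the `movaps` in the assembly.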
