Apple Accelerate Framework: scale and normalize a vector

Views: 177 | Published: 2020/5/6 11:09:45 | Tags: ios macos math accelerate-framework

Question

What functions can I use in Accelerate.framework to scale a vector by a scalar and to normalize a vector? I found one in the documentation that I think might work for scaling, but I am confused about its operation.

vDSP_vsma
Vector scalar multiply and vector add; single precision.

void vDSP_vsma (
   const float *__vDSP_A,
   vDSP_Stride __vDSP_I,
   const float *__vDSP_B,
   const float *__vDSP_C,
   vDSP_Stride __vDSP_K,
   float *__vDSP_D,
   vDSP_Stride __vDSP_L,
   vDSP_Length __vDSP_N
);

Recommended answer

The easiest way to normalize a vector in place is something like:

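int n = 3;
float v[3] = {1, 2, 3};
/* cblas_snrm2() returns the Euclidean length of v; cblas_sscal() then scales v in place by its reciprocal. */
cblas_sscal(n, 1.0f / cblas_snrm2(n, v, 1), v, 1);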

You'll need to

#include <cblas.h>

or

#include <vblas.h>

(or both). Note that several of these functions are listed in the "matrix" section of the documentation even though they operate on vectors.
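
For scaling on its own, and for the vDSP_vsma function quoted in the question, a minimal sketch may help. It assumes the usual vDSP semantics (vDSP_vsmul computes C[i] = A[i] * B, and vDSP_vsma computes D[i] = A[i] * B + C[i], with the scalar B passed by pointer), unit strides, and the umbrella header <Accelerate/Accelerate.h>. The wrapper names scale_in_place and scale_and_add are hypothetical, and the non-Apple branch is only a portable stand-in so the snippet compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

/* Scale in place: v[i] = s * v[i] for i in [0, n). */
void scale_in_place(float *v, size_t n, float s)
{
    assert(v != NULL || n == 0);
#if defined(__APPLE__)
    /* The scalar is passed by pointer; input and output may be the same array. */
    vDSP_vsmul(v, 1, &s, v, 1, (vDSP_Length)n);
#else
    /* Portable stand-in (an assumption for non-Apple builds). */
    for (size_t i = 0; i < n; ++i) {
        v[i] *= s;
    }
#endif
}

/* Scale and add: d[i] = a[i] * b + c[i], which is what vDSP_vsma computes. */
void scale_and_add(const float *a, float b, const float *c, float *d, size_t n)
{
    assert((a != NULL && c != NULL && d != NULL) || n == 0);
#if defined(__APPLE__)
    vDSP_vsma(a, 1, &b, c, 1, d, 1, (vDSP_Length)n);
#else
    /* Portable stand-in (an assumption for non-Apple builds). */
    for (size_t i = 0; i < n; ++i) {
        d[i] = a[i] * b + c[i];
    }
#endif
}
```

In other words, vDSP_vsma scales one vector and then adds a second one to the result; for scaling alone, vDSP_vsmul (or cblas_sscal as above) is the closer fit.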

If you want to use the vDSP functions, see the Vector-Scalar Division section of the documentation. There are several things you can do:

vDSP_dotpr(), sqrt(), and vDSP_vsdiv()
vDSP_dotpr(), vrsqrte_f32(), and vDSP_vsmul() (vrsqrte_f32() is a NEON GCC built-in, though, so you need to check that you're compiling for armv7).
vDSP_rmsqv(), multiply by sqrt(n), and vDSP_vsdiv()

The reason there isn't a vector-normalization function is that the "vector" in vDSP means "lots of things at once" (up to around 4096/8192) and not necessarily the "vector" from linear algebra. It's pretty meaningless to normalize a 1024-element vector, and a quick function for normalizing a 3-element vector isn't something that will make your app significantly faster, which is why there isn't one.

The intended usage of vDSP is more like normalizing 1024 separate 2- or 3-element vectors at once. I can spot a handful of ways to do this:

Use vDSP_vdist() to get a vector of lengths, followed by vDSP_vdiv(). You have to use vDSP_vdist() multiple times for vectors with more than 2 elements, though.
Use vDSP_vsq() to square all the inputs, vDSP_vadd() multiple times to add them up, the equivalent of vDSP_vsqrt() or vDSP_vrsqrt(), and then vDSP_vmul() or vDSP_vdiv() as appropriate. It shouldn't be too hard to write the equivalent of vDSP_vsqrt() or vDSP_vrsqrt().
Various approaches that pretend your input is a complex vector. These are not likely to be faster.

Of course, if you don't have 1024 vectors to normalize, don't overcomplicate things.

Notes:

I avoid the terms "2-vector" and "3-vector" to prevent confusion with the "four-vector" from relativity.
A good choice of n is one that nearly fills your L1 data cache. That's not hard to work out: L1 data caches have been relatively fixed at 32 KB for a decade or more (they may be shared between virtual cores in a hyperthreaded CPU, and some older or cheaper processors might have 16 KB), so the most you should use is around 8192 (32 KB divided by 4 bytes per float) for in-place operations on floats. You might want to subtract a little for stack space, and if you're doing several sequential operations you probably want to keep everything in cache; 1024 or 2048 seems pretty sensible, and anything more will probably hit diminishing returns. If you care, measure performance...
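
The cache arithmetic in the second note can be written out as a small helper. This is only a back-of-the-envelope sketch: the function name floats_per_chunk and the idea of dividing the cache evenly among the arrays in use are assumptions for illustration, and it takes sizeof(float) to be 4 bytes, as it is on Apple platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Largest element count per chunk such that `arrays_in_flight` float arrays
 * of that length fit together in `cache_bytes` (hypothetical helper). */
size_t floats_per_chunk(size_t cache_bytes, size_t arrays_in_flight)
{
    assert(arrays_in_flight > 0);
    return cache_bytes / (arrays_in_flight * sizeof(float));
}
```

With a 32 KB cache, floats_per_chunk(32 * 1024, 1) gives 8192, the in-place figure above, while four arrays in flight (say x, y, z and a lengths buffer) bring it down to 2048. Treat these as starting points and measure.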
