XMVector3Dot performance

Question

When running a performance profiler (VS2017), I find that XMVector3Dot shows up as taking some time (it's part of my collision detection code). If I replace XMVECTOR with XMFLOAT3 and calculate the dot product manually (the same reasoning applies to other vector operations), my algorithm runs faster. I understand that XMVECTORs are of course needed when supplying the GPU with vectors etc., since this is what the GPU understands, but is it expected that, when calculating on the CPU, it's faster to manually calculate a dot product with XMFLOAT3s instead of XMVECTORs?

Answer

Efficient use of SIMD requires a number of techniques, primarily keeping your computation vectorized for as long as you can. If you have to convert back and forth between vectorized and scalar, the performance benefits of SIMD are lost.

A dot product takes two vectors and returns a scalar value. To make it easier to keep computations vectorized, XMVector3Dot returns the scalar value 'splatted' across the vector, i.e. replicated into every component. If you are just extracting one of the components and going back to scalar computations, then your algorithm is likely not well vectorized and you would in fact be better off doing the dot product as a scalar operation.
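To make the round trip concrete, here is a minimal sketch in plain C++ rather than DirectXMath itself. `Float3` and `Vec4` are hypothetical stand-ins for XMFLOAT3 and XMVECTOR, and `dot3_splat` only models the shape of a splatted result; it does not issue any SIMD instructions.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for illustration only, not DirectXMath types.
struct Float3 { float x, y, z; };     // models XMFLOAT3 (plain storage)
using Vec4 = std::array<float, 4>;    // models the four lanes of an XMVECTOR

// Scalar dot product: the result is a single float.
inline float dot3_scalar(const Float3& a, const Float3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Models a splatted result: the same dot product in every lane,
// so follow-up vector operations can consume it directly.
inline Vec4 dot3_splat(const Vec4& a, const Vec4& b)
{
    const float d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; // w lane is ignored
    return Vec4{ d, d, d, d };
}

// The pattern that tends to lose the benefit: convert to vector form,
// compute one dot product, then extract a single lane again.
inline float dot3_round_trip(const Float3& a, const Float3& b)
{
    const Vec4 va{ a.x, a.y, a.z, 0.0f };
    const Vec4 vb{ b.x, b.y, b.z, 0.0f };
    return dot3_splat(va, vb)[0];
}

// Small usage check: both paths produce the same number.
inline void dot3_demo()
{
    const Float3 a{ 1.0f, 2.0f, 3.0f };
    const Float3 b{ 4.0f, 5.0f, 6.0f };
    assert(dot3_scalar(a, b) == dot3_round_trip(a, b));
}
```

Both functions return the same value; the difference is that the round-trip version pays for the conversion in and the extraction out while doing only one dot product's worth of useful work.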

DirectXMath includes a collision header with various tests that follow the SIMD best practices. For example:

inline XMVECTOR PointOnPlaneInsideTriangle(FXMVECTOR P, FXMVECTOR V0, FXMVECTOR V1, GXMVECTOR V2)
{
    // Compute the triangle normal.
    XMVECTOR N = XMVector3Cross( XMVectorSubtract( V2, V0 ), XMVectorSubtract( V1, V0 ) );

    // Compute the cross products of the vector from the base of each edge to 
    // the point with each edge vector.
    XMVECTOR C0 = XMVector3Cross( XMVectorSubtract( P, V0 ), XMVectorSubtract( V1, V0 ) );
    XMVECTOR C1 = XMVector3Cross( XMVectorSubtract( P, V1 ), XMVectorSubtract( V2, V1 ) );
    XMVECTOR C2 = XMVector3Cross( XMVectorSubtract( P, V2 ), XMVectorSubtract( V0, V2 ) );

    // If the cross product points in the same direction as the normal, then the
    // point is inside the edge (it is zero if it is on the edge).
    XMVECTOR Zero = XMVectorZero();
    XMVECTOR Inside0 = XMVectorGreaterOrEqual( XMVector3Dot( C0, N ), Zero );
    XMVECTOR Inside1 = XMVectorGreaterOrEqual( XMVector3Dot( C1, N ), Zero );
    XMVECTOR Inside2 = XMVectorGreaterOrEqual( XMVector3Dot( C2, N ), Zero );

    // If the point is inside all of the edges, it is inside the triangle.
    return XMVectorAndInt( XMVectorAndInt( Inside0, Inside1 ), Inside2 );
}

Instead of doing a scalar conversion and then a comparison, it uses vectorized comparisons.
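The mask idea behind those comparisons can be modeled in portable C++. This is only a sketch under stated assumptions: `Vec4`, `Mask4`, `greater_or_equal`, `and_int`, `inside_all_edges` and `all_lanes_set` are hypothetical names that mimic the behavior of per-lane compare and bitwise AND, not the DirectXMath functions themselves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4  = std::array<float, 4>;          // hypothetical: four float lanes
using Mask4 = std::array<std::uint32_t, 4>;  // hypothetical: all-ones or all-zeros per lane

// Per-lane a >= b. Each lane becomes 0xFFFFFFFF when true and 0 when false.
// 0u - 1u wraps to all ones, so no if/else is needed.
inline Mask4 greater_or_equal(const Vec4& a, const Vec4& b)
{
    Mask4 m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = 0u - static_cast<std::uint32_t>(a[i] >= b[i]);
    return m;
}

// Per-lane bitwise AND of two masks.
inline Mask4 and_int(const Mask4& a, const Mask4& b)
{
    Mask4 m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = a[i] & b[i];
    return m;
}

// Combines three splatted edge results (one per triangle edge) into one mask.
inline Mask4 inside_all_edges(const Vec4& d0, const Vec4& d1, const Vec4& d2)
{
    const Vec4 zero{ 0.0f, 0.0f, 0.0f, 0.0f };
    return and_int(and_int(greater_or_equal(d0, zero), greater_or_equal(d1, zero)),
                   greater_or_equal(d2, zero));
}

// Collapses the mask to a bool only once, at the very end.
inline bool all_lanes_set(const Mask4& m)
{
    return (m[0] & m[1] & m[2] & m[3]) == 0xFFFFFFFFu;
}

// Small usage check: a value exactly on an edge (zero) still counts as inside.
inline void mask_demo()
{
    const Vec4 pos{ 1.0f, 1.0f, 1.0f, 1.0f };
    const Vec4 zero{ 0.0f, 0.0f, 0.0f, 0.0f };
    assert(all_lanes_set(inside_all_edges(pos, zero, pos)));
}
```

The point of the pattern is that the three tests stay in mask form and are merged with bitwise operations; the conversion to a single `bool` happens once, after all the work is done.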

The DirectXMath collision code also avoids dynamic branches. Modern CPUs have a lot of computational power so doing more work without dynamic branches or accessing memory is often faster. For example, here is the sphere-triangle test:

inline bool BoundingSphere::Intersects( FXMVECTOR V0, FXMVECTOR V1, FXMVECTOR V2 ) const
{
    // Load the sphere.    
    XMVECTOR vCenter = XMLoadFloat3( &Center );
    XMVECTOR vRadius = XMVectorReplicatePtr( &Radius );

    // Compute the plane of the triangle (has to be normalized).
    XMVECTOR N = XMVector3Normalize( XMVector3Cross( XMVectorSubtract( V1, V0 ), XMVectorSubtract( V2, V0 ) ) );

    // Assert that the triangle is not degenerate.
    assert( !XMVector3Equal( N, XMVectorZero() ) );

    // Find the nearest feature on the triangle to the sphere.
    XMVECTOR Dist = XMVector3Dot( XMVectorSubtract( vCenter, V0 ), N );

    // If the center of the sphere is farther from the plane of the triangle than
    // the radius of the sphere, then there cannot be an intersection.
    XMVECTOR NoIntersection = XMVectorLess( Dist, XMVectorNegate( vRadius ) );
    NoIntersection = XMVectorOrInt( NoIntersection, XMVectorGreater( Dist, vRadius ) );

    // Project the center of the sphere onto the plane of the triangle.
    XMVECTOR Point = XMVectorNegativeMultiplySubtract( N, Dist, vCenter );

    // Is it inside all the edges? If so we intersect because the distance 
    // to the plane is less than the radius.
    XMVECTOR Intersection = DirectX::Internal::PointOnPlaneInsideTriangle( Point, V0, V1, V2 );

    // Find the nearest point on each edge.
    XMVECTOR RadiusSq = XMVectorMultiply( vRadius, vRadius );

    // Edge 0,1
    Point = DirectX::Internal::PointOnLineSegmentNearestPoint( V0, V1, vCenter );

    // If the distance from the center of the sphere to the point is less than 
    // the radius of the sphere then it must intersect.
    Intersection = XMVectorOrInt( Intersection, XMVectorLessOrEqual( XMVector3LengthSq( XMVectorSubtract( vCenter, Point ) ), RadiusSq ) );

    // Edge 1,2
    Point = DirectX::Internal::PointOnLineSegmentNearestPoint( V1, V2, vCenter );

    // If the distance from the center of the sphere to the point is less than 
    // the radius of the sphere then it must intersect.
    Intersection = XMVectorOrInt( Intersection, XMVectorLessOrEqual( XMVector3LengthSq( XMVectorSubtract( vCenter, Point ) ), RadiusSq ) );

    // Edge 2,0
    Point = DirectX::Internal::PointOnLineSegmentNearestPoint( V2, V0, vCenter );

    // If the distance from the center of the sphere to the point is less than 
    // the radius of the sphere then it must intersect.
    Intersection = XMVectorOrInt( Intersection, XMVectorLessOrEqual( XMVector3LengthSq( XMVectorSubtract( vCenter, Point ) ), RadiusSq ) );

    return XMVector4EqualInt( XMVectorAndCInt( Intersection, NoIntersection ), XMVectorTrueInt() );
}
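The edge tests above rely on a nearest-point-on-segment helper whose body is not shown here. As a rough scalar illustration of the geometry (not the library's implementation), the following sketch projects the point onto the segment, clamps the parameter to [0, 1], and compares squared distances so no square root is needed. `Float3`, `sub`, `dot`, `nearest_point_on_segment` and `sphere_touches_segment` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical plain 3-float struct for illustration only.
struct Float3 { float x, y, z; };

inline Float3 sub(const Float3& a, const Float3& b)
{
    return Float3{ a.x - b.x, a.y - b.y, a.z - b.z };
}

inline float dot(const Float3& a, const Float3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Closest point to p on the segment [s1, s2]. Assumes s1 != s2.
inline Float3 nearest_point_on_segment(const Float3& s1, const Float3& s2, const Float3& p)
{
    const Float3 dir = sub(s2, s1);
    const float len_sq = dot(dir, dir);
    assert(len_sq > 0.0f); // degenerate segments are not handled in this sketch

    // Parameter of the projection, clamped so the result stays on the segment.
    const float t = std::max(0.0f, std::min(1.0f, dot(sub(p, s1), dir) / len_sq));
    return Float3{ s1.x + t * dir.x, s1.y + t * dir.y, s1.z + t * dir.z };
}

// True when the sphere touches or overlaps the segment.
// Compares squared distance with squared radius, mirroring the RadiusSq idea.
inline bool sphere_touches_segment(const Float3& center, float radius,
                                   const Float3& s1, const Float3& s2)
{
    const Float3 d = sub(center, nearest_point_on_segment(s1, s2, center));
    return dot(d, d) <= radius * radius;
}
```

In the vectorized version the same three edge results are OR-ed together as masks instead of returning early after the first hit, which is what keeps that function free of dynamic branches.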

For your algorithm, you should either (a) make it fully vectorized or (b) stick with a scalar dot-product.
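One possible way to approach option (a), offered as an assumption rather than something the answer prescribes, is to restructure the data so that many dot products are computed in one pass. The sketch below uses a hypothetical structure-of-arrays container, `Float3Array`, and a simple loop that compilers can often auto-vectorize; whether that actually happens depends on the compiler, flags and target, so it is worth profiling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays layout: all x values together, then y, then z.
struct Float3Array
{
    std::vector<float> x, y, z;
};

// Computes out[i] = dot(a[i], b[i]) for every element.
// Each iteration is independent, which is what makes the loop vectorizable.
inline void dot3_batch(const Float3Array& a, const Float3Array& b, std::vector<float>& out)
{
    const std::size_t n = a.x.size();
    assert(a.y.size() == n && a.z.size() == n);
    assert(b.x.size() == n && b.y.size() == n && b.z.size() == n);

    out.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
}
```

If the collision code only ever needs one dot product at a time and immediately branches on the result, option (b) with a plain scalar dot product is the simpler choice.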
