本征ConditionType数组:一种有效的广播方式,而不是循环 [英] Eigen ConditionType array: Efficient way to broadcast instead of looping

查看:116
本文介绍了本征ConditionType数组:一种有效的广播方式,而不是循环的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一段对性能至关重要的代码,我需要检查一个数组中低于阈值的值,然后有条件地设置其他两个数组的值。我的代码如下:

I have a performance critical piece of code, where I need to check one array for values below a threshold and then conditionally set the values of two other arrays. My code looks like this:

#include <Eigen/Dense>

int main(){
    Eigen::ArrayXXd
        a (1, 100),
        b (2, 100),
        c (3, 100);
    
    a.setRandom();
    b.setRandom();
    c.setRandom();
    
    constexpr double minVal { 1e-8 };
    
    /* the code segment in question */
    /* option 1 */
    for ( int i=0; i<2; ++i ){
        b.row(i)   = (a < minVal).select( 0, c.row(i+1) / a );
        c.row(i+1) = (a < minVal).select( 0, c.row(i+1) );
    }
    /* option 2, which is slower */
    b = (a < minVal).replicate(2,1).select( 0, c.bottomRows(2) / a.replicate(2,1) );
    c.bottomRows(2) = (a < minVal).replicate(2,1).select( 0, c.bottomRows(2) );

    return 0;
}

数组 a 检查是否达到阈值 minVal ,具有一行并且具有动态列数。其他两个数组 b c 分别具有两行和三行,并且列数与<$相同c $ c> a 。

The array a, whose values are checked for reaching the threshold minVal, has one row and a dynamic number of columns. The other two arrays b and c have two and three rows, respectively, and the same number of columns as a.

现在,我想在更多的特征中执行上述逻辑方式,在选项1中没有该循环,因为 eigen 通常会欺骗性能,在编写原始循环时,我永远都希望匹配。
但是,我唯一想到的方法是选项2,它明显比选项1慢。

Now I would like to do the above logic in a more eigen way, without that loop in option 1, because typically, eigen has tricks up its sleeve for performance, that I can never hope to match when writing raw loops. However, the only way I could think of was option 2, which is noticeably slower than option 1.

执行上述操作的正确和有效方法是什么?还是循环已经是我最好的选择?

What would be the right and efficient way to do the above? Or is the loop already my best option?

推荐答案

您可以尝试以下操作:


  • 用固定的行数和动态的列数定义数组类型,即,您可以将 Eigen :: ArrayXXd 替换为 Eigen :: Array< double, 1/2/3,Eigen :: Dynamic>

  • 使用固定大小的块操作(请参见 https://eigen.tuxfamily.org/dox/group__TutorialBlockOperations.html ),即,您可以替换 bottomRows(N) bottomRows< N>()的strong>,并类似地使用复制< 2,1>() replicate(2,1)。 li>
  • Define your array types with fixed number of rows and dynamic number of columns, i.e., you can replace Eigen::ArrayXXd with Eigen::Array<double, 1/2/3, Eigen::Dynamic>.
  • Use fixed-size version of block operations (see https://eigen.tuxfamily.org/dox/group__TutorialBlockOperations.html), i.e., you can replace bottomRows(N) with bottomRows<N>() and similarly replicate(2,1) with replicate<2,1>().

我已经更改了代码中的数组类型,并包括了我提到的可能的改进的第三种选择:

I have changed the array types in your code and included a third option with the possible improvements that I have mentioned:

#include <Eigen/Dense>

#include <iostream>
#include <chrono>

constexpr int numberOfTrials = 1000000;
constexpr double minVal{ 1e-8 };

typedef Eigen::Array<double, 1, Eigen::Dynamic> Array1Xd;
typedef Eigen::Array<double, 2, Eigen::Dynamic> Array2Xd;
typedef Eigen::Array<double, 3, Eigen::Dynamic> Array3Xd;

inline void option1(const Array1Xd& a, Array2Xd& b, Array3Xd& c)
{
    for (int i = 0; i < 2; ++i) {
        b.row(i) = (a < minVal).select(0, c.row(i + 1) / a);
        c.row(i + 1) = (a < minVal).select(0, c.row(i + 1));
    }
}

inline void option2(const Array1Xd& a, Array2Xd& b, Array3Xd& c)
{
    b = (a < minVal).replicate(2, 1).select(0, c.bottomRows(2) / a.replicate(2, 1));
    c.bottomRows(2) = (a < minVal).replicate(2, 1).select(0, c.bottomRows(2));
}

inline void option3(const Array1Xd& a, Array2Xd& b, Array3Xd& c)
{
    b = (a < minVal).replicate<2, 1>().select(0, c.bottomRows<2>() / a.replicate<2, 1>());
    c.bottomRows<2>() = (a < minVal).replicate<2, 1>().select(0, c.bottomRows<2>());
}

int main() {
    Array1Xd a(1, 100);
    Array2Xd b(2, 100);
    Array3Xd c(3, 100);

    a.setRandom();
    b.setRandom();
    c.setRandom();

    auto tpBegin1 = std::chrono::steady_clock::now();
    for (int i = 0; i < numberOfTrials; i++)
        option1(a, b, c);
    auto tpEnd1 = std::chrono::steady_clock::now();

    auto tpBegin2 = std::chrono::steady_clock::now();
    for (int i = 0; i < numberOfTrials; i++)
        option2(a, b, c);
    auto tpEnd2 = std::chrono::steady_clock::now();

    auto tpBegin3 = std::chrono::steady_clock::now();
    for (int i = 0; i < numberOfTrials; i++)
        option3(a, b, c);
    auto tpEnd3 = std::chrono::steady_clock::now();

    std::cout << "(Option 1) Average execution time: " << std::chrono::duration_cast<std::chrono::microseconds>(tpEnd1 - tpBegin1).count() / (long double)(numberOfTrials) << " us" << std::endl;
    std::cout << "(Option 2) Average execution time: " << std::chrono::duration_cast<std::chrono::microseconds>(tpEnd2 - tpBegin2).count() / (long double)(numberOfTrials) << " us" << std::endl;
    std::cout << "(Option 3) Average execution time: " << std::chrono::duration_cast<std::chrono::microseconds>(tpEnd3 - tpBegin3).count() / (long double)(numberOfTrials) << " us" << std::endl;

    return 0;
}

我获得的平均执行时间如下(i7-9700K,msvc2019,启用了优化, NDEBUG):

Average execution times that I have obtained are as follows (i7-9700K, msvc2019, optimizations enabled, NDEBUG):

(Option 1) Average execution time: 0.527717 us
(Option 2) Average execution time: 3.25618 us
(Option 3) Average execution time: 0.512029 us

并启用AVX2 + OpenMP:

And with AVX2+OpenMP enabled:

(Option 1) Average execution time: 0.374309 us
(Option 2) Average execution time: 3.31356 us
(Option 3) Average execution time: 0.260551 us

我不确定它是否最本征。方式,但希望对您有所帮助!

I'm not sure if it is the most "Eigen" way but I hope it helps!

这篇关于本征ConditionType数组:一种有效的广播方式,而不是循环的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆