C ++“无原始循环";在不失去性能的情况下 [英] C++ "No raw loops" without losing perfomance

查看:80
本文介绍了C ++“无原始循环";在不失去性能的情况下的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

因此,新的(旧的)大事情"是C ++中的无原始循环".我正在尝试以这种方式编写代码,但是效率似乎很低.是的,有一些STL算法可以处理任何事情,但它们似乎效率不高.

例如,我有一种情况,我想要一个指针,该指针指向得分最高的节点数组中的一个节点.确定分数是一项昂贵的浮点运算.因此,我实现了STL算法版本,并将其与原始循环进行了比较:

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

int main()
{

    std::array<Node, 10> nodes;

    counter = 0;
    Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
        [](Node const& node1, Node const& node2) {
            return node1.Score() < node2.Score();
        });
    std::cout << "algorithm count " << counter << std::endl;

    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

对此进行评估,对于STL版本,昂贵的Score函数将被评估18次,而raw循环仅使用10次评估...

我做错了吗,还是原始循环还不错呢?

user58697建议cout和静态计数器会阻止编译器优化之后,我更改了代码:

#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>

template <typename T>
class Random {
private:
    std::default_random_engine generator;
    std::uniform_real_distribution<T> distribution;
public:
    Random()
        : generator()
        , distribution(0.0, 1.0)
    {}

    auto operator()() {
        return distribution(generator);
    };
};

static Random<double> myRandom;

class Timer {
private:
    std::chrono::high_resolution_clock::time_point startTime{};
public:
    void Start() noexcept {
        startTime = std::chrono::high_resolution_clock::now();
    }
    [[nodiscard]] auto ElapsedMs() const noexcept {
        return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
    }
};

static Timer timer;

class Node {
private:
    double val;
public:
    Node() noexcept : val(myRandom()) {}

    [[nodiscard]] auto Score() const noexcept {
        auto score = std::sqrt(std::log(10.0 / val));
        score = std::sin(score) / std::cos(score);
        score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
        score = std::pow(score, 1000);
        return score;
    }
};

int main()
{
    std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know

    for (auto i = 0; i < 2; i++) {
        timer.Start();
        Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
            [](Node const& node1, Node const& node2) {
                return node1.Score() < node2.Score();
            });
        std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;

        timer.Start();
        double maxScore = -FLT_MAX;
        for (const auto& node : nodes) {
            auto score = node.Score();
            if (score > maxScore) {
                maxScore = score;
                nodePtr = &node;
            }
        }
        std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
    }
}

我运行了两次循环以消除启动行为...第二个循环的结果(与g ++ 9.1 -O3编译):

algorithm elapsed time 16
raw loop count 8 (<== I see I forgot to change "count" to "time" :P)

不是那样.

解决方案

用抽象算法替换原始循环是一种好方法,因为这样您可以多次重用该算法,但只能测试一次.用这种方式包装循环看起来像语法上的糖,但是它大大减少了代码中潜在的错误,因为您现在可以对抽象算法进行大量的单元测试,而您不必担心在需要时错误地实现它.

但是,您在这里比较的是苹果和橙子.您的max_element实现始终计算Score()进行比较,而您的for循环则缓存Score()函数的结果.

Node的更好实现可能是:

 class Node {
mutable:
    double cached_score = std::numeric_limits<double>::quiet_Nan();
public:
    auto Score() const -> double {
        if(std::isnan(cached_score)){
           std::cout << "complex calculation\n";
           counter++;
           cached_score = 1;
        }
        return cached_score;
    }
    void invalidate_cache() {
      cached_score = std::numeric_limits<double>::quiet_Nan();
    }
};
 

这样,复杂的计算仅执行一次.

或者,编写您自己的抽象:

 #include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

template<class ForwardIt, class Evaluate, class Compare>
ForwardIt max_eval_element(
    ForwardIt first,
    ForwardIt last,
    Evaluate eval,
    Compare comp
){
    if (first == last) return last;

    ForwardIt largest = first;
    auto largest_val = eval(*first);
    ++first;
    for (; first != last; ++first) {
        const auto this_val = eval(*first);
        if (comp(largest_val, this_val)) {
            largest = first;
            largest_val = this_val;
        }
    }
    return largest;
}

int main()
{

    std::array<Node, 10> nodes;

    counter = 0;
    Node const* nodePtr = max_eval_element(std::cbegin(nodes), std::cend(nodes),
                                       [](Node const& node){ return node.Score(); },
                                       [](double const &a, double const &b) {
                                           return a<b;
                                       });
    std::cout << "algorithm count " << counter << std::endl;

    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}
 

在这种情况下,两个循环执行相同数量的求值.

我使用过的许多内部代码库都有扩展STL的扩展库.它使我工作的团队更加自信,他们的代码已正确编写,并且使您能够一目了然地解释复杂的操作.这样,这些抽象也减少了理解代码和通信的工作.

So the 'new (old) big thing' is "No Raw Loops" in C++. I'm trying to write code that way, but it seems very inefficient. Yes, there are STL algoritms which can do about anything, but they don't seem very efficient.

I for instance have a situation where I want a pointer to a node in an array of nodes that has the highest score. Determining that score is a costly floating-point operation. So I implemented the STL algorithm version and compared it with the raw loop:

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

int main()
{

    std::array<Node, 10> nodes;

    counter = 0;
    Node const* nodePtr = std::max_element(std::cbegin(nodes), std::cend(nodes),
        [](Node const& node1, Node const& node2) {
            return node1.Score() < node2.Score();
        });
    std::cout << "algorithm count " << counter << std::endl;

    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

Evaluating this, for the STL version, the costly Score function is evaluated 18 times, while the raw loop only uses 10 evaluations...

Am I doing it wrong, or are raw loops just not that bad?

edit: After the suggestion by user58697 that cout and the static counter would prevent compiler optimization, I changed the code:

#include <cfloat>
#include <cmath>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>
#include <random>
#include <chrono>

template <typename T>
class Random {
private:
    std::default_random_engine generator;
    std::uniform_real_distribution<T> distribution;
public:
    Random()
        : generator()
        , distribution(0.0, 1.0)
    {}

    auto operator()() {
        return distribution(generator);
    };
};

static Random<double> myRandom;

class Timer {
private:
    std::chrono::high_resolution_clock::time_point startTime{};
public:
    void Start() noexcept {
        startTime = std::chrono::high_resolution_clock::now();
    }
    [[nodiscard]] auto ElapsedMs() const noexcept {
        return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count();
    }
};

static Timer timer;

class Node {
private:
    double val;
public:
    Node() noexcept : val(myRandom()) {}

    [[nodiscard]] auto Score() const noexcept {
        auto score = std::sqrt(std::log(10.0 / val));
        score = std::sin(score) / std::cos(score);
        score = std::sqrt(std::sqrt(std::sqrt(std::sqrt(std::sqrt(score)))));
        score = std::pow(score, 1000);
        return score;
    }
};

int main()
{
    std::array<Node, 100000> nodes; // yeah, yeah... overloading the stack, I know

    for (auto i = 0; i < 2; i++) {
        timer.Start();
        Node const* nodePtr = &*std::max_element(std::cbegin(nodes), std::cend(nodes),
            [](Node const& node1, Node const& node2) {
                return node1.Score() < node2.Score();
            });
        std::cout << "algorithm elapsed time " << timer.ElapsedMs() << std::endl;

        timer.Start();
        double maxScore = -FLT_MAX;
        for (const auto& node : nodes) {
            auto score = node.Score();
            if (score > maxScore) {
                maxScore = score;
                nodePtr = &node;
            }
        }
        std::cout << "raw loop count " << timer.ElapsedMs() << std::endl;
    }
}

I run the loop twice to eliminate startup behavior... results of second loop (compiled with g++ 9.1 -O3):

algorithm elapsed time 16
raw loop count 8 (<== I see I forgot to change "count" to "time" :P)

So that's not it.

解决方案

Replacing raw loops with abstracted algorithms is good style because then you can reuse the algorithm many times but test it only once. Wrapping the loop this way may seem like syntactic sugar, but it greatly reduces the potential for bugs in your code because you can now do extensive unit tests on the abstracted algorithm and you never need to worry about mistakenly implementing it incorrectly when you need it.

However, you're comparing apples and oranges here. Your max_element implementation always calculates Score() for its comparison whereas your for loop caches the result of the Score() function.

A better implementation of Node might be:

class Node {
mutable:
    double cached_score = std::numeric_limits<double>::quiet_Nan();
public:
    auto Score() const -> double {
        if(std::isnan(cached_score)){
           std::cout << "complex calculation\n";
           counter++;
           cached_score = 1;
        }
        return cached_score;
    }
    void invalidate_cache() {
      cached_score = std::numeric_limits<double>::quiet_Nan();
    }
};

This way the complex calculation is only performed once.

Alternatively, write your own abstraction:

#include <cfloat>
#include <iostream>
#include <array>
#include <algorithm>
#include <numeric>

static int counter;

class Node {
public:
    auto Score() const -> double {
        std::cout << "complex calculation\n";
        counter++;
        return 1;
    }
};

template<class ForwardIt, class Evaluate, class Compare>
ForwardIt max_eval_element(
    ForwardIt first,
    ForwardIt last,
    Evaluate eval,
    Compare comp
){
    if (first == last) return last;

    ForwardIt largest = first;
    auto largest_val = eval(*first);
    ++first;
    for (; first != last; ++first) {
        const auto this_val = eval(*first);
        if (comp(largest_val, this_val)) {
            largest = first;
            largest_val = this_val;
        }
    }
    return largest;
}

int main()
{

    std::array<Node, 10> nodes;

    counter = 0;
    Node const* nodePtr = max_eval_element(std::cbegin(nodes), std::cend(nodes),
                                       [](Node const& node){ return node.Score(); },
                                       [](double const &a, double const &b) {
                                           return a<b;
                                       });
    std::cout << "algorithm count " << counter << std::endl;

    counter = 0;
    double maxScore = -FLT_MAX;
    for (const auto& node : nodes) {
        auto score = node.Score();
        if (score > maxScore) {
            maxScore = score;
            nodePtr = &node;
        }
    }
    std::cout << "raw loop count " << counter << std::endl;
}

In this instance, both loops perform the same number of evaluations.

Many internal codebases I've worked with have extensive libraries which extend the STL. It gives the teams I've worked on much greater confidence that their code has been written correctly and allows you to interpret complex operations at a glance. In this way, these abstractions also reduce the effort of understanding code and the effort of communication.

这篇关于C ++“无原始循环";在不失去性能的情况下的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆