Iterating over linked list in C++ is slower than in Go with analogous memory access


Question

In a variety of contexts I've observed that linked list iteration is consistently slower in C++ than in Go by 10-15%. My first attempt at resolving this mystery on Stack Overflow is here. The example I coded up was problematic because:

1) memory access was unpredictable because of heap allocations, and

2) because there was no actual work being done, some people's compilers were optimizing away the main loop.

To resolve these issues I have a new program with implementations in C++ and Go. The C++ version takes 1.75 secs compared to 1.48 secs for the Go version. This time, I do one large heap allocation before timing begins and use it to operate an object pool from which I release and acquire nodes for the linked list. This way the memory access should be completely analogous between the two implementations.

Hopefully this makes the mystery more reproducible!

C++:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/timer.hpp>

using namespace std;

struct Node {
    Node *next; // 8 bytes
    int age;   // 4 bytes
};

// Object pool, where every free slot points to the previous free slot
template<typename T, int n>
struct ObjPool
{
    typedef T*       pointer;
    typedef pointer* metapointer;

    ObjPool() :
        _top(NULL),
        _size(0)
    {
        pointer chunks = new T[n];
        for (int i=0; i < n; i++) {
            release(&chunks[i]);
        }
    }

    // Give an available pointer back to the object pool
    void release(pointer ptr)
    {
        // Store the current pointer at the given address
        *(reinterpret_cast<metapointer>(ptr)) = _top;

        // Advance the pointer
        _top = ptr;

        // Increment the size
        ++_size;
    }

    // Pop an available pointer off the object pool for program use
    pointer acquire(void)
    {
        if(_size == 0){throw std::out_of_range("");}

        // Pop the top of the stack
        pointer retval = _top;

        // Step back to the previous address
        _top = *(reinterpret_cast<metapointer>(_top));

        // Decrement the size
        --_size;

        // Return the next free address
        return retval;
    }

    unsigned int size(void) const {return _size;}

protected:
    pointer _top;

    // Number of free slots available
    unsigned int _size;
};

Node *nodes = nullptr;
ObjPool<Node, 1000> p;

void processAge(int age) {
    // If the pool has no free slots (i.e. every node is in use), pop the head
    // of the linked list and release it back to the pool
    if (p.size() == 0) {
        Node *head = nodes;
        nodes = nodes->next;
        p.release(head);
    }

    // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
    Node *node = nodes;
    Node *prev = nullptr;
    while (true) {
        if (node == nullptr || age < node->age) {
            Node *newNode = p.acquire();
            newNode->age = age;
            newNode->next = node;

            if (prev == nullptr) {
                nodes = newNode;
            } else {
                prev->next = newNode;
            }

            return;
        }

        prev = node;
        node = node->next;
    }
}

int main() {
    Node x = {};
    std::cout << "Size of struct: " << sizeof(x) << "\n"; // 16 bytes

    boost::timer t;
    for (int i=0; i<1000000; i++) {
        processAge(i);
    }

    std::cout << t.elapsed() << "\n";
}

Go:

package main

import (
    "time"
    "fmt"
    "unsafe"
)

type Node struct {
    next *Node // 8 bytes
    age int32 // 4 bytes
}

// Every free slot points to the previous free slot
type NodePool struct {
    top *Node
    size int
}

func NewPool(n int) NodePool {
    p := NodePool{nil, 0}
    slots := make([]Node, n, n)
    for i := 0; i < n; i++ {
        p.Release(&slots[i])
    }

    return p
}

func (p *NodePool) Release(l *Node) {
    // Store the current top at the given address
    *((**Node)(unsafe.Pointer(l))) = p.top
    p.top = l
    p.size++
}

func (p *NodePool) Acquire() *Node {
    if p.size == 0 {
        fmt.Printf("Attempting to pop from empty pool!\n")
    }
    retval := p.top

    // Step back to the previous address in stack of addresses
    p.top = *((**Node)(unsafe.Pointer(p.top)))
    p.size--
    return retval
}

func processAge(age int32) {
    // If the pool has no free slots (i.e. every node is in use), pop the head
    // of the linked list and release it back to the pool
    if p.size == 0 {
        head := nodes
        nodes = nodes.next
        p.Release(head)
    }

    // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
    node := nodes
    var prev *Node = nil
    for true {
        if node == nil || age < node.age {
            newNode := p.Acquire()
            newNode.age = age
            newNode.next = node

            if prev == nil {
                nodes = newNode
            } else {
                prev.next = newNode
            }
            return
        }

        prev = node
        node = node.next
    }
}

// Linked list of nodes, in ascending order by age
var nodes *Node = nil
var p NodePool = NewPool(1000)

func main() {
    x := Node{};
    fmt.Printf("Size of struct: %d\n", unsafe.Sizeof(x)) // 16 bytes

    start := time.Now()
    for i := 0; i < 1000000; i++ {
        processAge(int32(i))
    }

    fmt.Printf("Time elapsed: %s\n", time.Since(start))
}

Output:

clang++ -std=c++11 -stdlib=libc++ minimalPool.cpp -O3; ./a.out
Size of struct: 16
1.7548

go run minimalPool.go
Size of struct: 16
Time elapsed: 1.487930629s

Answer

The big difference between your two programs is that your Go code ignores errors (and will panic or segfault, if you're lucky, if you empty the pool), while your C++ code propagates errors via exception. Compare:

if p.size == 0 {
    fmt.Printf("Attempting to pop from empty pool!\n")
}

vs.

if(_size == 0){throw std::out_of_range("");}


There are at least three ways [1] to make the comparison fair:

  1. Change the C++ code to ignore the error, as you do in Go,
  2. Change both versions to panic/abort on error (a Go sketch of this follows the list), or
  3. Change the Go version to handle errors idiomatically [2], as you do in C++.
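
For example, option 2 on the Go side amounts to swapping the Printf in the question's Acquire for a panic (a minimal sketch; the C++ side would similarly replace the throw with something like std::abort()):

func (p *NodePool) Acquire() *Node {
    if p.size == 0 {
        // Option 2: fail hard instead of printing a message and carrying on
        panic("attempting to pop from empty pool")
    }
    retval := p.top

    // Step back to the previous address in the stack of addresses
    p.top = *((**Node)(unsafe.Pointer(p.top)))
    p.size--
    return retval
}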

So, let's do all of them and compare the results [3]:

  • C++ ignoring the error: 1.059329s wall, 1.050000s user + 0.000000s system = 1.050000s CPU (99.1%)
  • C++ aborting on error: 1.081585s wall, 1.060000s user + 0.000000s system = 1.060000s CPU (98.0%)
  • Go panicking on error: Time elapsed: 1.152994227s
  • Go ignoring the error: Time elapsed: 1.196426068s
  • Go with idiomatic error handling: Time elapsed: 1.322005119s
  • C++ with exceptions: 1.373458s wall, 1.360000s user + 0.000000s system = 1.360000s CPU (99.0%)

So:

  • Without error handling, C++ is faster than Go.
  • With panicking, Go gets faster [4], but still not as fast as C++.
  • With idiomatic error handling, C++ slows down a lot more than Go.

Why? This exception never actually happens in your test run, so the actual error-handling code never runs in either language. But clang can't prove that it doesn't happen. And, since you never catch the exception anywhere, that means it has to emit exception handlers and stack unwinders for every non-elided frame all the way up the stack. So it's doing more work on each function call and return—not much more work, but then your function is doing so little real work that the unnecessary extra work adds up.

1. You could also change the C++ version to do C-style error handling, or use an Option type, among other possibilities.

2. This, of course, requires a lot more changes: you need to import errors, change the return type of Acquire to (*Node, error), change the return type of processAge to error, change all your return statements, and add at least two if err != nil { … } checks. But that's supposed to be a good thing about Go, right?
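
A rough sketch of what those changes might look like, reusing the question's package-level p and nodes and adding errors to the imports (the error string here is only illustrative):

func (p *NodePool) Acquire() (*Node, error) {
    if p.size == 0 {
        return nil, errors.New("attempting to pop from empty pool")
    }
    retval := p.top

    // Step back to the previous address in the stack of addresses
    p.top = *((**Node)(unsafe.Pointer(p.top)))
    p.size--
    return retval, nil
}

func processAge(age int32) error {
    // If the pool has no free slots, recycle the head of the linked list
    if p.size == 0 {
        head := nodes
        nodes = nodes.next
        p.Release(head)
    }

    // Insert the new node in age order, propagating any pool error upward
    node := nodes
    var prev *Node
    for {
        if node == nil || age < node.age {
            newNode, err := p.Acquire()
            if err != nil {
                return err
            }
            newNode.age = age
            newNode.next = node

            if prev == nil {
                nodes = newNode
            } else {
                prev.next = newNode
            }
            return nil
        }

        prev = node
        node = node.next
    }
}

The loop in main would then carry the second if err != nil check on each processAge call.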

3. While I was at it, I replaced your legacy boost::timer with boost::auto_cpu_timer, so we're now seeing wall clock time (as with Go) as well as CPU time.

4. I won't attempt to explain why, because I don't understand it. From a quick glance at the assembly, it's clearly optimized out some checks, but I can't see why it couldn't optimize out those same checks without the panic.
