无法重现:C ++ Vector性能优于C#List性能 [英] Unable to reproduce: C++ Vector performance advantages over C# List performance

查看:54
本文介绍了无法重现:C ++ Vector性能优于C#List性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在微软的BUILD会议上,赫伯·萨特(Herb Sutter)解释说C ++具有"Real Arrays",而C#/Java语言却不相同或不相同.

我被卖了.您可以在此处观看完整的谈话 http://channel9.msdn.com/Events/Build/2014/2-661

这是幻灯片的快速快照,他在此处进行了描述. http://i.stack.imgur.com/DQaiF.png

但是我想看看我能带来多大的改变.

因此,我编写了非常幼稚的程序进行测试,该程序从文件中创建一个大的字符串向量,该文件的行范围从5个字符到50个字符.

链接到文件:

  www(dot)dropbox.com/s/evxn9iq3fu8qwss/result.txt 

然后我按顺序访​​问它们.

我在C#和C ++中都进行了此练习.

注意:我进行了一些修改,按照建议删除了循环中的副本.感谢您帮助我理解Real数组.

在C#中,我同时使用了List和ArrayList,因为不推荐使用ArrayList,而推荐使用List.

这是我的配备Core i7处理器的戴尔笔记本电脑上的结果:

  count C#(List< string>)C#(ArrayList)C ++1000 24毫秒21毫秒7毫秒10000 214毫秒213毫秒64毫秒100000 2秒123毫秒2秒125毫秒678毫秒 


C#代码:

 使用系统;使用System.Collections.Generic;使用System.Diagnostics;使用System.Linq;使用System.Text;使用System.Threading.Tasks;使用System.Collections;命名空间CSConsole{班级计划{静态void Main(string [] args){整数计数布尔成功= int.TryParse(args [0],out count);var watch = new Stopwatch();System.IO.StreamReader isrc =新的System.IO.StreamReader("result.txt");ArrayList list = new ArrayList();而(!isrc.EndOfStream){list.Add(isrc.ReadLine());}双k = 0;watch.Start();为(int i = 0; i< count; i ++){ArrayList temp =新的ArrayList();for(int j = 0; j< list.Count; j ++){//temp.Add(list [j]);k ++;}}watch.Stop();TimeSpan ts = watch.Elapsed;//Console.WriteLine(ts.ToString());Console.WriteLine(小时数:{0}分钟数:{1}秒:{2}毫秒数:{3}",ts.Hours,ts.Minutes,ts.Seconds,ts.Milliseconds);Console.WriteLine(k);isrc.Close();}}} 


C ++代码

  #include"stdafx.h"#include< stdio.h>#include< tchar.h>#include< vector>#include< fstream>#include< chrono>#include< iostream>#include< string>使用命名空间std;std :: chrono :: high_resolution_clock :: time_point time_now(){返回std :: chrono :: high_resolution_clock :: now();}float time_elapsed(std :: chrono :: high_resolution_clock :: time_point const& start){return std :: chrono :: duration_cast< std :: chrono :: milliseconds>(time_now()-start).count();//返回std :: chrono :: duration_cast< std :: chrono :: duration< float>>(time_now()-start).count();}int _tmain(int argc,_TCHAR * argv []){int count = _wtoi(argv [1]);向量< string>vs;fstream fs("result.txt",fstream :: in);如果(!fs)返回-1;char *缓冲区=新的char [1024];同时(!fs.eof()){fs.getline(buffer,1024);vs.push_back(string(buffer,fs.gcount()));}双k = 0;自动const start = time_now();为(int i = 0; i< count; i ++){向量< string>vs2;vector< string> :: const_iterator iter;for(iter = vs.begin(); iter!= vs.end(); iter ++){//vs2.push_back(*iter);k ++;}}自动const经过时间= time_elapsed(start);cout<<过去<<恩德尔cout<<k;fs.close();返回0;} 

解决方案

示例程序发现的差异与列表或其结构无关.

这是因为在C#中,字符串是引用类型,而在C ++代码中,您将它们用作值类型.

例如:

  string a ="foo bar baz";字符串b = a; 

分配 b = a 只是复制指针.

这一直到列表.将字符串添加到C#列表只是在添加指向该列表的指针.在主循环中,创建N个列表,所有列表仅包含指向相同字符串的指针.

但是,因为您在C ++中使用按值使用字符串,所以每次都必须复制它们.

  vector< string>vs2;vector< string> :: const_iterator iter;for(iter = vs.begin(); iter!= vs.end(); iter ++){vs2.push_back(* iter);//此处已复制STRING} 

此代码实际上是每个字符串的副本.您最终得到所有字符串的副本,并且将使用更多的内存.出于明显的原因,速度较慢.

如果您按如下所示重写循环:

  vector< string *>vs2;for(自动& s:vs){vs2.push_back(&(s));} 

然后,您现在要创建指针列表而不是副本列表,并且与C#具有同等地位.

在我的系统上,C#程序在大约 138毫秒中以1000的N运行,而C ++在 44毫秒中运行,这显然是C ++的胜利./p>


评论:

每张sutter sutter图片显示的C ++向量的主要好处之一是,内存布局可以是连续的(即所有项目在内存中彼此紧贴).但是,您永远都不会在 std :: string 上看到此功能,因为字符串需要动态分配内存(您不能在数组中将字符串的负载紧挨着放置,因为每个字符串都有一个不同的长度)

如果您想快速遍历所有对象,这将带来很大的好处,因为它对CPU缓存要友好得多,但是权衡是您必须复制所有项目才能将它们放入列表.

这是一个更好地说明它的例子:

C#代码:

  class ValueHolder {公共int标签;公共情报等等;public int otherBlah;公共ValueHolder(int t,int b,int o){标签= t;等等= b;otherBlah = o;}};静态ValueHolder findHolderWithTag(List< ValueHolder>缓冲区,int标记){//查找带有标签i的持有人foreach(缓冲区中的var持有者){如果(holder.tag ==标签)归还持有人;}返回新的ValueHolder(0,0,0);}静态void Main(string [] args){const int MAX = 99999;整数= 1000;//_wtoi(argv [1]);List< ValueHolder>vs = new List< ValueHolder>();for(int i = MAX; i> = 0; i--){vs.Add(new ValueHolder(i,0,0));//构造带有标签i,等等的0的valueholder}var watch = new Stopwatch();watch.Start();为(int i = 0; i< count; i ++){找到的ValueHolder = findHolderWithTag(vs,i);如果(found.tag!= i)抛出新的ArgumentException("not found");}watch.Stop();TimeSpan ts = watch.Elapsed;Console.WriteLine(小时数:{0}分钟数:{1}秒:{2}毫秒数:{3}",ts.Hours,ts.Minutes,ts.Seconds,ts.Milliseconds);} 

C ++代码:

  class ValueHolder {上市:int标签;诠释int otherBlah;ValueHolder(int t,int b,int o):标记(t),blah(b),otherBlah(o){}};const ValueHolder&findHolderWithTag(vector< ValueHolder>& int标记){//查找带有标签i的持有人for(const auto& holder:buffer){如果(holder.tag ==标签)归还持有人;}静态ValueHolder empty {0,0,0};返回空}int _tmain(int argc,_TCHAR * argv []){const int MAX = 99999;整数= 1000;//_wtoi(argv [1]);向量< ValueHolder>vs;for(int i = MAX; i> = 0; i--){vs.emplace_back(i,0,0);//构造带有标签i,等等的0的valueholder}自动const start = time_now();为(int i = 0; i< count; i ++){const ValueHolder&找到= findHolderWithTag(vs,i);if(found.tag!= i)//需要使用返回值,否则编译器将优化掉抛出未找到";}自动const经过时间= time_elapsed(start);cout<<经过的<<恩德尔返回0;} 

我们已经从最初的问题中知道,在C#中创建一堆重复列表比在C ++中创建列表要快得多,但是搜索列表呢?

这两个程序都只是在做一个愚蠢的线性列表扫描,以简单地证明这一点.

在我的PC上,C ++版本的运行时间为 72ms ,而C#版本的运行时间为 620ms .为什么速度增加?因为真实数组".

所有C ++ ValueHolders std :: vector 中彼此紧挨着.当循环要读取下一个循环时,这意味着它很可能已经在CPU缓冲区中.

C#ValueHolders位于各种随机存储器位置,并且列表仅包含指向它们的指针.当循环要读取下一个循环时,几乎可以肯定在CPU缓存中是 not ,因此我们必须去读取它.内存访问速度很慢,因此C#版本所需的时间几乎是原来的10倍.

如果将C#ValueHolder从 class 更改为 struct ,则C#List可以将它们全部紧挨着放置在内存中,并且时间降为161ms .现在,当您插入列表时,Buuut必须进行复制.

C#的问题在于,在许多情况下,您不能使用或不想使用结构,而在C ++中,您可以更好地控制这种情况.

PS:Java没有结构,因此您根本无法做到这一点.您对慢速缓存不友好版本10x感到困惑

At Microsoft's BUILD conference Herb Sutter explained that C++ has "Real Arrays" and C#/Java languages do not have the same or sort of.

I was sold on that. You can watch the full talk here http://channel9.msdn.com/Events/Build/2014/2-661

Here is a quick snapshot of the slide where he described this. http://i.stack.imgur.com/DQaiF.png

But I wanted to see how much difference will I make.

So I wrote very naive programs for testing, which create a large vector of strings from a file with lines ranging from 5 characters to 50 characters.

Link to the file:

www (dot) dropbox.com/s/evxn9iq3fu8qwss/result.txt

Then I accessed them in sequence.

I did this exercise in both C# and C++.

Note: I made some modifications, removed the copying in the loops as suggested. Thank you for helping me to understand the Real arrays.

In C# I used both List and ArrayList because ArrayList it is deprecated in favor of List.

Here are the results on my dell laptop with Core i7 processor:

count       C# (List<string>)   C# (ArrayList)     C++   

1000           24 ms              21 ms             7 ms       
10000         214 ms             213 ms            64 ms     
100000  2 sec 123 ms       2 sec 125 ms           678 ms    


C# code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
namespace CSConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            int count;
            bool success = int.TryParse(args[0], out count);

            var watch = new Stopwatch();
            System.IO.StreamReader isrc = new System.IO.StreamReader("result.txt");

            ArrayList list = new ArrayList();
            while (!isrc.EndOfStream)
            {
                list.Add(isrc.ReadLine());
            }
            double k = 0;
            watch.Start();
            for (int i = 0; i < count; i++)
            {
                ArrayList temp = new ArrayList();
                for (int j = 0; j < list.Count; j++)
                {
                   // temp.Add(list[j]);
                    k++;
                }
            }

            watch.Stop();
            TimeSpan ts = watch.Elapsed;

            //Console.WriteLine(ts.ToString());
            Console.WriteLine("Hours: {0} Miniutes: {1} Seconds: {2} Milliseconds: {3}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds);
            Console.WriteLine(k);
            isrc.Close();
        }


    }
}


C++ code

#include "stdafx.h"
#include <stdio.h>
#include <tchar.h>

#include <vector>
#include <fstream>
#include <chrono>
#include <iostream>
#include <string>

using namespace std;

std::chrono::high_resolution_clock::time_point time_now()
{
    return std::chrono::high_resolution_clock::now();
}

float time_elapsed(std::chrono::high_resolution_clock::time_point const & start)
{

    return std::chrono::duration_cast<std::chrono::milliseconds>(time_now() - start).count();
    //return std::chrono::duration_cast<std::chrono::duration<float>>(time_now() - start).count();
}


int _tmain(int argc, _TCHAR* argv [])
{
    int  count = _wtoi(argv[1]);

    vector<string> vs;
    fstream fs("result.txt", fstream::in);
    if (!fs) return -1;

    char* buffer = new char[1024];
    while (!fs.eof())
    {
        fs.getline(buffer, 1024);
        vs.push_back(string(buffer, fs.gcount()));
    }
    double k = 0;
    auto const start = time_now();
    for (int i = 0; i < count; i++)
    {
        vector<string> vs2;
        vector<string>::const_iterator iter;
        for (iter = vs.begin(); iter != vs.end(); iter++)
        {
            //vs2.push_back(*iter);
            k++;
        }
    }

    auto const elapsed = time_elapsed(start);
    cout << elapsed << endl;
    cout << k;
    fs.close();
    return 0;
}

解决方案

The differences found by your sample program has nothing to do with lists or their structure.

It's because in C#, strings are a reference type, whereas in C++ code, you are using them as a value type.

For example:

string a = "foo bar baz";
string b = a;

Assigning b = a is just copying the pointer.

This follows through into lists. Adding a string to a C# list is just adding a pointer to the list. In your main loop, you create N lists, all of which just contain pointers to the same strings.

Because you're using strings by value in C++ however, it has to copy them each time.

vector<string> vs2;
vector<string>::const_iterator iter;
for (iter = vs.begin(); iter != vs.end(); iter++)
{
   vs2.push_back(*iter); // STRING IS COPIED HERE
}

This code is actually making copies of each string. You end up with copies of all the strings, and will use a lot more memory. This is slower for obvious reasons.

If you rewrite the loop as follows:

vector<string*> vs2;
for (auto& s : vs)
{
    vs2.push_back(&(s));
}

Then you're now creating lists-of-pointers not lists-of-copies and are on equal footing with C#.

On my system, the C# program runs with N of 1000 in about 138 milliseconds, and the C++ one runs in 44 milliseconds, a clear win to C++.


Commentary:

One of the main benefits of C++ vectors as per herb sutter's picture, is that the memory layout can be contiguous (i.e. all the items are stuck next to each other in memory). You'll never see this work with a std::string however, as strings require dynamic memory allocation (you can't stick a load of strings next to each other in an array because each string has a different length)

This would give a large benefit if you wanted to quickly iterate through them all, as it's much friendlier to CPU caches, but the tradeoff is that you have to copy all the items to get them into the list.

Here's an example which better illustrates it:

C# Code:

class ValueHolder {
    public int tag;
    public int blah;
    public int otherBlah;

    public ValueHolder(int t, int b, int o)
    { tag = t; blah = b; otherBlah = o; }
};

static ValueHolder findHolderWithTag(List<ValueHolder> buffer, int tag) {
    // find holder with tag i
    foreach (var holder in buffer) {
        if (holder.tag == tag)
            return holder;
    }
    return new ValueHolder(0, 0, 0);
}

static void Main(string[] args)
{
    const int MAX = 99999;

    int  count = 1000; // _wtoi(argv[1]);
    List<ValueHolder> vs = new List<ValueHolder>();
    for (int i = MAX; i >= 0; i--) {
        vs.Add(new ValueHolder(i, 0, 0)); // construct valueholder with tag i, blahs of 0
    }

    var watch = new Stopwatch();
    watch.Start();

    for (int i = 0; i < count; i++)
    {
        ValueHolder found = findHolderWithTag(vs, i);
        if (found.tag != i)
            throw new ArgumentException("not found");
    }

    watch.Stop();
    TimeSpan ts = watch.Elapsed;
    Console.WriteLine("Hours: {0} Miniutes: {1} Seconds: {2} Milliseconds: {3}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds);
}

C++ Code:

class ValueHolder {
public:
    int tag;
    int blah;
    int otherBlah;

    ValueHolder(int t, int b, int o) : tag(t), blah(b), otherBlah(o) { }
};

const ValueHolder& findHolderWithTag(vector<ValueHolder>& buffer, int tag) {
    // find holder with tag i
    for (const auto& holder : buffer) {
        if (holder.tag == tag)
            return holder;
    }
    static ValueHolder empty{ 0, 0, 0 };
    return empty;
}

int _tmain(int argc, _TCHAR* argv[])
{
    const int MAX = 99999;

    int  count = 1000; // _wtoi(argv[1]);
    vector<ValueHolder> vs;
    for (int i = MAX; i >= 0; i--) {
        vs.emplace_back(i, 0, 0); // construct valueholder with tag i, blahs of 0
    }

    auto const start = time_now();
    for (int i = 0; i < count; i++)
    {
        const ValueHolder& found = findHolderWithTag(vs, i);
        if (found.tag != i) // need to use the return value or compiler will optimize away
            throw "not found";
    }

    auto const elapsed = time_elapsed(start);
    cout << elapsed << endl;
    return 0;
}

We already know from the original question that creating a bunch of duplicate lists would be much faster in C# than in C++, but what about searching the list instead?

Both programs are just doing a stupid linear list-scan in a simple attempt to show this.

On my PC, the C++ version runs in 72ms and the C# one takes 620ms. Why the speed increase? Because of the "real arrays".

All the C++ ValueHolders are stuck next to eachother in the std::vector. When the loop wants to read the next one, this means it's most likely already in the CPU cacue.

The C# ValueHolders are in all kinds of random memory locations, and the list just contains pointers to them. When the loop wants to read the next one, it is almost certainly not in the CPU cache, so we have to go and read it. Memory access is slow, hence the C# version takes nearly 10x as long.

If you change the C# ValueHolder from class to struct, then the C# List can stick them all next to eachother in memory, and the time drops to 161ms. Buuut now it has to make copies when you're inserting into the list.

The problem for C# is that there are many many situations where you can't or don't want to use a struct, whereas in C++ you have more control over this kind of thing.

PS: Java doesn't have structs, so you can't do this at all. You're stuck with the 10x as slow cache unfriendly version

这篇关于无法重现:C ++ Vector性能优于C#List性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆