C++ Performance Per Compiler, 200 times slower than C#
Question
I was dealing with some performance issues which I discussed in this question: Super Slow C++ For Loop
I have a simple program I wrote to parse binary data. I tested it locally on 2 computers.
1. Dual 6 core 2.4GHz Xeon V3, 64GB RAM, NVMe SSD
2. Dual 4 core 3.5GHz Xeon V3, 64GB RAM, NVMe SSD
Here is some of the code (the rest is on Wandbox: https://wandbox.org/permlink/VIvardJNAMKzSbMf):
string HexRow="";
for (int i=b; i<HexLineLength+b;i++){
    HexRow+= incomingData[i];
}
std::vector<unsigned char> BufferedLine=HexToBytes(HexRow);
stopwatch<> sw;
for (int i = 0; 80 >= i; ++i)
{
    Byte ColumnBytes;
    for (auto it = columns["data"][i].begin(); it != columns["data"][i].end(); ++it)
    {
        try {
            if (it.key() == "Column") { ColumnBytes.Column = it.value().get<std::string>();}
            else if (it.key() == "DataType") { ColumnBytes.DataType = it.value().get<std::string>();}
            else if (it.key() == "StartingPosition") { ColumnBytes.StartingPosition = it.value().get<int>();}
            else if (it.key() == "ColumnWidth") { ColumnBytes.ColumnWidth = it.value().get<int>();}
        }
        catch (...) {}
    }
    char* locale = setlocale(LC_ALL, "UTF-8");
    std::vector<unsigned char> CurrentColumnBytes(ColumnBytes.ColumnWidth);
    int arraySize = CurrentColumnBytes.size();
    for (int C = ColumnBytes.StartingPosition; C < ColumnBytes.ColumnWidth + ColumnBytes.StartingPosition; ++C)
    {
        int Index = C - ColumnBytes.StartingPosition;
        CurrentColumnBytes[Index] = BufferedLine[C-1];
    }
}
std::cout << "Elapsed: " << duration_cast<double>(sw.elapsed()) << '\n';
PC 1
Compiling on PC 1 with Visual Studio using the following flags:
/O2 /JMC /permissive- /MP /GS /analyze- /W3 /Zc:wchar_t /ZI /Gm- /sdl /Zc:inline /fp:precise /D "_CRT_SECURE_NO_WARNINGS" /D "_MBCS" /errorReport:prompt /WX- /Zc:forScope /Gd /Oy- /MDd /std:c++17 /FC /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\Project1.pch" /diagnostics:column
outputs:
Elapsed: 0.0913771
Elapsed: 0.0419886
Elapsed: 0.042406
Using Clang with the following: clang main.cpp -O3
outputs:
Elapsed: 0.036262
Elapsed: 0.0174264
Elapsed: 0.0170038
Compiling with GCC from MinGW (gcc version 8.1.0, i686-posix-dwarf-rev0, Built by MinGW-W64 project) using these switches: gcc main.cpp -lstdc++ -O3
gives the following times:
Elapsed: 0.019841
Elapsed: 0.0099643
Elapsed: 0.0094552
PC 2
With Visual Studio, still using /O2, I get:
Elapsed: 0.054841
Elapsed: 0.03543
Elapsed: 0.034552
I didn't do Clang and GCC on PC 2, but the improvement wasn't significant enough to resolve my concerns.
The issue is that the exact same code on Wandbox (https://wandbox.org/permlink/VIvardJNAMKzSbMf) executes 10-80 times faster:
Elapsed: 0.00115457
Elapsed: 0.000815412
Elapsed: 0.000814636
Wandbox is using GCC 10.0.0 and C++14. I realize it is likely running on Linux, and I couldn't find any way to get GCC 10 to compile on Windows, so I can't test compiling with that version.
This is a rewrite of a C# application I wrote, which operates so much faster:
Elapsed: 0.017424
Elapsed: 0.0006065
Elapsed: 0.000733
Elapsed: 0.0006166
Elapsed: 0.0004699
Finished Parsing: 100 Records. Elapsed :0.0082796 at a rate of : 12076/s
The C# method is as follows:
Stopwatch sw = new Stopwatch();
sw.Start();
foreach (dynamic item in TableData.data) //TableData is a JSON file with the structure definition
{
    string DataType = item.DataType;
    int startingPosition = item.StartingPosition;
    int width = Convert.ToInt32(item.ColumnWidth);
    if (width+startingPosition >= FullLineLength)
    {
        continue;
    }
    byte[] currentColumnBytes = currentLineBytes.Skip(startingPosition).Take(width).ToArray();
    // ..... 200 extra lines of processing into ints, dates, strings ......
    // ..... Even with the extra work, it operates at 1200+ records per second ......
}
sw.Stop();
var seconds = sw.Elapsed.TotalSeconds;
sw.Reset();
Console.WriteLine("Elapsed: " + seconds);
TempTable.Rows.Add(dataRow);
When I started this, I expected huge performance gains by moving code to unmanaged C++ from C#. This is my first C++ project and I am frankly just a bit discouraged about where I am. What can be done to speed up this C++? Do I need to use different datatypes, malloc, more / less structs?
It needs to run on Windows. Is there a way to get GCC 10 to work on Windows?
What suggestions do you have for an aspiring C++ Developer?
Recommended answer
Ok, so I was able to get C++ processing the file at around 50,000 rows per second with 80 columns per row. I reworked the entire workflow to make sure it didn't have to backtrack at all. I first read the entire file into ByteArray and then went over it line by line, moving data from one array to another rather than assigning each byte in a for loop. I then used a map to store the data.
stopwatch<> sw;
while (CurrentLine < TotalLines)
{
    // Copy the current fixed-length row out of the whole-file buffer.
    int BufferOffset = CurrentLine * LineLength;
    std::move(ByteArray + BufferOffset, ByteArray + BufferOffset + LineLength, LineByteArray);
    for (int i = 0; TotalColumns > i + 1; ++i)
    {
        int ThisStartingPosition = StartingPosition[i];
        int ThisWidth = ColumnWidths[i];
        // Note: this buffer is never freed in the code shown. Unless Format takes
        // ownership of it, it needs a matching delete[] (or a std::vector instead).
        std::uint8_t* CurrentColumnBytes = new uint8_t[ThisWidth];
        // Copy this column's bytes in one call instead of byte by byte.
        std::move(LineByteArray + ThisStartingPosition, LineByteArray + ThisStartingPosition + ThisWidth, CurrentColumnBytes);
        ResultMap[CurrentLine][i] = Format(CurrentColumnBytes, ThisWidth, DataType[i]);
    }
    CurrentLine++;
}
std::cout << "Processed " << CurrentLine << " lines in : " << duration_cast<double>(sw.elapsed()) << '\n';
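For comparison, here is a minimal sketch of the same idea (read the file once, then slice each fixed-width column out of the buffer) using std::vector instead of raw new[], so nothing has to be freed by hand. The names ColumnSpec, read_all_bytes and split_row are hypothetical and not from the code above, and the column offsets are assumed to be 0-based and already extracted from the JSON layout before the loop starts.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical column description; assumed to be parsed from the JSON
// layout once, before any rows are processed.
struct ColumnSpec {
    std::size_t start;  // 0-based offset of the column within a row (assumption)
    std::size_t width;  // number of bytes in the column
};

// Read the whole file into memory in one pass.
inline std::vector<std::uint8_t> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Copy every column of one fixed-length row out of the file buffer.
// Each column becomes its own vector, so its memory is released automatically.
inline std::vector<std::vector<std::uint8_t>> split_row(
    const std::vector<std::uint8_t>& file, std::size_t row,
    std::size_t row_length, const std::vector<ColumnSpec>& columns) {
    const std::size_t base = row * row_length;
    if (base + row_length > file.size())
        throw std::out_of_range("row past end of buffer");
    std::vector<std::vector<std::uint8_t>> out;
    out.reserve(columns.size());
    for (const ColumnSpec& c : columns) {
        if (c.start + c.width > row_length)
            throw std::out_of_range("column past end of row");
        auto first = file.begin() + static_cast<std::ptrdiff_t>(base + c.start);
        out.emplace_back(first, first + static_cast<std::ptrdiff_t>(c.width));
    }
    return out;
}
```

A caller would load the file once with read_all_bytes, then call split_row(file, row, LineLength, columns) for each row index and pass the resulting byte vectors to whatever formatting step follows.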
I am still a little disappointed because the Boost Gregorian calendar conversion is unavailable when compiling with Clang, and using the standard MS compiler makes it nearly 20X slower. With Clang -O3 it was processing 10,700 records in 0.25 seconds including all int and string conversions. I will just have to write my own date conversion.
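As a rough idea of what a hand-written date conversion could look like, the sketch below converts between a day count and a proleptic Gregorian year/month/day using the well-known public-domain algorithms published by Howard Hinnant. The date encoding in the actual records is not shown here, so the choice of 1970-01-01 as day 0 is purely an assumption, and the names Date, days_from_civil and civil_from_days are illustrative.

```cpp
// Hypothetical date type for illustration.
struct Date {
    int year;
    unsigned month;  // 1..12
    unsigned day;    // 1..31
};

// Days since 1970-01-01 (assumed epoch) for a Gregorian calendar date.
inline long long days_from_civil(int y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat January and February as months of the previous year
    const int era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return static_cast<long long>(era) * 146097 + doe - 719468;
}

// Inverse: Gregorian calendar date for a day count since 1970-01-01.
inline Date civil_from_days(long long z) {
    z += 719468;
    const long long era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);                // [0, 146096]
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
    const int y = static_cast<int>(yoe) + static_cast<int>(era) * 400;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // [0, 365]
    const unsigned mp = (5 * doy + 2) / 153;                                     // [0, 11]
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;                             // [1, 31]
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;                                // [1, 12]
    return Date{y + (m <= 2 ? 1 : 0), m, d};
}
```

Both functions use only integer arithmetic and no locale or stream machinery, which is the kind of work that tends to stay cheap inside a per-column loop. If the stored dates use a different epoch, a constant offset on the day count would be needed.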