Char* vs String Speed in C++
I have a C++ program that reads data from a binary file. Originally I stored the data in a std::vector<char*> data. I have changed my code so that I now use strings instead of char*, giving std::vector<std::string> data. One of the changes I had to make was switching from strcmp to compare, for example.
However I have seen my execution time dramatically increase. For a sample file, when I used char* it took 0.38s and after the conversion to string it took 1.72s on my Linux machine. I observed a similar problem on my Windows machine with execution time increasing from 0.59s to 1.05s.
I believe this function is causing the slowdown. It is part of the converter class; note that private variables are designated with _ at the end of the variable name. I clearly am having memory problems here and am stuck in between C and C++ code. I want this to be C++ code, so I updated the code at the bottom.
I access ids_ and names_ many times in another function too, so access speed is very important. By creating a map instead of two separate vectors, I have been able to achieve faster speeds with more stable C++ code. Thanks to everyone!
Example NewList.Txt
2515 ABC 23.5 32 -99 1875.7 1
1676 XYZ 12.5 31 -97 530.82 2
279 FOO 45.5 31 -96 530.8 3
OLD Code:
void converter::updateNewList(){
    FILE* NewList;
    char lineBuffer[100];
    char* id = 0;
    char* name = 0;
    int l = 0;
    int n;
    NewList = fopen("NewList.txt","r");
    if (NewList == NULL){
        std::cerr << "Error in reading NewList.txt\n";
        exit(EXIT_FAILURE);
    }
    while(!feof(NewList)){
        fgets (lineBuffer , 100 , NewList); // Read line
        l = 0;
        while (!isspace(lineBuffer[l])){
            l = l + 1;
        }
        id = new char[l];
        switch (l){
            case 1:
                n = sprintf (id, "%c", lineBuffer[0]);
                break;
            case 2:
                n = sprintf (id, "%c%c", lineBuffer[0], lineBuffer[1]);
                break;
            case 3:
                n = sprintf (id, "%c%c%c", lineBuffer[0], lineBuffer[1], lineBuffer[2]);
                break;
            case 4:
                n = sprintf (id, "%c%c%c%c", lineBuffer[0], lineBuffer[1], lineBuffer[2],lineBuffer[3]);
                break;
            default:
                n = -1;
                break;
        }
        if (n < 0){
            std::cerr << "Error in processing ids from NewList.txt\n";
            exit(EXIT_FAILURE);
        }
        l = l + 1;
        int s = l;
        while (!isspace(lineBuffer[l])){
            l = l + 1;
        }
        name = new char[l-s];
        switch (l-s){
            case 2:
                n = sprintf (name, "%c%c", lineBuffer[s+0], lineBuffer[s+1]);
                break;
            case 3:
                n = sprintf (name, "%c%c%c", lineBuffer[s+0], lineBuffer[s+1], lineBuffer[s+2]);
                break;
            case 4:
                n = sprintf (name, "%c%c%c%c", lineBuffer[s+0], lineBuffer[s+1], lineBuffer[s+2],lineBuffer[s+3]);
                break;
            default:
                n = -1;
                break;
        }
        if (n < 0){
            std::cerr << "Error in processing short name from NewList.txt\n";
            exit(EXIT_FAILURE);
        }
        ids_.push_back ( std::string(id) );
        names_.push_back(std::string(name));
    }
    bool isFound = false;
    for (unsigned int i = 0; i < siteNames_.size(); i ++) {
        isFound = false;
        for (unsigned int j = 0; j < names_.size(); j ++) {
            if (siteNames_[i].compare(names_[j]) == 0){
                isFound = true;
            }
        }
    }
    fclose(NewList);
    delete [] id;
    delete [] name;
}
C++ CODE
void converter::updateNewList(){
    std::ifstream NewList ("NewList.txt");
    while(NewList.good()){
        unsigned int id (0);
        std::string name;
        // get the ID and name
        NewList >> id >> name;
        // ignore the rest of the line
        NewList.ignore( std::numeric_limits<std::streamsize>::max(), '\n');
        info_.insert(std::pair<std::string, unsigned int>(name,id));
    }
    NewList.close();
}
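One thing to watch in the loop above: good() is checked before the read, so when the file ends with a newline the final pass can fail to extract anything and still insert an empty name with id 0. A minimal sketch of a variant that uses the extraction itself as the loop condition follows. The free function readNewList and its stream parameter are hypothetical names chosen for illustration; they are not part of the original converter class.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <map>
#include <sstream>
#include <string>

// Hypothetical free-function variant of updateNewList(): reads "id name ..."
// records from any input stream and returns a name -> id map.
std::map<std::string, unsigned int> readNewList(std::istream& in) {
    std::map<std::string, unsigned int> info;
    unsigned int id = 0;
    std::string name;
    // The extraction is the loop condition, so a failed read (end of file,
    // malformed id) ends the loop before anything is inserted.
    while (in >> id >> name) {
        info.emplace(name, id);  // like insert(): the first entry for a name wins
        // Skip the remaining columns of the line.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return info;
}

// Usage sketch with in-memory data instead of NewList.txt.
void readNewListExample() {
    std::istringstream in("2515 ABC 23.5 32 -99 1875.7 1\n"
                          "279 FOO 45.5 31 -96 530.8 3\n");
    const std::map<std::string, unsigned int> info = readNewList(in);
    assert(info.size() == 2);
    assert(info.at("FOO") == 279);
}
```

Taking a std::istream& rather than opening the file inside the function is only a convenience for testing; passing a std::ifstream works the same way.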
UPDATE: Follow-up question: Bottleneck from comparing strings. Thanks for the very useful help! I will not be making these mistakes in the future!
My guess is that it is tied to the performance of vector<string>.
About the vector
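A std::vector works with an internal contiguous array, meaning that once the array is full, it needs to create another, larger array and copy the strings over one by one. Each copy means a copy-construction and a destruction of a string that had the same contents, which is counter-productive. (Since C++11 the strings are moved rather than copied during reallocation, which is cheaper, but the reallocation itself still costs time.)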
To confirm this easily, use a std::vector<std::string *> and see if there is a difference in performance.
If this is the case, then you can do one of these four things:
- if you know (or have a good idea of) the final size of the vector, use its reserve() method to reserve enough space in the internal array and avoid useless reallocations
- use a std::deque, which works almost like a vector
- use a std::list (which doesn't give you random access to its items)
- use the std::vector<char *>
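To make the effect of reserve() visible, here is a small sketch that counts how often a std::vector<std::string> changes capacity while it is being filled. The function countReallocations and its parameters are made up for this illustration, and the exact number of reallocations without reserve() depends on the standard library's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many times the vector's capacity changes while appending
// `count` strings. `reserveFirst` toggles the up-front reserve() call.
std::size_t countReallocations(std::size_t count, bool reserveFirst) {
    std::vector<std::string> names;
    if (reserveFirst) {
        names.reserve(count);  // one allocation, large enough for everything
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = names.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        names.push_back("ABC");
        if (names.capacity() != lastCapacity) {  // the internal array was replaced
            ++reallocations;
            lastCapacity = names.capacity();
        }
    }
    return reallocations;
}

// Usage sketch: with reserve() no reallocation happens during the fill.
void countReallocationsExample() {
    assert(countReallocations(1000, true) == 0);
    assert(countReallocations(1000, false) > 0);
}
```

If the number of lines is unknown, even a rough upper bound passed to reserve() removes most of the intermediate reallocations.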
About the string
Note: I'm assuming that your strings/char * are created once and not modified (through a realloc, an append, etc.).
If the ideas above are not enough, then...
The allocation of the string object's internal buffer is similar to a malloc of a char *, so you should see little or no difference between the two.
Now, if your char * are in truth char[SOME_CONSTANT_SIZE], then you avoid the malloc (and thus will go faster than a std::string).
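A minimal sketch of the fixed-size idea, assuming names never exceed four characters as in the sample file. The names kMaxLen, ShortName, makeShortName and sameName are hypothetical. Many current standard library implementations also keep short strings inline inside std::string itself, so whether this is actually faster should be measured rather than assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kMaxLen = 4;  // assumption: names are at most 4 characters

// Fixed-size, null-terminated name stored inline, with no heap allocation.
using ShortName = std::array<char, kMaxLen + 1>;

// Copies `text` into `out`; returns false (leaving `out` untouched) when the
// input is longer than kMaxLen.
bool makeShortName(const char* text, ShortName& out) {
    const std::size_t len = std::strlen(text);
    if (len > kMaxLen) {
        return false;
    }
    out.fill('\0');  // zero-fill so unused bytes compare equal
    std::memcpy(out.data(), text, len);
    return true;
}

// Element-wise comparison; valid because unused bytes are always zero.
bool sameName(const ShortName& a, const ShortName& b) {
    return a == b;
}

// Usage sketch: the vector stores the characters contiguously.
void shortNameExample() {
    std::vector<ShortName> names(2);
    assert(makeShortName("ABC", names[0]));
    assert(makeShortName("ABC", names[1]));
    assert(sameName(names[0], names[1]));
    ShortName rejected{};
    assert(!makeShortName("TOOLONG", rejected));
}
```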
Edit
After reading the updated code, I see the following problems.
- if ids_ and names_ are vectors, and if you have the slightest idea of the number of lines, then you should use reserve() on ids_ and names_
- consider making ids_ and names_ deques or lists.
- faaNames_ should be a std::map, or even a std::unordered_map (or whatever hash_map you have on your compiler). Your search currently is two nested for loops, which is quite costly and inefficient.
- Consider comparing the lengths of the strings before comparing their contents. In C++, getting the length of a string (i.e. std::string::length()) is a constant-time operation.
- Now, I don't know what you're doing with the isFound variable, but if you need to find only ONE true equality, then I guess you should work on the algorithm (I don't know if there is already one, see http://www.cplusplus.com/reference/algorithm/), but I believe this search could be made a lot more efficient just by thinking about it.
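As an illustration of replacing the nested loops with a hashed lookup, the sketch below builds a std::unordered_set from one list once and then tests each entry of the other list against it. The function countKnownSites is a hypothetical helper, not code from the question; it assumes both lists are plain vectors of std::string and that a C++11 compiler is available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns how many entries of siteNames also occur in names.
// Building the set is linear in names.size(); each lookup is then
// constant time on average, instead of a full scan per site name.
std::size_t countKnownSites(const std::vector<std::string>& siteNames,
                            const std::vector<std::string>& names) {
    const std::unordered_set<std::string> known(names.begin(), names.end());
    std::size_t found = 0;
    for (const std::string& site : siteNames) {
        if (known.count(site) != 0) {
            ++found;
        }
    }
    return found;
}

// Usage sketch with made-up data.
void countKnownSitesExample() {
    const std::vector<std::string> siteNames = {"ABC", "QQQ", "FOO"};
    const std::vector<std::string> names = {"ABC", "XYZ", "FOO"};
    assert(countKnownSites(siteNames, names) == 2);
}
```

If only a yes/no answer per site name is needed, the body of the loop is where the equivalent of isFound would be handled.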
Other comments:
- Forget the use of int for sizes and lengths in the STL. At the very least, use size_t. On 64-bit platforms, size_t is typically 64 bits wide while int remains 32 bits, so your code is not 64-bit ready (on the other hand, I see few cases of incoming 8 GB strings... but still, better be correct...)
Edit 2
The two (so-called C and C++) codes are different. The "C code" expects ids and names of length less than 5, or the program exits with an error. The "C++ code" has no such limitation. Still, this limitation is grounds for massive optimization, if you confirm names and ids are always less than 5 characters.