Char* vs String Speed in C++


Question


I have a C++ program that reads data from a binary file. Originally I stored the data in std::vector<char*> data. I have since changed my code to use strings instead of char*, so the container is now std::vector<std::string> data. One of the changes I had to make, for example, was to switch from strcmp to compare.

However I have seen my execution time dramatically increase. For a sample file, when I used char* it took 0.38s and after the conversion to string it took 1.72s on my Linux machine. I observed a similar problem on my Windows machine with execution time increasing from 0.59s to 1.05s.

I believe this function is causing the slowdown. It is part of the converter class; note that private variables are marked with a trailing _ in their names. I clearly have memory problems here and am stuck between C and C++ code. I want this to be C++ code, so I updated the code at the bottom.

I also access ids_ and names_ many times in another function, so access speed is very important. By using a map instead of two separate vectors, I have been able to achieve faster speeds with more stable C++ code. Thanks to everyone!

Example NewList.Txt

2515    ABC 23.5    32  -99 1875.7  1  
1676    XYZ 12.5    31  -97 530.82  2  
279  FOO 45.5    31  -96  530.8  3  

OLD Code:

void converter::updateNewList(){
    FILE* NewList;
    char lineBuffer[100];
    char* id = 0;
    char* name = 0;

    int l = 0;
    int n;

    NewList = fopen("NewList.txt","r");
    if (NewList == NULL){
        std::cerr << "Error in reading NewList.txt\n";
        exit(EXIT_FAILURE);
    } 

    while(!feof(NewList)){
        fgets (lineBuffer , 100 , NewList); // Read line    
        l = 0;
        while (!isspace(lineBuffer[l])){
            l = l + 1;
        }

        id = new char[l];
        switch (l){
            case 1: 
                n = sprintf (id, "%c", lineBuffer[0]);
                break;
            case 2:
                n = sprintf (id, "%c%c", lineBuffer[0], lineBuffer[1]);
                break;
            case 3:
                n = sprintf (id, "%c%c%c", lineBuffer[0], lineBuffer[1], lineBuffer[2]);        
                break;
            case 4:
                n = sprintf (id, "%c%c%c%c", lineBuffer[0], lineBuffer[1], lineBuffer[2],lineBuffer[3]);
                break;
            default:
                n = -1;
                break;
        }
        if (n < 0){
            std::cerr << "Error in processing ids from NewList.txt\n";
            exit(EXIT_FAILURE);
        }

        l = l + 1;
        int s = l;
        while (!isspace(lineBuffer[l])){
            l = l + 1;
        }
        name = new char[l-s];
        switch (l-s){
            case 2:
                n = sprintf (name, "%c%c", lineBuffer[s+0], lineBuffer[s+1]);
                break;
            case 3:
                n = sprintf (name, "%c%c%c", lineBuffer[s+0], lineBuffer[s+1], lineBuffer[s+2]);
                break;
            case 4:
                n = sprintf (name, "%c%c%c%c", lineBuffer[s+0], lineBuffer[s+1], lineBuffer[s+2],lineBuffer[s+3]);
                break;
            default:
                n = -1;
                break;
        }
        if (n < 0){
            std::cerr << "Error in processing short name from NewList.txt\n";
            exit(EXIT_FAILURE);
        }


        ids_.push_back ( std::string(id) );
        names_.push_back(std::string(name));
    }

    bool isFound = false;
    for (unsigned int i = 0; i < siteNames_.size(); i ++) {
        isFound = false;
        for (unsigned int j = 0; j < names_.size(); j ++) {
            if (siteNames_[i].compare(names_[j]) == 0){
                isFound = true;
            }
        }
    }

    fclose(NewList);
    delete [] id;
    delete [] name;
}

C++ CODE

void converter::updateNewList(){
    std::ifstream NewList ("NewList.txt");

    while(NewList.good()){
        unsigned int id (0);
        std::string name;

        // get the ID and name
        NewList >> id >> name;

        // ignore the rest of the line
        NewList.ignore( std::numeric_limits<std::streamsize>::max(), '\n');

        info_.insert(std::pair<std::string, unsigned int>(name,id));

    }

    NewList.close();
}

UPDATE: Follow-up question: Bottleneck from comparing strings. Thanks for the very useful help! I will not be making these mistakes in the future!

Answer

My guess is that it is tied to the performance of vector<string>.

About the vector

A std::vector stores its elements in one contiguous internal array. Once that array is full, the vector has to allocate another, larger array and transfer the strings to it one by one. Before C++11 that means copy-constructing each string and then destroying the original that held the same contents, which is counterproductive. (With C++11 move semantics the strings are moved rather than copied, which is much cheaper.)

An easy way to confirm this is to use a std::vector<std::string *> instead and see whether there is a difference in performance.

If this is the case, then you can do one of these four things:

  1. if you know (or have a good estimate of) the final size of the vector, call its reserve() method to set aside enough space in the internal array and avoid useless reallocations
  2. use a std::deque, which works almost like a vector
  3. use a std::list (which doesn't give you random access to its items)
  4. use the std::vector<char *>
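
As a minimal sketch of option 1, the function below counts how often a vector's capacity changes while strings are appended, with and without a prior reserve(). The name count_reallocations and the "ABC" payload are hypothetical and chosen only for illustration; it shows the effect on reallocations and does not measure time.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends `count` short strings and returns how many
// times the vector's capacity changed (i.e. how many reallocations happened).
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<std::string> names;
    if (reserve_first) {
        names.reserve(count);  // one allocation up front
    }

    std::size_t reallocations = 0;
    std::size_t capacity = names.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        names.push_back("ABC");
        if (names.capacity() != capacity) {
            // The internal array was replaced by a larger one.
            ++reallocations;
            capacity = names.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the function returns 0, because the reserved capacity already covers every push_back. Without it, the count depends on the growth factor of the standard library in use.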

About the string

Note: I'm assuming that your strings/char * are created once and not modified afterwards (through a realloc, an append, etc.).

If the ideas above are not enough, then...

The allocation of the string object's internal buffer is similar to a malloc of a char *, so you should see little or no difference between the two.

Now, if your char * are in truth char[SOME_CONSTANT_SIZE], then you avoid the malloc entirely (and will therefore go faster than with a std::string).

Edit

After reading the updated code, I see the following problems.

  1. if ids_ and names_ are vectors, and if you have the slightest idea of the number of lines, then you should call reserve() on ids_ and names_
  2. consider making ids_ and names_ deques or lists
  3. faaNames_ should be a std::map, or even a std::unordered_map (or whatever hash_map your compiler provides). Your search is currently two nested for loops, which is quite costly and inefficient
  4. consider comparing the lengths of the strings before comparing their contents. In C++, getting the length of a string (i.e. std::string::length()) is a constant-time operation, so the check is essentially free
  5. I don't know what you're doing with the isFound variable, but if you only need to find ONE true equality, then I guess you should work on the algorithm (I don't know whether a suitable one already exists, see http://www.cplusplus.com/reference/algorithm/). I believe this search could be made a lot more efficient just by thinking it through
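
One way points 3 and 5 could look, as a sketch rather than the asker's actual code: build a hash set from the names once, then test each site name with a single lookup instead of an inner loop. The function name find_sites and its parameters are hypothetical, and std::unordered_set requires C++11.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical replacement for the nested loops: for each site name,
// report whether it occurs anywhere in `names`.
std::vector<bool> find_sites(const std::vector<std::string>& site_names,
                             const std::vector<std::string>& names) {
    // Built once; each lookup afterwards is O(1) on average.
    const std::unordered_set<std::string> known(names.begin(), names.end());

    std::vector<bool> found;
    found.reserve(site_names.size());
    for (const std::string& site : site_names) {
        found.push_back(known.count(site) != 0);
    }
    return found;
}
```

This turns the roughly siteNames_.size() * names_.size() string comparisons of the nested loops into one hash lookup per site name, at the cost of building the set once.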

Other comments:

  1. Forget about using int for sizes and lengths in the STL. At the very least, use size_t. On a 64-bit platform, size_t is 64 bits wide while int usually stays at 32 bits, so your code is not 64-bit ready (on the other hand, I foresee few cases of incoming 8 GB strings... but still, better to be correct)
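
A small sketch of that advice, using a hypothetical total_length function: both the loop index and the running total are std::size_t, which matches the types returned by size() and length().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: sums the lengths of all strings in `names`.
std::size_t total_length(const std::vector<std::string>& names) {
    std::size_t total = 0;
    // std::size_t index: no signed/unsigned mismatch with names.size().
    for (std::size_t i = 0; i < names.size(); ++i) {
        total += names[i].length();
    }
    return total;
}
```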

Edit 2

The two (so-called C and C++) codes are different. The "C code" expects ids and names shorter than 5 characters, or the program exits with an error. The "C++ code" has no such limitation. Still, this limitation is grounds for massive optimization, if you can confirm that names and ids are always shorter than 5 characters.
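
To illustrate what such an optimization might look like, here is a sketch that stores each name in a fixed-size, zero-padded buffer, so no heap allocation is needed per name. It assumes names never exceed 4 characters; ShortName, kMaxNameLength and make_short_name are hypothetical names, and std::array requires C++11.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: a name is at most 4 characters long.
constexpr std::size_t kMaxNameLength = 4;

// Fixed-size storage: 4 characters plus a terminating '\0'.
using ShortName = std::array<char, kMaxNameLength + 1>;

// Copies `text` into a zero-padded buffer.
// Returns false (and leaves `out` untouched) if the text is too long.
bool make_short_name(const std::string& text, ShortName& out) {
    if (text.size() > kMaxNameLength) {
        return false;
    }
    out.fill('\0');
    std::memcpy(out.data(), text.data(), text.size());
    return true;
}
```

Because every buffer is zero-padded, two ShortName values compare equal with operator== exactly when the original names were equal, and a std::vector<ShortName> keeps all names in one contiguous block.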
