std::string implementation in GCC and its memory overhead for short strings

Question

I am currently working on an application for a low-memory platform that requires an std::set of many short strings (>100,000 strings of 4-16 characters each). I recently transitioned this set from std::string to const char * to save memory and I was wondering whether I was really avoiding all that much overhead per string.

I tried using the following:

std::string sizeTest = "testString";
std::cout << sizeof(sizeTest) << " bytes";

But it just gave me an output of 4 bytes, indicating that the string contains a pointer. I'm well aware that strings store their data in a char * internally, but I thought the string class would have additional overhead.

Does the GCC implementation of std::string incur more overhead than sizeof(std::string) would indicate? More importantly, is it significant over this size of data set?

Here are the sizes of relevant types on my platform (it is 32-bit and has 8 bits per byte):

char: 1 byte
void *: 4 bytes
char *: 4 bytes
std::string: 4 bytes

Recommended answer

Well, at least with GCC 4.4.5, which is what I have handy on this machine, std::string is a typedef for std::basic_string<char>, and basic_string is defined in /usr/include/c++/4.4.5/bits/basic_string.h. There's a lot of indirection in that file, but what it comes down to is that non-empty std::strings store a pointer to one of these:

  struct _Rep_base
  {
    size_type    _M_length;
    size_type    _M_capacity;
    _Atomic_word _M_refcount;
  };

This header is followed in memory by the actual string data. So std::string is going to have at least three words of overhead for each string, plus any overhead from having a capacity higher than the length (probably none, depending on how you construct your strings; you can check by calling the capacity() method).
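To put rough numbers on that, here is a minimal sketch of the arithmetic. The helpers estimate_rep_bytes and unused_capacity are hypothetical names made up for illustration and are not part of libstdc++. The sketch assumes each of the three header fields occupies exactly one word with no padding, and that one terminating null character is stored after the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of libstdc++: estimates the heap bytes behind
// one non-empty string under the layout described above, i.e. a header of
// three word-sized fields followed by the characters and a terminating '\0'.
// Assumption: every header field occupies word_size bytes with no padding.
inline std::size_t estimate_rep_bytes(std::size_t capacity, std::size_t word_size)
{
    assert(word_size > 0);
    const std::size_t header = 3 * word_size;  // length, capacity, refcount
    return header + capacity + 1;              // +1 for the null terminator
}

// Bytes the string has reserved beyond what it currently uses.
// A result of 0 means capacity() adds no extra overhead for this string.
inline std::size_t unused_capacity(const std::string& s)
{
    return s.capacity() - s.size();
}
```

With 4-byte words, estimate_rep_bytes(10, 4) comes to 23 bytes for a 10-character string, compared with 11 bytes for the same text held in a plain char array. Calling unused_capacity on a few of your real strings shows whether the way you construct them leaves any slack.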

There's also going to be overhead from your memory allocator for doing lots of small allocations; I don't know what GCC uses for C++, but assuming it's similar to the dlmalloc allocator it uses for C, that could be at least two words per allocation, plus some space to align the size to a multiple of at least 8 bytes.
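The allocator side can be modeled the same way. This is only a sketch under the assumption stated above: two words of bookkeeping per allocation, with the total rounded up to a multiple of 8 bytes. The names round_up and estimate_chunk_bytes are hypothetical, and a real allocator may behave differently.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of alignment.
inline std::size_t round_up(std::size_t n, std::size_t alignment)
{
    assert(alignment > 0);
    return (n + alignment - 1) / alignment * alignment;
}

// Hypothetical model of a dlmalloc-like allocator, not a real API.
// Assumption: each allocation carries two words of bookkeeping and the
// resulting chunk size is rounded up to a multiple of 8 bytes.
inline std::size_t estimate_chunk_bytes(std::size_t request, std::size_t word_size)
{
    const std::size_t bookkeeping = 2 * word_size;
    return round_up(request + bookkeeping, 8);
}
```

On a 32-bit platform, a 10-character string stored with the 12-byte header and a null terminator requests 23 bytes, so estimate_chunk_bytes(23, 4) gives a 32-byte chunk. The same text as a bare char array requests 11 bytes, and estimate_chunk_bytes(11, 4) gives 24. In this model the difference is about 8 bytes per string, or roughly 800,000 bytes across 100,000 strings, though the real figure depends on the allocator and on the actual string lengths.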
