std::string implementation in GCC and its memory overhead for short strings
Question
I am currently working on an application for a low-memory platform that requires an std::set of many short strings (>100,000 strings of 4-16 characters each). I recently transitioned this set from std::string to const char * to save memory and I was wondering whether I was really avoiding all that much overhead per string.
I tried using the following:
std::string sizeTest = "testString";
std::cout << sizeof(sizeTest) << " bytes";
But it just gave me an output of 4 bytes, indicating that the string contains a pointer. I'm well aware that strings store their data in a char * internally, but I thought the string class would have additional overhead.
Does the GCC implementation of std::string incur more overhead than sizeof(std::string) would indicate? More importantly, is it significant over this size of data set?
Here are the sizes of relevant types on my platform (it is 32-bit and has 8 bits per byte):
char: 1 byte
void *: 4 bytes
char *: 4 bytes
std::string: 4 bytes
Recommended answer
Well, at least with GCC 4.4.5, which is what I have handy on this machine, std::string is a typedef for std::basic_string<char>, and basic_string is defined in /usr/include/c++/4.4.5/bits/basic_string.h. There's a lot of indirection in that file, but what it comes down to is that nonempty std::strings store a pointer to one of these:
struct _Rep_base
{
size_type _M_length;
size_type _M_capacity;
_Atomic_word _M_refcount;
};
This header is followed in memory by the actual string data. That is also why sizeof(std::string) reports only 4 bytes: the object itself holds just the pointer. So std::string is going to have at least three words of overhead for each string, plus any overhead from having a capacity higher than its length (probably none, depending on how you construct your strings; you can check by calling the capacity() method).
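The capacity check mentioned above can be sketched as follows. This is a minimal illustration, not part of the original answer: the helper names slack_bytes and total_slack_bytes are made up for this example. The numbers it reports depend on the library version, and newer libstdc++ releases use a different string layout than the GCC 4.4.5 one described here.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Bytes reserved beyond what the characters need (capacity minus length).
// Hypothetical helper name, introduced only for this sketch.
std::size_t slack_bytes(const std::string& s) {
    return s.capacity() - s.size();
}

// Sum of the unused capacity across every string in the set.
std::size_t total_slack_bytes(const std::set<std::string>& strings) {
    std::size_t total = 0;
    for (std::set<std::string>::const_iterator it = strings.begin();
         it != strings.end(); ++it) {
        total += slack_bytes(*it);
    }
    return total;
}
```

Calling total_slack_bytes on the real set and dividing by its size gives the average number of wasted bytes per string. If that comes out near zero, capacity is not a meaningful part of the overhead.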
There's also going to be overhead from your memory allocator for doing lots of small allocations. I don't know what GCC uses for C++, but assuming it's similar to the dlmalloc allocator it uses for C, that could be at least two words per allocation, plus some padding to round the size up to a multiple of at least 8 bytes.
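To put rough numbers on this, here is a back-of-the-envelope model. Everything in it is an assumption taken from the figures above rather than a measurement: a three-word header for std::string (none for a bare char array), one byte for a terminating NUL, two words of allocator bookkeeping, and rounding to a multiple of 8 bytes. The function names are hypothetical, and a real allocator may behave differently.

```cpp
#include <cstddef>

// Round n up to the next multiple of align (align must be nonzero).
std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Estimated heap bytes for one string of `length` characters.
// header_words: 3 for the _Rep_base header, 0 for a bare char array.
// word_size defaults to 4 to match the 32-bit platform in the question.
std::size_t estimated_heap_bytes(std::size_t length,
                                 std::size_t header_words,
                                 std::size_t word_size = 4) {
    // Header, characters, and an assumed terminating NUL byte.
    std::size_t payload = header_words * word_size + length + 1;
    // Assumed allocator bookkeeping: two words per allocation.
    std::size_t bookkeeping = 2 * word_size;
    // Assumed alignment: total rounded up to a multiple of 8 bytes.
    return round_up(payload + bookkeeping, 8);
}
```

Under these assumptions a 10-character string costs 32 bytes of heap as a std::string and 24 bytes as a bare char array, on top of the 4-byte pointer held in the set either way. Over 100,000 strings that difference is on the order of 800,000 bytes, so it may well matter on a low-memory platform, but measuring on the actual target is the only way to be sure.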