Meaning of acronym SSO in the context of std::string

Question

In a C++ question about optimization and code style, several answers referred to "SSO" in the context of optimizing copies of std::string. What does SSO mean in that context?

Clearly not "single sign on". "Shared string optimization", perhaps?

Answer

Background / Overview

Operations on automatic variables ("from the stack", which are variables that you create without calling malloc / new) are generally much faster than those involving the free store ("the heap", which are variables that are created using new). However, the size of automatic arrays is fixed at compile time, but the size of arrays from the free store is not. Moreover, the stack size is limited (typically a few MiB), whereas the free store is only limited by your system's memory.

SSO is the Short / Small String Optimization. A std::string typically stores the string as a pointer to the free store ("the heap"), which gives performance characteristics similar to calling new char[size] yourself. This prevents a stack overflow for very large strings, but it can be slower, especially with copy operations. As an optimization, many implementations of std::string create a small automatic array, something like char[20]. If you have a string that is 20 characters or fewer (in this example; the actual size varies by implementation), it is stored directly in that array. This avoids the need to call new at all, which speeds things up a bit.
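One rough way to observe this is to check whether a string's character data lives inside the std::string object itself. The sketch below is only an illustration: the helper name stored_inline is made up for this example, and the result for short strings assumes a standard library that implements SSO (the standard does not require it).

```cpp
#include <functional>
#include <string>

// Hypothetical helper (not part of any library): returns true if the
// character data of s lies within the bytes of the std::string object
// itself, which is what a small-string buffer would look like.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, object_begin) && before(data, object_end);
}
```

On an implementation with SSO, stored_inline(std::string("hi")) should be true, while stored_inline(std::string(1000, 'x')) is false because 1000 characters cannot fit inside the object and must be allocated separately.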

I wasn't expecting this answer to be quite so popular, but since it is, let me give a more realistic implementation, with the caveat that I've never actually read any implementation of SSO "in the wild".

At the minimum, a std::string needs to store the following information:

  • The size
  • The capacity
  • The location of the data

The size could be stored as a std::string::size_type or as a pointer to the end. The only difference is whether you want to have to subtract two pointers when the user calls size or add a size_type to a pointer when the user calls end. The capacity can be stored either way as well.
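As a small sketch of that trade-off, here are the two options side by side. The struct names size_layout and end_pointer_layout are invented for this illustration and do not come from any real implementation.

```cpp
#include <cstddef>

// Option 1: store the size directly; end() has to add it to the start pointer.
struct size_layout {
    char* m_data;
    std::size_t m_size;

    std::size_t size() const { return m_size; }
    char* end() const { return m_data + m_size; }
};

// Option 2: store a pointer to the end; size() has to subtract two pointers.
struct end_pointer_layout {
    char* m_data;
    char* m_end;

    std::size_t size() const { return static_cast<std::size_t>(m_end - m_data); }
    char* end() const { return m_end; }
};
```

Both layouts describe the same string and take the same space on typical platforms; they only move the arithmetic from one accessor to the other.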

First, consider the naive implementation based on what I outlined above:

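#include <array>
#include <cstddef>
#include <memory>

class string {
public:
    using size_type = std::size_t;
    // all 83 member functions
private:
    std::unique_ptr<char[]> m_data;  // heap buffer, used when the string is large
    size_type m_size;                // number of characters currently stored
    size_type m_capacity;            // number of characters the buffer can hold
    std::array<char, 16> m_sso;      // inline buffer, used when the string is small
};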

For a 64-bit system, that generally means that std::string has 24 bytes of 'overhead' per string, plus another 16 for the SSO buffer (16 chosen here instead of 20 due to padding requirements). It wouldn't really make sense to store those three data members plus a local array of characters, as in my simplified example. If m_size <= 16, then I will put all of the data in m_sso, so I already know the capacity and I don't need the pointer to the data. If m_size > 16, then I don't need m_sso. There is no case where I need all of them at once. A smarter solution that wastes no space would look more like this (untested, example purposes only):

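#include <array>
#include <cstddef>
#include <memory>

class string {
public:
    using size_type = std::size_t;
    // all 83 member functions
    // (these must include user-provided constructors and a destructor,
    // because the union below has a member with a non-trivial destructor)
private:
    // Heap representation, used only when the string does not fit in m_small.
    // This is probably better designed as an array-like class.
    struct large {
        std::unique_ptr<char[]> m_data;
        size_type m_capacity;
    };
    size_type m_size;
    union {
        large m_large;
        std::array<char, sizeof(large)> m_small;  // reuses the same bytes
    };
};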

I'd assume that most implementations look more like this.
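To make the union idea concrete, here is a minimal working sketch. It is an illustration under simplifying assumptions, not how any real standard library does it: the class name small_string and its members are hypothetical, it uses a raw owning pointer instead of std::unique_ptr to keep the union trivial, and it supports only construction from a C string, with copying disabled.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, heavily simplified string with a small-buffer union.
class small_string {
public:
    using size_type = std::size_t;

    explicit small_string(const char* text) : m_size(std::strlen(text)) {
        if (is_small()) {
            // Fits in the inline buffer, including the terminating '\0'.
            std::memcpy(m_small, text, m_size + 1);
        } else {
            // Too long: allocate on the free store and remember the capacity.
            m_large.m_capacity = m_size;
            m_large.m_data = new char[m_size + 1];
            std::memcpy(m_large.m_data, text, m_size + 1);
        }
    }

    ~small_string() {
        if (!is_small()) {
            delete[] m_large.m_data;
        }
    }

    // Copying is left out to keep the sketch short.
    small_string(const small_string&) = delete;
    small_string& operator=(const small_string&) = delete;

    size_type size() const { return m_size; }

    // The size alone tells us which union member is in use.
    bool is_small() const { return m_size <= small_capacity; }

    size_type capacity() const {
        if (is_small()) {
            return small_capacity;
        }
        return m_large.m_capacity;
    }

    const char* data() const {
        if (is_small()) {
            return m_small;
        }
        return m_large.m_data;
    }

private:
    struct large {
        char* m_data;
        size_type m_capacity;
    };

    // One byte of the inline buffer is reserved for the terminating '\0'.
    static constexpr size_type small_capacity = sizeof(large) - 1;

    size_type m_size;
    union {
        large m_large;
        char m_small[sizeof(large)];
    };
};
```

With this layout the inline buffer occupies the same bytes as the pointer and capacity, so the object is no larger than the three-member version without any SSO buffer. A short string such as small_string("hello") never touches the free store, while a string longer than the inline buffer falls back to new[].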
