Meaning of acronym SSO in the context of std::string


Question

In a C++ question about optimization and code style, several answers referred to "SSO" in the context of optimizing copies of std::string. What does SSO mean in that context?

Clearly not "single sign on". "Shared string optimization", perhaps?

Solution

Background / Overview

Operations on automatic variables ("from the stack", which are variables that you create without calling malloc / new) are generally much faster than those involving the free store ("the heap", which are variables that are created using new). However, the size of an automatic array is fixed at compile time, whereas the size of an array from the free store is not. Moreover, the stack size is limited (typically a few MiB), whereas the free store is only limited by your system's memory.
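To make the distinction concrete, here is a minimal sketch. The function names sum_automatic and sum_free_store are made up for this illustration; the point is only that the first array needs a compile-time size and no allocation call, while the second takes its size at run time and pays for one new[] / delete[] pair.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>

// Sums 0..7 using an automatic array: the size (8) is a compile-time
// constant and no allocation function is called.
int sum_automatic() {
    int values[8];
    std::iota(values, values + 8, 0);
    return std::accumulate(values, values + 8, 0);
}

// Sums 0..n-1 using an array from the free store: n is only known at
// run time, at the cost of one new[] / delete[] pair.
int sum_free_store(std::size_t n) {
    std::unique_ptr<int[]> values(new int[n]);
    std::iota(values.get(), values.get() + n, 0);
    return std::accumulate(values.get(), values.get() + n, 0);
}
```

Both functions compute the same result for eight elements; only where the storage comes from differs.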

SSO is the Short / Small String Optimization. A std::string typically stores its characters on the free store ("the heap") and keeps a pointer to them, which gives similar performance characteristics as if you were to call new char[size]. This prevents a stack overflow for very large strings, but it can be slower, especially with copy operations. As an optimization, many implementations of std::string embed a small fixed-size array, something like char[20], directly in the string object. If you have a string of 20 characters or fewer (in this example; the actual size varies between implementations), it is stored directly in that array. This avoids the need to call new at all, which speeds things up a bit.
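One way to observe this on a given implementation is to check whether the character data lives inside the string object itself. The helper below, stored_inline, is a hypothetical name for this sketch and not part of any library. It relies only on data() and sizeof, so it does not depend on any particular implementation's layout.

```cpp
#include <functional>
#include <string>

// Returns true if the character data of `s` lives inside the std::string
// object itself (an inline SSO buffer) rather than on the free store.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined total order even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, object_begin) && before(data, object_end);
}
```

A 1000-character string can never fit inside the object, so stored_inline returns false for it. For a very short string such as "hi", the common implementations I know of return true, but the standard does not require SSO, so treat that as an observation and not a guarantee.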

EDIT:

I wasn't expecting this answer to be quite so popular, but since it is, let me give a more realistic implementation, with the caveat that I've never actually read any implementation of SSO "in the wild".

Implementation details

At the minimum, a std::string needs to store the following information:

  • The size
  • The capacity
  • The location of the data

The size could be stored as a std::string::size_type or as a pointer to the end. The only difference is whether you want to have to subtract two pointers when the user calls size or add a size_type to a pointer when the user calls end. The capacity can be stored either way as well.
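The trade-off between the two representations can be sketched as follows. CountedView and PointerView are hypothetical names used only for this illustration; they show where the addition or subtraction ends up in each case.

```cpp
#include <cstddef>

// Variant A: store the size as a count. size() is free, end() adds.
struct CountedView {
    const char* data;
    std::size_t count;
    std::size_t size() const { return count; }
    const char* end() const { return data + count; }
};

// Variant B: store a pointer to the end. end() is free, size() subtracts.
struct PointerView {
    const char* data;
    const char* finish;
    std::size_t size() const { return static_cast<std::size_t>(finish - data); }
    const char* end() const { return finish; }
};
```

Both variants occupy the same space on typical platforms and answer the same questions; the choice only decides which of size or end costs one extra arithmetic operation.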

You don't pay for what you don't use.

First, consider the naive implementation based on what I outlined above:

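#include <array>
#include <cstddef>
#include <memory>

class string {
public:
    using size_type = std::size_t;
    // all 83 member functions
private:
    std::unique_ptr<char[]> m_data;    // heap buffer, used for long strings
    size_type m_size;                  // number of characters stored
    size_type m_capacity;              // characters the current buffer can hold
    std::array<char, 16> m_sso;        // inline buffer, used for short strings
};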

For a 64-bit system, that generally means that std::string has 24 bytes of 'overhead' per string, plus another 16 for the SSO buffer (16 chosen here instead of 20 due to padding requirements). It wouldn't really make sense to store those three data members plus a local array of characters, as in my simplified example. If m_size <= 16, then I will put all of the data in m_sso, so I already know the capacity and I don't need the pointer to the data. If m_size > 16, then I don't need m_sso. There is never a situation where I need all of them at once. A smarter solution that wastes no space would look a little more like this (untested, example purposes only):

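#include <array>
#include <cstddef>
#include <memory>

class string {
public:
    using size_type = std::size_t;
    // all 83 member functions; the constructors and destructor must
    // track the active union member by hand, because unique_ptr is
    // not trivially destructible
private:
    // Declared outside the union: standard C++ does not allow types
    // to be declared inside an anonymous union
    struct large {
        // This is probably better designed as an array-like class
        std::unique_ptr<char[]> m_data;
        size_type m_capacity;
    };
    size_type m_size;
    union {
        large m_large;
        std::array<char, sizeof(large)> m_small;
    };
};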

I'd assume that most implementations look more like this.
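For anyone who wants something that actually compiles and runs, here is a minimal sketch of the same union idea. It is not how any real standard library does it. TinyString, Large and small_capacity are hypothetical names, the class is fixed-content and non-copyable to stay short, and it uses a raw char* where the layout above uses std::unique_ptr so that both union members are trivial types.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical illustration only: fixed content, no copying, no growth.
class TinyString {
public:
    explicit TinyString(const char* text) : m_size(std::strlen(text)) {
        if (is_small()) {
            // Short string: copy into the inline buffer, terminator included.
            std::memcpy(m_small, text, m_size + 1);
        } else {
            // Long string: allocate exactly enough on the free store.
            m_large.data = new char[m_size + 1];
            m_large.capacity = m_size;
            std::memcpy(m_large.data, text, m_size + 1);
        }
    }
    ~TinyString() {
        if (!is_small()) delete[] m_large.data;
    }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    std::size_t size() const { return m_size; }
    std::size_t capacity() const {
        if (is_small()) return small_capacity;
        return m_large.capacity;
    }
    const char* c_str() const {
        if (is_small()) return m_small;
        return m_large.data;
    }
    // The size alone tells us which union member is in use.
    bool is_small() const { return m_size <= small_capacity; }

private:
    struct Large {
        char* data;
        std::size_t capacity;
    };
    // One byte of the inline buffer is reserved for the null terminator.
    static constexpr std::size_t small_capacity = sizeof(Large) - 1;

    std::size_t m_size;
    union {
        Large m_large;                // used when m_size > small_capacity
        char m_small[sizeof(Large)];  // used otherwise
    };
};
```

Constructing TinyString("hi") never calls new, while a string longer than the inline buffer takes the free-store path. On a typical 64-bit platform the inline buffer holds up to 15 characters plus the terminator, but that number is a property of this sketch and not of any real std::string.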
