Default advice for using C-style string literals vs. constructing unnamed std::string objects?


Question


So C++14 introduced a number of user-defined literals, one of which is the "s" literal suffix for creating std::string objects. According to the documentation, its behavior is exactly the same as constructing a std::string object, like so:

auto str = "Hello World!"s; // RHS is equivalent to: std::string{ "Hello World!" }

Of course, constructing an unnamed std::string object could be done prior to C++14, but because the C++14 way is so much simpler, I think far more people will actually consider constructing std::string objects on the spot than before, which is why I thought it makes sense to ask this.

So my question is simple: in what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?


Example 1:

Consider the following:

void foo(std::string arg);

foo("bar");  // option 1
foo("bar"s); // option 2

If I'm correct, the first method will call the appropriate constructor overload of std::string to create an object inside foo's scope, and the second method will construct an unnamed string object first, and then move-construct foo's argument from that. I'm sure that compilers are very good at optimizing stuff like this, but still, the second version seems to involve an extra move compared with the first alternative (not that a move is expensive, of course). But again, after compiling this with a reasonable compiler, the end results are most likely to be highly optimized, and free of redundant moves/copies anyway.

Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.


Example 2:

Consider the following:

std::cout << "Hello World!" << std::endl;  // option 1
std::cout << "Hello World!"s << std::endl; // option 2

In this case, the std::string object is probably passed to cout's operator via rvalue reference, and the first option probably passes a pointer, so both are very cheap operations, but the second one has the extra cost of constructing an object first. It's probably a safer way to go though (?).


In all cases of course, constructing an std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well. This is more of an issue in the second example though, as in the first example, an std::string object will be constructed in both cases anyway. In practice, getting an exception from constructing a string object is very unlikely, but still could be a valid argument in certain cases.

If you can think of more examples to consider, please include them in your answer. I'm interested in general advice regarding the usage of unnamed std::string objects, not just these two particular cases. I only included these to point out some of my thoughts regarding this topic.

Also, if I got something wrong, feel free to correct me as I'm not by any means a C++ expert. The behaviors I described are only my guesses on how things work, and I didn't base them on actual research or experimenting really.

Answer

> In what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?

A std::string literal is a good idea when you specifically want a variable of type std::string, whether for:

  • modifying the value later (auto s = "123"s; s += '\n';)

  • the richer, intuitive and less error-prone interface (value semantics, iterators, find, size etc)

    • value semantics means ==, <, copying, etc. work on the values, unlike the pointer/by-reference semantics after C-string literals decay to const char*

  • calling some_templated_function("123"s) would concisely ensure a <std::string> instantiation, with the argument being able to be handled using value semantics internally

    • if you know other code is instantiating the template for std::string anyway, and it's of significant complexity relative to your resource constraints, you might want to pass a std::string too, to avoid an unnecessary extra instantiation for const char*, but it's rare to need to care

  • values containing embedded NULs

A C-style string literal might be preferred where:

  • pointer-style semantics are wanted (or at least not a problem)

  • the value is only going to be passed to functions expecting const char* anyway, or std::string temporaries will get constructed anyway and you don't mind giving the optimiser one extra hurdle to clear before it can construct the std::string at compile or load time where the same instance could be reused (e.g. when passing to functions by const-reference). Again, it's rare to need to care.

  • (another rare and nasty hack) you're somehow leveraging your compiler's string pooling behaviour, e.g. if it guarantees that for any given translation unit the const char* to string literals will only (but of course always) differ if the text differs

    • you can't really get the same from std::string .data()/.c_str(), as the same address may be associated with different text (and different std::string instances) during the program execution, and std::string buffers at distinct addresses may contain the same text

  • you benefit from having the pointer remain valid after a std::string would leave scope and be destroyed. For example, given enum My_Enum { Zero, One };, the function const char* str(My_Enum e) { return e == Zero ? "0" : "1"; } is safe, but const char* str(My_Enum e) { return e == Zero ? "0"s.c_str() : "1"s.c_str(); } returns a dangling pointer, and std::string str(My_Enum e) { return e == Zero ? "0"s : "1"s; } smacks of premature pessimisation in always using dynamic allocation (sans SSO, or for longer text)

  • you're leveraging compile-time concatenation of adjacent C-string literals (e.g. "abc" "xyz" becomes one contiguous const char[] literal "abcxyz") - this is particularly useful inside macro substitutions

  • you're memory constrained and/or don't want to risk an exception or crash during dynamic memory allocation

Discussion

[basic.string.literals] 21.7 lists:

string operator "" s(const char* str, size_t len);

Returns: string{str,len}

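Basically, using ""s is calling a function that returns a std::string by value. Crucially, the result can bind to a const reference or an rvalue reference, but not to a non-const lvalue reference.

When used to call void foo(std::string arg);, arg will indeed be move constructed, at least notionally: compilers may elide the move, and since C++17 the parameter is initialised directly from the temporary.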

> Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.

It doesn't matter much which you choose. Maintenance-wise, if foo(const std::string&) is ever changed to foo(const char*), only foo("xyz"); invocations will seamlessly continue working. There are very few even vaguely plausible reasons for such a change, though: so C code could call it too (but it would still be a bit mad not to keep a foo(const std::string&) overload for existing client code); so it could be implemented in C (perhaps); or to remove the dependency on the <string> header (irrelevant with modern computing resources).

> std::cout << "Hello World!" << std::endl; // option 1
>
> std::cout << "Hello World!"s << std::endl; // option 2

The former will call operator<<(std::ostream&, const char*), directly accessing the constant string literal data, with the only disadvantage being that the streaming may have to scan for the terminating NUL. "option 2" would match a const-reference overload and implies construction of a temporary, though compilers might be able to optimise it so they're not doing that unnecessarily often, or even effectively create the string object at compile time (which might only be practical for strings short enough to use an in-object Short String Optimisation (SSO) approach). If they're not doing such optimisations already, the potential benefit and hence pressure/desire to do so is likely to increase.
