In C++11, what is the most performant way to return a reference/pointer to a position in a std::string?
I'm building a text parser that uses std::string as the core storage for strings.
I know this is not optimal and that parsers inside compilers use optimized approaches for this. In my project I don't mind losing some performance in exchange for more clarity and easier maintenance.
At the beginning I read a huge text into memory and then scan each character to build an ordered set of tokens; it's a simple lexer. Currently I'm using std::string to represent the text of a token, but I would like to improve this a bit by using a reference/pointer into the original text.
From what I have read, it is bad practice to return and hold on to iterators, and it is also bad practice to refer to the std::string internal buffer.
Any suggestions on how to accomplish this in a "clean" way?
There are proposals to add string_view to C++ in an upcoming standard.
A string_view is a non-owning iterable range over characters with many of the utilities and properties you'd expect of a string class, except you cannot insert/delete characters (and editing characters is often blocked in some subtypes).
I would advise trying that approach -- write your own (in your own utility namespace). (You should have your own utility namespace for reusable code snippets anyhow).
The core data is a pair of char* or std::string::iterators (or const versions). If the user needs a null-terminated buffer, a to_string method allocates one. I would start with non-mutable (const) character data. Do not forget begin and end: that makes your view iterable with for(:) loops.
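A minimal C++11 sketch of such a view, assuming a pair of const char* as the core data. The namespace util, the class name, and the substr helper are illustrative choices, not taken from any proposal:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace util {  // hypothetical utility namespace

// Non-owning, read-only view over a contiguous run of characters.
// The viewed buffer must outlive the view and must not be reallocated.
class string_view {
public:
    string_view() : first_(nullptr), last_(nullptr) {}
    string_view(const char* first, const char* last)
        : first_(first), last_(last) {}
    // Views the whole string without copying it.
    string_view(const std::string& s)
        : first_(s.data()), last_(s.data() + s.size()) {}

    // begin/end make the view usable in range-based for loops.
    const char* begin() const { return first_; }
    const char* end() const { return last_; }

    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    bool empty() const { return first_ == last_; }

    char operator[](std::size_t i) const {
        assert(i < size());
        return first_[i];
    }

    // Sub-view of at most `count` characters starting at `pos`; no copy.
    string_view substr(std::size_t pos, std::size_t count) const {
        assert(pos <= size());
        const std::size_t rest = size() - pos;
        const std::size_t n = count < rest ? count : rest;
        return string_view(first_ + pos, first_ + pos + n);
    }

    // Allocates an owning, null-terminated copy of the viewed characters.
    std::string to_string() const { return std::string(first_, last_); }

private:
    const char* first_;
    const char* last_;
};

}  // namespace util
```

A lexer could then store one such view per token, for example util::string_view(text).substr(4, 1), and call to_string() only where an owning or null-terminated string is really needed.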
This design has the danger that the original std::string has to persist long enough to outlast all of the views.
If you are willing to give up some performance for safety, have the view own a std::shared_ptr<const std::string> that it can move a std::string into. As a first step, move the entire buffer into it, and then start chopping/parsing it down. (Child views make a new shared pointer to the same data.) Then your view class is more like a non-mutable string with shared storage.
The upsides to the shared_ptr<const> version include safety, longer lifetime of the views (there is no more lifetime dependency), and that you can easily forward your const "substring" type methods to the std::string, so you can write less code.
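The shared-ownership variant might look roughly like the following sketch. The class name shared_string_view and its offset/length representation are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

namespace util {  // hypothetical utility namespace

// Immutable string slice that shares ownership of its backing buffer,
// so a slice stays valid even after the view it was cut from is gone.
class shared_string_view {
public:
    // Moves the text into shared, immutable storage and views all of it.
    explicit shared_string_view(std::string text)
        : buffer_(std::make_shared<std::string>(std::move(text))),
          offset_(0),
          length_(buffer_->size()) {}

    const char* begin() const { return buffer_->data() + offset_; }
    const char* end() const { return begin() + length_; }
    std::size_t size() const { return length_; }

    // Child view: copies the shared pointer, not the characters.
    shared_string_view substr(std::size_t pos, std::size_t count) const {
        assert(pos <= length_);
        const std::size_t rest = length_ - pos;
        const std::size_t n = count < rest ? count : rest;
        return shared_string_view(buffer_, offset_ + pos, n);
    }

    // Const operations can simply be forwarded to the underlying std::string.
    int compare(const std::string& other) const {
        return buffer_->compare(offset_, length_, other);
    }
    std::string to_string() const { return buffer_->substr(offset_, length_); }

private:
    shared_string_view(std::shared_ptr<const std::string> buffer,
                       std::size_t offset, std::size_t length)
        : buffer_(std::move(buffer)), offset_(offset), length_(length) {}

    std::shared_ptr<const std::string> buffer_;  // keeps the text alive
    std::size_t offset_;
    std::size_t length_;
};

}  // namespace util
```

Each child view costs one reference-count update, which is the performance price mentioned for dragging a shared_ptr around.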
Downsides include possible incompatibility with the incoming standard one [1], and lower performance because you are dragging a shared_ptr around.
I suspect views and ranges are going to be increasingly important in modern C++ with the upcoming and recent improvements to the language.
boost::string_ref is apparently an implementation of a proposal to the C++1y standard.
[1] However, given how simple it is to add capabilities in template metaprogramming, having a "resource owner" template argument to a view type might be a good design decision. Then you can have owning and non-owning string_views with otherwise identical semantics...
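One way the "resource owner" template argument could be sketched in C++11 is shown below. All names here (basic_view, no_owner, borrowed_view, owned_view, and the two factory functions) are hypothetical:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

namespace util {  // hypothetical utility namespace

// Empty "owner" for views that merely borrow their characters.
struct no_owner {};

// View over [first, last); Owner decides whether the buffer is kept alive.
template <class Owner>
class basic_view {
public:
    basic_view(Owner owner, const char* first, const char* last)
        : owner_(std::move(owner)), first_(first), last_(last) {}

    const char* begin() const { return first_; }
    const char* end() const { return last_; }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    std::string to_string() const { return std::string(first_, last_); }

    // Sub-view with the same owner; pos and count are clamped to the view.
    basic_view substr(std::size_t pos, std::size_t count) const {
        const std::size_t p = pos < size() ? pos : size();
        const std::size_t rest = size() - p;
        const std::size_t n = count < rest ? count : rest;
        return basic_view(owner_, first_ + p, first_ + p + n);
    }

private:
    Owner owner_;
    const char* first_;
    const char* last_;
};

// Non-owning and owning flavours share every member function.
typedef basic_view<no_owner> borrowed_view;
typedef basic_view<std::shared_ptr<const std::string> > owned_view;

// Borrows from s; s must outlive the returned view.
inline borrowed_view make_borrowed_view(const std::string& s) {
    return borrowed_view(no_owner(), s.data(), s.data() + s.size());
}

// Moves s into shared storage that lives as long as any view onto it.
inline owned_view make_owned_view(std::string s) {
    std::shared_ptr<const std::string> buffer =
        std::make_shared<std::string>(std::move(s));
    const char* first = buffer->data();
    return owned_view(buffer, first, first + buffer->size());
}

}  // namespace util
```

With this layout, code written against basic_view<Owner> works unchanged for both flavours, and the owning flavour pays for the shared_ptr only where it is chosen.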