Are end+1 iterators for std::string allowed?


Question


Is it valid to create an iterator to end(str)+1 for std::string?
And if it isn't, why isn't it?

This question is restricted to C++11 and later: before C++11 the data was already stored in a contiguous block in all but rare proof-of-concept toy implementations, but it was not required to be stored that way.
And I think that might make all the difference.

The significant difference I see between std::string and any other standard container is that it always contains one element more than its size, the zero terminator, to fulfill the requirements of .c_str().

21.4.7.1 basic_string accessors [string.accessors]

const charT* c_str() const noexcept;
const charT* data() const noexcept;

1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: Constant time.
3 Requires: The program shall not alter any of the values stored in the character array.
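The guarantee quoted above can be checked directly. The following is only a minimal sketch; the helper name c_str_matches_index_interface is made up for illustration and is not part of the standard library. It walks every index in the closed range [0, size()] and compares addresses.

```cpp
#include <cassert>   // assert, for callers that want to check the result
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if c_str() + i equals &s[i] for every
// i in [0, size()], and the element at index size() is the null terminator.
bool c_str_matches_index_interface(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        // s[s.size()] is permitted for basic_string (pos <= size()).
        if (p + i != &s[i]) {
            return false;
        }
    }
    return p[s.size()] == '\0';
}
```

Note that the loop uses the index-based interface only; it never forms or dereferences an iterator at or beyond end().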

Still, even though the standard should imho guarantee that said expression is valid, for consistency and interoperability with zero-terminated strings if nothing else, the only paragraph I found casts doubt on that:

21.4.1 basic_string general requirements [string.require]

4 The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
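As a rough illustration of the contiguity identity, here is a sketch that tests it for every n with 0 <= n < s.size(), which is exactly the range the paragraph covers. The function name iterators_are_contiguous is an assumption made for this example.

```cpp
#include <cassert>   // assert, for callers that want to check the result
#include <string>

// Hypothetical helper: checks &*(s.begin() + n) == &*s.begin() + n
// for all n in [0, size()). The iterator s.begin() + size() (that is, end())
// is deliberately never dereferenced.
bool iterators_are_contiguous(const std::string& s) {
    using diff_t = std::string::difference_type;
    if (s.empty()) {
        return true;  // begin() == end(), so there is nothing to dereference
    }
    const char* base = &*s.begin();
    const diff_t size = static_cast<diff_t>(s.size());
    for (diff_t n = 0; n < size; ++n) {
        if (&*(s.begin() + n) != base + n) {
            return false;
        }
    }
    return true;
}
```

The upper bound is strict on purpose: the quoted identity says nothing about n == s.size(), which is the source of the doubt raised in the question.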

(All quotes are from C++14 final draft (n3936).)

Related: Legal to overwrite std::string's null terminator?

Solution

TL;DR: s.end() + 1 is undefined behavior.


std::string is a strange beast, mainly for historical reasons:

  1. It attempts to provide C compatibility, where an additional \0 character is known to exist beyond the length reported by strlen.
  2. It was designed with an index-based interface.
  3. As an afterthought, when it was merged into the Standard library with the rest of the STL code, an iterator-based interface was added.

This led std::string, in C++03, to number 103 member functions, and a few more have been added since then.

Therefore, discrepancies between the different methods should be expected.


Discrepancies already appear in the index-based interface:

§21.4.5 [string.access]

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

1/ Requires: pos <= size()

const_reference at(size_type pos) const;
reference at(size_type pos);

5/ Throws: out_of_range if pos >= size()

Yes, you read this right: s[s.size()] returns a reference to a NUL character, while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] with at because it is safer, beware the string trap...
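The discrepancy is easy to observe. This is a small sketch with hypothetical helper names (read_terminator_with_index and at_throws_at_size), assuming a C++11 or later standard library:

```cpp
#include <cassert>    // assert, for callers that want to check the results
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the element at index size() through operator[].
// This is allowed for basic_string and yields the null character.
char read_terminator_with_index(const std::string& s) {
    return s[s.size()];
}

// Hypothetical helper: attempts the same read through at() and reports
// whether std::out_of_range was thrown.
bool at_throws_at_size(const std::string& s) {
    try {
        (void)s.at(s.size());  // pos >= size(), so at() must throw
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For a string such as "abc", the first helper returns '\0' and the second returns true, so swapping one accessor for the other changes behavior at index size().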


So, what about iterators?

§21.4.3 [string.iterators]

iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;

2/ Returns: An iterator which is the past-the-end value.

Wonderfully bland.

So we have to refer to other paragraphs. A hint is offered by

§21.4 [basic.string]

3/ The iterators supported by basic_string are random access iterators (24.2.7).

while §17.6 [requirements] seems devoid of anything related. Thus, string iterators are just plain old iterators (you can probably sense where this is going... but since we came this far, let's go all the way).

This leads us to:

24.2.1 [iterator.requirements.general]

5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]

So, *s.end() is not a defined expression: s.end() is a past-the-end value, and dereferencing it is undefined behavior.

24.2.3 [input.iterators]

2/ Table 107 -- Input iterator requirements (in addition to Iterator)

lists, as a precondition for ++r and r++, that r be dereferenceable.

Neither forward, bidirectional, nor random access iterators lift this restriction (and each states that it inherits the restrictions of its predecessor).

Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:

  • r += n is equivalent to [inc|dec]rementing r n times
  • a + n and n + a are equivalent to copying a and then applying += n to the copy

and similarly for -= n and - n.

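Thus s.end() + 1 is undefined behavior: it is equivalent to incrementing a copy of s.end(), and a past-the-end iterator is not guaranteed to be dereferenceable, so the precondition of ++r is not met.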
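One way to stay within defined behavior is to clamp every advance at end(). The helper below, advance_clamped, is a hypothetical name introduced for this sketch and is not a standard facility. It measures the remaining distance first and never forms an iterator beyond the past-the-end value.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: advances `it` by at most `n` steps, stopping at `last`.
// Assumes `it` and `last` belong to the same string and `it` does not come
// after `last`. No iterator beyond `last` is ever created.
std::string::const_iterator advance_clamped(std::string::const_iterator it,
                                            std::string::const_iterator last,
                                            std::string::difference_type n) {
    assert(n >= 0);  // this sketch only handles forward movement
    const std::string::difference_type remaining = std::distance(it, last);
    return it + (n < remaining ? n : remaining);
}
```

With this, advance_clamped(s.cend(), s.cend(), 1) simply returns s.cend() instead of evaluating s.end() + 1.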
