Pointer arithmetic across subobject boundaries

Question

Does the following code (which performs pointer arithmetic across subobject boundaries) have well-defined behavior for the types T for which it compiles (which, in C++11, do not necessarily have to be POD), or for any subset thereof?

#include <cassert>
#include <cstddef>

template<typename T>
struct Base
{
    // ensure alignment
    union
    {
        T initial;
        char begin;
    };
};

template<typename T, size_t N>
struct Derived : public Base<T>
{
    T rest[N - 1];
    char end;
};

int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.initial == 10);
    assert(&d.end - &d.begin == sizeof(float) * 10);
    return 0;
}

LLVM uses a variation of the above technique in the implementation of an internal vector type that is optimized to use the stack for small arrays at first but switches to a heap-allocated buffer once it grows past its initial capacity. (The reason for doing it this way is not clear from this example, but it is apparently to reduce template code bloat; this is clearer if you look through the code.)

NOTE: Before anyone complains, this is not exactly what they are doing and it might be that their approach is more standards-compliant than what I have given here, but I wanted to ask about the general case.

Obviously, it works in practice, but I'm curious if anything in the standard guarantees for that to be the case. I'm inclined to say no, given N3242/expr.add:

When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements...Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object. ...Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.

But theoretically, the middle part of the above quote, combined with class layout and alignment guarantees, might allow the following (minor) adjustment to be valid:

#include <cassert>
#include <cstddef>

template<typename T>
struct Base
{
    T initial[1];
};

template<typename T, size_t N>
struct Derived : public Base<T>
{
    T rest[N - 1];
};

int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.rest[0] == 9);
    assert(&d.rest[0] == &d.initial[1]);
    assert(&d.rest[0] - &d.initial[0] == 1);
    return 0;
}

which combined with various other provisions concerning union layout, convertibility to and from char *, etc., might arguably make the original code valid as well. (The main problem is the lack of transitivity in the definition of pointer arithmetic given above.)

Does anyone know for sure? N3242/expr.add seems to make clear that pointers must belong to the same "array object" for it to be defined, but it could hypothetically be the case that other guarantees in the standard, when combined, might require a definition anyway in this case in order to remain logically self-consistent. (I'm not betting on it, but I would say it's at least conceivable.)

EDIT: @MatthieuM raises the objection that this class is not standard-layout and therefore might not be guaranteed to contain no padding between the base subobject and the first member of the derived, even if both are aligned to alignof(T). I'm not sure how true that is, but that opens up the following variant questions:

  • Would this be guaranteed to work if the inheritance were removed?

  • Would &d.end - &d.begin >= sizeof(float) * 10 be guaranteed even if &d.end - &d.begin == sizeof(float) * 10 were not?

LAST EDIT @ArneMertz argues for a very close reading of N3242/expr.add (yes, I know I'm reading a draft, but it's close enough), but does the standard really imply that the following has undefined behavior if the swap line is removed? (same class definitions as above)

#include <utility> // std::swap

int main()
{
    Derived<float, 10> d;
    bool aligned;
    float * p = &d.initial[0], * q = &d.rest[0];

    ++p;
    if((aligned = (p == q)))
    {
        std::swap(p, q); // does it matter if this line is removed?
        *++p = 1.0;
    }

    assert(!aligned || d.rest[1] == 1.0);

    return 0;
}

Also, if == is not strong enough, what if we take advantage of the fact that std::less forms a total order over pointers, and change the conditional above to:

    if((aligned = (!std::less<float *>()(p, q) && !std::less<float *>()(q, p))))

Is code that assumes that two equal pointers point to the same array object really broken according to a strict reading of the standard?

EDIT Sorry, just want to add one more example, to eliminate the standard layout issue:

#include <cassert>
#include <cstddef>
#include <utility>
#include <functional>

// standard layout
struct Base
{
    float initial[1];
    float rest[9];
};

int main()
{
    Base b;
    bool aligned;
    float * p = &b.initial[0], * q = &b.rest[0];

    ++p;
    if((aligned = (p == q)))
    {
        std::swap(p, q); // does it matter if this line is removed?
        *++p = 1.0;
        q = &b.rest[1];
        // std::swap(p, q); // does it matter if this line is added?
        p -= 2; // is this UB?
    }
    assert(!aligned || b.rest[1] == 1.0);
    assert(p == &b.initial[0]);

    return 0;
}

Solution

Updated: This answer at first missed some information and thus led to wrong conclusions.

In your examples, initial and rest are clearly distinct (array) objects, so relating pointers to initial (or its elements) to pointers to rest (or its elements) is

  • UB, if you take the difference of the pointers (§5.7,6)
  • unspecified, if you use relational operators (§5.9,2)
  • well defined for == (so the second snippet is fine; see below)

First snippet:

Taking the difference in the first snippet is undefined behavior, per the quote you provided (§5.7,6):

Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.

To clarify the UB parts of the first example code:

//first example
int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.initial == 10);            //!!! UB !!!
    assert(&d.end - &d.begin == sizeof(float) * 10);  //!!! UB !!! (*)
    return 0;
}

The line marked with (*) is interesting: d.begin and d.end are not elements of the same array, and therefore the operation results in UB. This is despite the fact that you may reinterpret_cast<char*>(&d) and have both their addresses in the resulting array. But since that array is a representation of all of d, it is not to be seen as an access to parts of d. So while that operation will probably just work and give the expected result on any implementation one can dream of, it is still UB, as a matter of definition.

Second snippet:

This is actually well-defined behavior, but with an implementation-defined result:

int main()
{
    Derived<float, 10> d;
    assert(&d.rest[9] - &d.rest[0] == 9);
    assert(&d.rest[0] == &d.initial[1]);         //(!)
    assert(&d.initial[1] - &d.initial[0] == 1);
    return 0;
}

The line marked with (!) is not UB, but its result is implementation-defined, since padding, alignment, and instrumentation might play a role. But if that assertion holds, you can use the two object parts like one array.

You would then know that rest[0] lies immediately after initial[0] in memory. At first sight, though, you could not easily use that equality:

  • initial[1] would refer to one past the end of initial; dereferencing it is UB.
  • rest[-1] is clearly out of bounds.

But then §3.9.2,3 comes in:

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.

So, provided that &initial[1] == &rest[0], it will be binary-identical to having only one array, and all will be OK.

You can iterate over both arrays, since you can apply a kind of "pointer context switch" at the boundary. So, regarding your last snippet: the swap is not needed!

However, there are some caveats: rest[-1] is UB, and so would be initial[2], because of §5.7,5:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine). So how do these two fit together?

  • "Good path": &initial[1] is OK, and since &initial[1] == &rest[0], you can take that address and go on incrementing the pointer to access the other elements of rest, because of §3.9.2,3.
  • "Bad path": initial[2] is *(initial + 2), but by §5.7,5, initial + 2 is already UB, so you never get to use §3.9.2,3 here.

Put together: you have to stop at the boundary, take a short break to check that the addresses are equal, and then you can move on.
