C++ gcc extension for non-zero-based array pointer allocation?

Question

I am looking for a gcc-supported C++ language extension to enable the allocation of non-zero-based array pointers. Ideally I could simply write:

#include<iostream>  
using namespace std;

// Allocate elements array[lo..hi-1], and return the new array.
template<typename Elem>
Elem* Create_Array(int lo, int hi)
{
  return new Elem[hi-lo] - lo;
  // FIXME what about [expr.add]/4.
  // How do we create a pointer outside the array bounds?
}

// Deallocate an array previously allocated via Create_Array.
template<typename Elem>
void Destroy_Array(Elem* array, int lo, int hi)
{
  delete[](array + lo);
}


int main() 
{  
  const int LO = 1000000000;
  const int HI = LO + 10;
  int* array = Create_Array<int>(LO, HI);
  for (int i=LO; i<HI; i++)
    array[i] = i;
  for (int i=LO; i<HI; i++)
    cout << array[i] << "\n";
  Destroy_Array(array, LO, HI);
} 

The above code seems to work, but is not defined by the C++ standard. Specifically, the issue is [expr.add]/4:


When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.

In other words, behavior is undefined for the line marked FIXME in the code above, because it calculates a pointer that is outside the range x[0..n] for the 0-based array x.
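The rule quoted above can be restated as a small predicate, which makes it easy to see exactly when Create_Array's subtraction leaves the array. This is an illustrative sketch only; the name pointer_add_is_defined is hypothetical and simply encodes the 0 ≤ i + j ≤ n condition from [expr.add]/4.

```cpp
#include <cassert>
#include <cstddef>

// [expr.add]/4 restated: if P points to x[i] in an array x of n elements,
// P + j is defined only when the resulting index i + j lies in [0, n].
// Index n is the one-past-the-end position: it may be formed and compared,
// but not dereferenced.
constexpr bool pointer_add_is_defined(std::ptrdiff_t i, std::ptrdiff_t j,
                                      std::ptrdiff_t n) {
    return 0 <= i + j && i + j <= n;
}

// Create_Array(LO, HI) computes new Elem[HI - LO] - LO, i.e. i = 0,
// j = -LO, n = HI - LO. With the question's LO = 1000000000 this is UB.
static_assert(!pointer_add_is_defined(0, -1000000000, 10),
              "new Elem[hi-lo] - lo leaves the array when lo > 0");
```

Working through the condition with i = 0, j = -lo, n = hi - lo shows that the subtraction in Create_Array is only defined when lo ≤ 0 ≤ hi, so any strictly positive lower bound (as in the question) is undefined.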

Is there some --std=... option to gcc to tell it to allow non-zero-based array pointers to be directly calculated?

If not, is there a reasonably portable way to emulate the return new Elem[hi-lo] - lo; statement, perhaps by casting to long and back? (But then I would worry about introducing more bugs.)
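For comparison, a straightforward approach that stays within the standard is to keep the zero-based pointer and subtract lo on every access. The sketch below uses a hypothetical OffsetArray class that is not part of the question or the answer, and it assumes lo ≤ hi.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical OffsetArray: a well-defined non-zero-based array that keeps
// the zero-based pointer and subtracts lo on every access (assumes lo <= hi).
template <typename Elem>
class OffsetArray {
    std::unique_ptr<Elem[]> data_;  // zero-based allocation, freed automatically
    std::ptrdiff_t lo_;             // lowest valid index
public:
    OffsetArray(std::ptrdiff_t lo, std::ptrdiff_t hi)
        : data_(new Elem[hi - lo]()), lo_(lo) {}

    // For lo <= i < hi, i - lo_ is in [0, hi - lo), so the pointer
    // arithmetic inside data_[...] never leaves the array.
    Elem& operator[](std::ptrdiff_t i) { return data_[i - lo_]; }
};
```

The price is that lo_ travels with each array, so a loop over array1[i], array2[i], array3[i] needs either an extra subtraction per access or an extra register per array, which is the overhead the register question is trying to avoid.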

Furthermore, can this be done in a way that requires only one register to keep track of each array, like the code above? For example, if I have array1[i], array2[i], and array3[i], can this use only three registers for the array pointers array1, array2, and array3, plus one register for i? (Similarly, when cold-fetching the array references, we should be able to fetch the non-zero-based pointer directly, without extra calculations merely to establish the reference in registers.)

Recommended answer

Assuming you're using gcc on Linux x86-64, it supports the intptr_t and uintptr_t types, which can hold any pointer value (valid or not) and also support integer arithmetic. uintptr_t is more suitable for this application because unsigned arithmetic wraps modulo 2^64, while overflow in signed intptr_t arithmetic is undefined behavior.
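To make the modulo 2^64 point concrete, here is a minimal sketch of the base-address arithmetic done entirely in uintptr_t. The helper names make_base and address_of are hypothetical, and the sketch assumes gcc on x86-64, where converting a pointer to uintptr_t yields its plain address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Base value for non-zero-based access: allocation address minus
// lo * sizeof(Elem). Unsigned arithmetic wraps modulo 2^N, so even a huge
// or negative lo cannot trigger overflow UB here.
template <typename Elem>
std::uintptr_t make_base(Elem* alloc, int lo) {
    return reinterpret_cast<std::uintptr_t>(alloc)
         - static_cast<std::uintptr_t>(lo) * sizeof(Elem);
}

// Address of element i as an integer. Adding i * sizeof(Elem) undoes the
// subtraction above exactly when i == lo, whatever wrapping happened between.
template <typename Elem>
std::uintptr_t address_of(std::uintptr_t base, std::ptrdiff_t i) {
    return base + static_cast<std::uintptr_t>(i) * sizeof(Elem);
}
```

Both helpers only produce integers; no invalid pointer is ever formed, which is what keeps this variant clear of [expr.add]/4.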

As suggested in comments, we can use this to build a class that overloads operator[] and performs range checking:

#include <iostream>
#include <cassert>   // assert
#include <cstddef>   // ptrdiff_t
#include <cstdint>   // uintptr_t
#include <sstream>   // ostringstream
#include <stdexcept> // out_of_range, range_error
using namespace std;


// Safe non-zero-based array. Includes bounds checking.
template<typename Elem>
class Array {
  uintptr_t array; // base value for non-zero-based access
  int       lo;    // lowest valid index
  int       hi;    // highest valid index plus 1

public:

  Array(int lo, int hi)
    : array(), lo(lo), hi(hi)
  {
    if (lo > hi)
      {
        ostringstream msg; msg<<"Array(): lo("<<lo<<") > hi("<<hi<< ")";
        throw range_error(msg.str());
      }
    static_assert(sizeof(uintptr_t) == sizeof(void*),
          "Array: uintptr_t size does not match ptr size");
    static_assert(sizeof(ptrdiff_t) == sizeof(uintptr_t),
          "Array: ptrdiff_t size does not match ptr (efficiency issue)");
    Elem* alloc = new Elem[hi-lo];
    assert(alloc); // redundant: new[] throws bad_alloc on failure
    array = (uintptr_t)(alloc) - (uintptr_t)(lo * sizeof(Elem));
    // Convert offset to unsigned to avoid overflow UB.
  }


  //////////////////////////////////////////////////////////////////
  // UNCHECKED access utilities (these method names start with "_").

  uintptr_t _get_array(){return array;}
  // Provide direct access to the base pointer (be careful!)

  Elem& _at(ptrdiff_t i)
  {return *(Elem*)(array + (uintptr_t)(i * sizeof(Elem)));}
  // Return reference to element (no bounds checking)
  // On GCC 5.4.0 with -O3, this compiles to an 'lea' instruction

  Elem* _get_alloc(){return &_at(lo);}
  // Return zero-based array that was allocated

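  ~Array() {delete[](_get_alloc());}

  // Copying would make two objects delete[] the same allocation.
  Array(const Array&) = delete;
  Array& operator=(const Array&) = delete;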


  //////////////////////////////
  // SAFE access utilities

  Elem& at(ptrdiff_t i)
  {
    if (i < lo || i >= hi)
      {
        ostringstream msg;
        msg << "Array.at(): " << i << " is not in range ["
            << lo << ", " << hi << ")";
        throw out_of_range(msg.str());
      }
    return _at(i);
  }

  int get_lo() const {return lo;}
  int get_hi() const {return hi;}
  int size()   const {return hi - lo;}

  Elem& operator[](ptrdiff_t i){return at(i);}
  // std::vector is wrong; operator[] is the typical use and should be safe.
  // It's good practice to fix mistakes as we go along.

};


// Test
int main() 
{  
  const int LO = 1000000000;
  const int HI = LO + 10;
  Array<int> array(LO, HI);
  for (int i=LO; i<HI; i++)
    array[i] = i;
  for (int i=LO; i<HI; i++)
    cout << array[i] << "\n";
}

Note that it is still not possible to cast the invalid "pointer" calculated with uintptr_t to a pointer type, because of the rule in section 4.7, "Arrays and pointers", of the GCC manual:


When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

This is why the array field must be of type uintptr_t and not Elem*. In other words, behavior is defined as long as the uintptr_t value is adjusted to point back into the original object before it is converted back to Elem*.
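Following that rule, the access path can be sketched as: do all offset arithmetic in uintptr_t, and cast to Elem* only once the address is back inside the allocation. The helper names element_ptr and add_arrays are hypothetical. The loop carries each array as a single uintptr_t base plus one shared index, which is the one-register-per-array layout the question asks about; whether gcc actually keeps every base in a register depends on the surrounding code and has not been verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring Array::_at. base is a value like
// Array::_get_array(): the allocation address minus lo * sizeof(Elem),
// computed in uintptr_t.
template <typename Elem>
Elem* element_ptr(std::uintptr_t base, std::ptrdiff_t i) {
    // Add the offset while still in uintptr_t (wraps mod 2^N, no UB) ...
    std::uintptr_t addr = base + static_cast<std::uintptr_t>(i) * sizeof(Elem);
    // ... and convert only now. For lo <= i < hi, addr is the address of an
    // element of the original allocation, so the pointer refers to that object.
    return reinterpret_cast<Elem*>(addr);
}

// c[i] = a[i] + b[i] for lo <= i < hi. Each array is carried as a single
// uintptr_t base, plus one shared index i.
inline void add_arrays(std::uintptr_t a, std::uintptr_t b, std::uintptr_t c,
                       std::ptrdiff_t lo, std::ptrdiff_t hi) {
    for (std::ptrdiff_t i = lo; i < hi; ++i)
        *element_ptr<int>(c, i) = *element_ptr<int>(a, i) + *element_ptr<int>(b, i);
}

// NOT OK: reinterpret_cast<int*>(a) on its own would yield a pointer that
// does not refer to the original object (undefined per GCC's rule).
```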
