How do I safely and sensibly determine whether a pointer points somewhere into a specified buffer?

Question

I'm looking to implement a function that determines whether a given pointer points into a given buffer. The specification:

template <typename T>
bool points_into_buffer (T *p, T *buf, std::size_t len);

If there is some n, 0 <= n && n < len, for which p == buf + n, returns true.

Otherwise, if there is some n, 0 <= n && n < len * sizeof(T), for which reinterpret_cast<char *>(p) == reinterpret_cast<char *>(buf) + n, the behaviour is undefined.

Otherwise, returns false.

The obvious implementation would look something like

template <typename T>
bool points_into_buffer (T *p, T *buf, std::size_t len) {
    return p >= buf && p < buf + len;
}

but that has undefined behaviour in standard C++: relational comparisons of pointers are only defined for pointers into the same array.

An alternative would be to use the standard library's comparer objects:

template <typename T>
bool points_into_buffer (T *p, T *buf, std::size_t len) {
    return std::greater_equal<T *>()(p, buf) && std::less<T *>()(p, buf + len);
}

which is guaranteed to return true when I want it to return true, and avoids undefined behaviour, but allows for false positives: given int a; int b;, it allows a result of true for points_into_buffer(&a, &b, 1).
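
To make the guarantee and its limit concrete, here is a minimal sketch of the comparer-based check. The names in_range_by_total_order and check_total_order are hypothetical and chosen only for this illustration. The first three checks follow from the total order being consistent with the built-in operators inside one array; the last call is the case whose result the standard does not pin down.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Range check built on the total order that std::less and
// std::greater_equal provide for pointers (hypothetical helper name).
template <typename T>
bool in_range_by_total_order(T *p, T *buf, std::size_t len) {
    return std::greater_equal<T *>()(p, buf) && std::less<T *>()(p, buf + len);
}

// Exercises the cases with a guaranteed answer, plus one without.
inline void check_total_order() {
    int arr[4] = {};
    int a = 0, b = 0;

    // Pointers into the same array: the total order agrees with < and >=.
    assert(in_range_by_total_order(&arr[0], arr, 4));
    assert(in_range_by_total_order(&arr[3], arr, 4));
    assert(!in_range_by_total_order(arr + 4, arr, 4));  // one past the end

    // Unrelated objects: well defined, but the result may be true or false
    // depending on where the implementation places a in its total order.
    (void)in_range_by_total_order(&a, &b, 1);
}
```

On common flat-address platforms the last call will most likely return false, but that is a property of the platform rather than of the language.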

It can be implemented as a loop:

template <typename T>
bool points_into_buffer (T *p, T *buf, std::size_t len) {
    for (std::size_t i = 0; i != len; i++)
        if (p == buf + i)
            return true;
    return false;
}

However, compilers have trouble optimising away that loop.

Is there a valid way of writing this, where with current compilers and optimisations enabled, the result is determined in constant time?

Recommended answer

As far as I can tell, this is a portable implementation of the function I'm after for all possible implementations:

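#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t and UINTPTR_MAX, where available

#ifdef UINTPTR_MAX

// Byte-level check on the integer representations; len is in bytes.
bool points_into_buffer(std::uintptr_t p, std::uintptr_t buf, std::size_t len)
{
  // Unsigned difference: only a hint for n, wraps around if p is below buf.
  const auto diff = p + 0u - buf;
  if (diff < len)
    // #1
    if (reinterpret_cast<char *>(p) == reinterpret_cast<char *>(buf) + diff)
      return true;
  // Fallback for platforms where the hint can be wrong.
  for (std::size_t n = 0; n != len; n++)
    if (reinterpret_cast<char *>(p) == reinterpret_cast<char *>(buf) + n)
      // #2
      if (reinterpret_cast<char *>(p) - reinterpret_cast<char *>(buf) != diff)
        return true;
  return false;
}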

template <typename T>
bool points_into_buffer(T *p, T *buf, std::size_t len)
{
  return points_into_buffer(reinterpret_cast<std::uintptr_t>(p),
                            reinterpret_cast<std::uintptr_t>(buf),
                            len * sizeof(T));
}

#else

template <typename T>
bool points_into_buffer(T *p, T *buf, std::size_t len)
{
  for (std::size_t n = 0; n != len; n++)
    if (p == buf + n)
      return true;
  return false;
}

#endif

In general, diff is not guaranteed to have a meaningful value. But that's okay: the function returns true if and only if it finds some n such that reinterpret_cast<char *>(p) == reinterpret_cast<char *>(buf) + n. It only uses diff as a hint to find the value of n faster.
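
The hint itself is ordinary unsigned arithmetic, and it may help to see it in isolation. The sketch below works on plain integers only; offset_in_range and check_offset_in_range are hypothetical names for this illustration. Whether such integers correspond to addresses is implementation-defined, which is why the function above confirms the hint with a pointer equality test before trusting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A single unsigned comparison stands in for "buf <= p && p < buf + len":
// if p is below buf, the subtraction wraps around to a very large value,
// so the comparison against len fails.
inline bool offset_in_range(std::uintptr_t p, std::uintptr_t buf, std::size_t len) {
    const std::uintptr_t diff = p - buf;  // modular arithmetic, never undefined
    return diff < len;
}

// Example values are arbitrary integers, not real addresses.
inline void check_offset_in_range() {
    assert(offset_in_range(1000, 1000, 16));   // first byte
    assert(offset_in_range(1015, 1000, 16));   // last byte
    assert(!offset_in_range(1016, 1000, 16));  // one past the end
    assert(!offset_in_range(999, 1000, 16));   // below: diff wraps around
}
```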

It relies on the compiler optimising conditions that are not necessarily known at compile time in general, but are known at compile time for a particular platform. The conditions for the if statements marked as #1 and #2 are determined by GCC at compile time to always be true and false respectively, because of how diff is defined, allowing GCC to see that no useful action is performed inside the loop, and allowing the entire loop to be dropped.

For points_into_buffer<char> and points_into_buffer<int>, the generated code is as follows:

bool points_into_buffer(char*, char*, unsigned int):
        movl    4(%esp), %edx
        movl    $1, %eax
        movl    12(%esp), %ecx
        subl    8(%esp), %edx
        cmpl    %edx, %ecx
        ja      L11
        xorl    %eax, %eax
L11:    rep ret

bool points_into_buffer(int*, int*, unsigned int):
        movl    4(%esp), %edx
        movl    12(%esp), %eax
        subl    8(%esp), %edx
        leal    0(,%eax,4), %ecx
        movl    $1, %eax
        cmpl    %edx, %ecx
        ja      L19
        xorl    %eax, %eax
L19:    rep ret

On systems where std::uintptr_t is not available, or where addresses are more complicated than simple integers, the loop is used instead.
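
A few sanity checks for the cases the specification defines might look like the sketch below. To keep the snippet self-contained it uses a stand-in with the hypothetical name points_into_buffer_ref, which is just the portable loop from the #else branch; the same checks should hold for the optimised version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in: the portable loop from the #else branch.
template <typename T>
bool points_into_buffer_ref(T *p, T *buf, std::size_t len) {
    for (std::size_t n = 0; n != len; n++)
        if (p == buf + n)
            return true;
    return false;
}

// Exercises the cases for which the specification requires a definite answer.
inline void check_points_into_buffer() {
    int buf[8] = {};
    int other = 0;
    assert(points_into_buffer_ref(&buf[0], buf, 8));   // first element
    assert(points_into_buffer_ref(&buf[7], buf, 8));   // last element
    assert(!points_into_buffer_ref(buf + 8, buf, 8));  // one past the end
    assert(!points_into_buffer_ref(&other, buf, 8));   // unrelated object
    assert(!points_into_buffer_ref(&buf[4], buf, 4));  // outside the sub-buffer [0, 4)
}
```

Pointers into the middle of an element are deliberately left out, since the specification makes that case undefined.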
