Is it possible to hash pointers in portable C++03 code?


Question


Is it possible to portably hash a pointer in C++03, which does not have std::hash defined?

It seems really weird for hashables containing pointers to be impossible in C++, but I can't think of any way of making them.

The closest way I can think of is doing reinterpret_cast<uintptr_t>(ptr), but uintptr_t is not required to be defined in C++03, and I'm not sure if the value could be legally manipulated even if it was defined... is this even possible?

Solution

No, not in general. In fact, it's not even possible in general in C++11 without std::hash.

The reason why lies in the difference between values and value representations.

You may recall the very common example used to demonstrate the difference between a value and its representation: the null pointer value. Many people mistakenly assume that the representation of this value is all bits zero. This is not guaranteed in any fashion. You are guaranteed behavior by its value only.

For another example, consider:

int i;
int* x = &i;
int* y = &i;

x == y;  // this is true; the two pointer values are equal

Underneath that, though, the value representation for x and y could be different!

Let's play compiler. We'll implement the value representation for pointers. Let's say we need (for hypothetical architecture reasons) the pointers to be at least two bytes, but only one is used for the value.

I'll just jump ahead and say it could be something like this:

struct __pointer_impl
{
    std::uint8_t byte1; // contains the address we're holding
    std::uint8_t byte2; // needed for architecture reasons, unused
    // (assume no padding; we are the compiler, after all)
};

Okay, this is our value representation; now let's implement the value semantics. First, equality:

bool operator==(const __pointer_impl& first, const __pointer_impl& second)
{
    return first.byte1 == second.byte1;
}

Because the pointer's value is really only contained in the first byte (even though its representation has two bytes), that's all we have to compare. The second byte is irrelevant, even if they differ.

We need the address-of operator implementation, of course:

__pointer_impl __address_of(int& i)
{
    __pointer_impl result;

    result.byte1 = /* hypothetical architecture magic */;

    return result;
}

This particular implementation overload gets us a pointer value representation for a given int. Note that the second byte is left uninitialized! That's okay: it's not important for the value.

This is really all we need to drive the point home. Pretend the rest of the implementation is done. :)

So now consider our first example again, "compiler-ized":

int i;

/* int* x = &i; */
__pointer_impl x = __address_of(i);

/* int* y = &i; */
__pointer_impl y = __address_of(i);

x == y;  // this is true; the two pointer values are equal

For our tiny example on the hypothetical architecture, this sufficiently provides the guarantees required by the standard for pointer values. But note you are never guaranteed that x == y implies memcmp(&x, &y, sizeof(__pointer_impl)) == 0. There simply aren't requirements on the value representation to do so.
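To make that concrete, here is a minimal sketch that simulates the hypothetical two-byte pointer as an ordinary struct. The names PointerImpl, make_pointer and same_representation are made up for illustration (real code must not use the reserved double-underscore names), and the junk parameter stands in for whatever happens to be left in the unused byte.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical two-byte pointer representation (illustrative names only).
struct PointerImpl {
    std::uint8_t byte1;  // holds the address
    std::uint8_t byte2;  // unused representation byte; its content is arbitrary
};

// Value equality looks at byte1 only, as in the answer's operator==.
inline bool operator==(const PointerImpl& a, const PointerImpl& b) {
    return a.byte1 == b.byte1;
}

// Stand-in for the address-of magic; `junk` models the uninitialized byte2.
inline PointerImpl make_pointer(std::uint8_t address, std::uint8_t junk) {
    PointerImpl p;
    p.byte1 = address;
    p.byte2 = junk;
    return p;
}

// True when the two objects are byte-for-byte identical.
inline bool same_representation(const PointerImpl& a, const PointerImpl& b) {
    return std::memcmp(&a, &b, sizeof(PointerImpl)) == 0;
}

// Usage: equal values, different representations.
inline void demo_equal_value_different_bytes() {
    PointerImpl x = make_pointer(0x2A, 0x00);
    PointerImpl y = make_pointer(0x2A, 0xFF);
    assert(x == y);                      // same pointer value
    assert(!same_representation(x, y));  // but memcmp sees a difference
}
```

The two objects compare equal as values while memcmp reports a difference, which is exactly the gap a byte-based hash would fall into.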

Now consider your question: how do we hash pointers? That is, we want to implement:

template <typename T>
struct myhash;

template <typename T>
struct myhash<T*> :
    std::unary_function<T*, std::size_t>
{
    std::size_t operator()(T* const ptr) const
    {
        return /* ??? */;
    }
};

The most important requirement is that if x == y, then myhash<T*>()(x) == myhash<T*>()(y). We also already know how to hash integers. What can we do?

The only thing we can do is try to somehow convert the pointer to an integer. Well, C++11 gives us std::uintptr_t (where the implementation provides it; the type is optional), so we can do this, right?

return myhash<std::uintptr_t>()(reinterpret_cast<std::uintptr_t>(ptr));

Perhaps surprisingly, this is not correct. To understand why, imagine again we're implementing it:

// okay because we assumed no padding:
typedef std::uint16_t __uintptr_t; // will be used for std::uintptr_t implementation

__uintptr_t __to_integer(const __pointer_impl& ptr)
{
    __uintptr_t result;
    std::memcpy(&result, &ptr, sizeof(__uintptr_t));

    return result;
}

__pointer_impl __from_integer(const __uintptr_t& ptrint)
{
    __pointer_impl result;
    std::memcpy(&result, &ptrint, sizeof(__pointer_impl));

    return result;
}

So when we reinterpret_cast a pointer to integer, we'll use __to_integer, and going back we'll use __from_integer. Note that the resulting integer will have a value depending upon the bits in the value representation of pointers. That is, two equal pointer values could end up with different integer representations...and this is allowed!

This is allowed because the result of reinterpret_cast is totally implementation-defined; the only guarantee is that converting the integer back with the opposite reinterpret_cast gives you the original pointer value.

So there's the first issue: on this implementation, our hash could end up different for equal pointer values.
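The failure can be reproduced with a small simulation. This is only a sketch of the hypothetical implementation: PointerImpl, UintPtr, to_integer, from_integer and naive_hash are illustrative names, and the static_assert records the "no padding" assumption made earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical two-byte pointer; only byte1 carries the value.
struct PointerImpl {
    std::uint8_t byte1;
    std::uint8_t byte2;
};

inline bool operator==(const PointerImpl& a, const PointerImpl& b) {
    return a.byte1 == b.byte1;
}

// Stand-in for the implementation's std::uintptr_t.
typedef std::uint16_t UintPtr;
static_assert(sizeof(PointerImpl) == sizeof(UintPtr), "sketch assumes no padding");

// Models reinterpret_cast<uintptr_t>(ptr): copies the whole representation.
inline UintPtr to_integer(const PointerImpl& p) {
    UintPtr result;
    std::memcpy(&result, &p, sizeof(result));
    return result;
}

// Models the reverse cast.
inline PointerImpl from_integer(UintPtr v) {
    PointerImpl result;
    std::memcpy(&result, &v, sizeof(result));
    return result;
}

// The tempting hash: just use the converted integer.
inline std::size_t naive_hash(const PointerImpl& p) {
    return to_integer(p);
}

// Usage: the round trip is fine, the hash is not.
inline void demo_naive_hash_breaks() {
    PointerImpl x = {0x2A, 0x00};
    PointerImpl y = {0x2A, 0xFF};  // same value, different unused byte
    assert(x == y);
    assert(from_integer(to_integer(x)) == x);  // the one guarantee holds
    assert(naive_hash(x) != naive_hash(y));    // equal pointers, unequal hashes
}
```

The round trip behaves as required, yet two equal pointer values hash differently because the unused byte leaks into the integer.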

This idea is out. Maybe we can reach into the representation itself and hash the bytes together. But this obviously ends up with the same issue, which is what the comments on your question are alluding to. Those pesky unused representation bits are always in the way, and there's no way to figure out where they are so we can ignore them.

We're stuck! It's just not possible. In general.

Remember, in practice we compile for certain implementations, and because the results of these operations are implementation-defined they are reliable if you take care to only use them properly. This is what Mats Petersson is saying: find out the guarantees of the implementation and you'll be fine.

In fact, most consumer platforms you use will handle the std::uintptr_t attempt just fine. If it's not available on your system, or if you want an alternative approach, just combine the hashes of the individual bytes in the pointer. All this requires to work is that the unused representation bits always take on the same value. In fact, this is the approach MSVC2012 uses!
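As a rough sketch of both practical approaches (not portable by the standard's rules, and relying on the platform assumptions just described): hash_via_uintptr and hash_via_bytes are names I made up, and the byte mixing is an FNV-1a-style loop with the 32-bit constants, chosen for illustration rather than copied from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Approach 1: go through std::uintptr_t.
// Assumes the platform provides the type and maps equal pointers to equal integers.
template <typename T>
std::size_t hash_via_uintptr(T* ptr) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(ptr));
}

// Approach 2: mix the bytes of the pointer's object representation.
// Assumes any unused representation bits always hold the same value.
template <typename T>
std::size_t hash_via_bytes(T* ptr) {
    unsigned char bytes[sizeof(T*)];
    std::memcpy(bytes, &ptr, sizeof(T*));

    std::size_t h = 2166136261u;  // FNV offset basis (32-bit variant)
    for (std::size_t i = 0; i < sizeof(T*); ++i) {
        h ^= bytes[i];            // FNV-1a: xor the byte in first...
        h *= 16777619u;           // ...then multiply by the FNV prime
    }
    return h;
}

// Usage: on mainstream platforms equal pointers hash equally with both.
inline void demo_practical_hashes() {
    int i = 0;
    int* x = &i;
    int* y = &i;
    assert(hash_via_uintptr(x) == hash_via_uintptr(y));
    assert(hash_via_bytes(x) == hash_via_bytes(y));
}
```

Either function can be dropped into the body of myhash<T*>::operator() once you have checked what your implementation actually guarantees.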

Had our hypothetical pointer implementation simply always initialized byte2 to a constant, it would work there as well. But there just isn't any requirement for implementations to do so.
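A small sketch of that variant, again on the simulated pointer. The names PointerImpl, address_of_normalized and hash_bytes are hypothetical, and the multiply-by-31 mixing is just a simple placeholder hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical two-byte pointer; only byte1 carries the value.
struct PointerImpl {
    std::uint8_t byte1;
    std::uint8_t byte2;
};

inline bool operator==(const PointerImpl& a, const PointerImpl& b) {
    return a.byte1 == b.byte1;
}

// Address-of that always sets the unused byte to a constant.
inline PointerImpl address_of_normalized(std::uint8_t address) {
    PointerImpl p;
    p.byte1 = address;
    p.byte2 = 0;  // the fix: never leave this byte uninitialized
    return p;
}

// Hash every byte of the representation (simple multiplicative mix).
inline std::size_t hash_bytes(const PointerImpl& p) {
    unsigned char bytes[sizeof(PointerImpl)];
    std::memcpy(bytes, &p, sizeof(PointerImpl));

    std::size_t h = 0;
    for (std::size_t i = 0; i < sizeof(PointerImpl); ++i) {
        h = h * 31 + bytes[i];
    }
    return h;
}

// Usage: with a constant byte2, equal values imply equal bytes and equal hashes.
inline void demo_normalized_pointer_hash() {
    PointerImpl x = address_of_normalized(0x2A);
    PointerImpl y = address_of_normalized(0x2A);
    assert(x == y);
    assert(std::memcmp(&x, &y, sizeof(PointerImpl)) == 0);
    assert(hash_bytes(x) == hash_bytes(y));
}
```

Once every pointer with the same value has the same bytes, hashing the representation satisfies the x == y requirement on this simulated implementation.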

Hope this clarifies a few things.
