在char *值而不是在内存地址上的std :: hash值? [英] std::hash value on char* value and not on memory address?

查看:203
本文介绍了在char *值而不是在内存地址上的std :: hash值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如该链接所述:

C字符串没有特殊化. std :: hash产生指针值(内存地址)的哈希,它不检查任何字符数组的内容.

There is no specialization for C strings. std::hash produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.

这意味着使用相同的char*值,可以生成不同的哈希码.例如,具有以下代码:

Which means that with the same char* value, different hashcodes could be produced. For example, having this code:

//MOK and MOV are template arguments
void emit(MOK key, MOV value) {
    auto h = hash<MOK>()(key);
    cout<<"key="<<key<<" h="<<h<<endl;
    ...

这是通过在相同的key(具有MOK=char*)值(但有4个不同的标记/字符串对象)上调用4次emit()产生的输出:

This is the output produced by calling 4 times emit() on the same key (with MOK=char*) value (but 4 different tokens/string objects):

key=hello h=140311481289184
key=hello h=140311414180320
key=hello h=140311414180326
key=hello h=140311481289190

如何为char*获取相同的哈希码?我不想使用boost

How can I obtain the same hash code for char*? I'd prefer not to use boost

推荐答案

当然,存在创建临时std::string并对其进行哈希处理的简单(缓慢)解决方案.如果您不想这样做,恐怕您将必须实现自己的哈希函数.令人遗憾的是,当前的C ++标准库没有提供与特定于对象的哈希解决方案分离的通用哈希算法. (但有些希望这是将来可能会改变.)

There is of course the trivial (and slow) solution of creating a temporary std::string and hashing that one. If you don't want to do this, I'm afraid you will have to implement your own hash function. Sadly enough, the current C++ standard library doesn't provide general purpose hash algorithms disentangled from object-specific hash solutions. (But there is some hope this could change in the future.)

假设您有一个函数

std::size_t
hash_bytes(const void * data, std::size_t size) noexcept;

,它将采用一个地址和一个大小,并返回一个哈希值,该哈希值是根据该地址后的多个字节计算得出的.借助该功能,您可以轻松编写

that would take an address and a size and return you a hash computed from the that many bytes following that address. With the help of that function, you could easily write

template <typename T>
struct myhash
{
  std::size_t
  operator()(const T& obj) const noexcept
  {
    // Fallback implementation.
    auto hashfn = std::hash<T> {};
    return hashfn(obj);
  }
};

然后将其专门用于您感兴趣的类型.

and then specialize it for the types you're interested in.

template <>
struct myhash<std::string>
{
  std::size_t
  operator()(const std::string& s) const noexcept
  {
    return hash_bytes(s.data(), s.size());
  }
};

template <>
struct myhash<const char *>
{
  std::size_t
  operator()(const char *const s) const noexcept
  {
    return hash_bytes(s, std::strlen(s));
  }
};

这只剩下实施hash_bytes的练习.幸运的是,有一些相当不错的哈希函数,很容易实现.我的简单哈希算法是 Fowler-Noll -语音哈希功能.您可以用五行代码来实现它.请参阅链接的维基百科文章.

This leaves you only with the exercise of implementing hash_bytes. Fortunately, there are some fairly good hash functions that are rather easy to implement. My go-to algorithm for simple hashing is the Fowler-Noll-Vo hash function. You can implement it in five lines of code; see the linked Wikipedia article.

如果您想花一点点时间,请考虑以下实现.首先,我定义一个通用的template,可以将其专用于FNV-1a哈希函数的任何版本.

If you want to get a bit fancy, consider the following implementation. First, I define a generic template that can be specialized for any version of the FNV-1a hash function.

template <typename ResultT, ResultT OffsetBasis, ResultT Prime>
class basic_fnv1a final
{

  static_assert(std::is_unsigned<ResultT>::value, "need unsigned integer");

public:

  using result_type = ResultT;

private:

  result_type state_ {};

public:

  constexpr
  basic_fnv1a() noexcept : state_ {OffsetBasis}
  {
  }

  constexpr void
  update(const void *const data, const std::size_t size) noexcept
  {
    const auto cdata = static_cast<const unsigned char *>(data);
    auto acc = this->state_;
    for (auto i = std::size_t {}; i < size; ++i)
      {
        const auto next = std::size_t {cdata[i]};
        acc = (acc ^ next) * Prime;
      }
    this->state_ = acc;
  }

  constexpr result_type
  digest() const noexcept
  {
    return this->state_;
  }

};

接下来,我提供32位和64位版本的别名.参数来自 Landon Curt Noll的网站.

Next, I provide aliases for the 32 and 64 bit versions. The parameters were taken from Landon Curt Noll's website.

using fnv1a_32 = basic_fnv1a<std::uint32_t,
                             UINT32_C(2166136261),
                             UINT32_C(16777619)>;

using fnv1a_64 = basic_fnv1a<std::uint64_t,
                             UINT64_C(14695981039346656037),
                             UINT64_C(1099511628211)>;

最后,在需要的位数的情况下,我提供类型元函数来选择算法的版本.

Finally, I provide type meta-functions to select a version of the algorithm given the wanted number of bits.

template <std::size_t Bits>
struct fnv1a;

template <>
struct fnv1a<32>
{
  using type = fnv1a_32;
};

template <>
struct fnv1a<64>
{
  using type = fnv1a_64;
};

template <std::size_t Bits>
using fnv1a_t = typename fnv1a<Bits>::type;

有了这个,我们很高兴.

And with that, we're good to go.

constexpr std::size_t
hash_bytes(const void *const data, const std::size_t size) noexcept
{
  auto hashfn = fnv1a_t<CHAR_BIT * sizeof(std::size_t)> {};
  hashfn.update(data, size);
  return hashfn.digest();
}

请注意,此代码如何自动适应std::size_t为32或64位宽的平台.

Note how this code automatically adapts to platforms where std::size_t is 32 or 64 bits wide.

这篇关于在char *值而不是在内存地址上的std :: hash值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆