在char *值而不是在内存地址上的std :: hash值? [英] std::hash value on char* value and not on memory address?
问题描述
如该链接所述:
C字符串没有特殊化. std :: hash产生指针值(内存地址)的哈希,它不检查任何字符数组的内容.
There is no specialization for C strings. std::hash produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.
这意味着使用相同的char*
值,可以生成不同的哈希码.例如,具有以下代码:
Which means that with the same char*
value, different hashcodes could be produced. For example, having this code:
//MOK and MOV are template arguments
void emit(MOK key, MOV value) {
auto h = hash<MOK>()(key);
cout<<"key="<<key<<" h="<<h<<endl;
...
这是通过在相同的key
(具有MOK=char*
)值(但有4个不同的标记/字符串对象)上调用4次emit()
产生的输出:
This is the output produced by calling 4 times emit()
on the same key
(with MOK=char*
) value (but 4 different tokens/string objects):
key=hello h=140311481289184
key=hello h=140311414180320
key=hello h=140311414180326
key=hello h=140311481289190
如何为char*
获取相同的哈希码?我不想使用boost
How can I obtain the same hash code for char*
? I'd prefer not to use boost
推荐答案
当然,存在创建临时std::string
并对其进行哈希处理的简单(缓慢)解决方案.如果您不想这样做,恐怕您将必须实现自己的哈希函数.令人遗憾的是,当前的C ++标准库没有提供与特定于对象的哈希解决方案分离的通用哈希算法. (但有些希望这是将来可能会改变.)
There is of course the trivial (and slow) solution of creating a temporary std::string
and hashing that one. If you don't want to do this, I'm afraid you will have to implement your own hash function. Sadly enough, the current C++ standard library doesn't provide general purpose hash algorithms disentangled from object-specific hash solutions. (But there is some hope this could change in the future.)
假设您有一个函数
std::size_t
hash_bytes(const void * data, std::size_t size) noexcept;
,它将采用一个地址和一个大小,并返回一个哈希值,该哈希值是根据该地址后的多个字节计算得出的.借助该功能,您可以轻松编写
that would take an address and a size and return you a hash computed from the that many bytes following that address. With the help of that function, you could easily write
template <typename T>
struct myhash
{
std::size_t
operator()(const T& obj) const noexcept
{
// Fallback implementation.
auto hashfn = std::hash<T> {};
return hashfn(obj);
}
};
然后将其专门用于您感兴趣的类型.
and then specialize it for the types you're interested in.
template <>
struct myhash<std::string>
{
std::size_t
operator()(const std::string& s) const noexcept
{
return hash_bytes(s.data(), s.size());
}
};
template <>
struct myhash<const char *>
{
std::size_t
operator()(const char *const s) const noexcept
{
return hash_bytes(s, std::strlen(s));
}
};
这只剩下实施hash_bytes
的练习.幸运的是,有一些相当不错的哈希函数,很容易实现.我的简单哈希算法是 Fowler-Noll -语音哈希功能.您可以用五行代码来实现它.请参阅链接的维基百科文章.
This leaves you only with the exercise of implementing hash_bytes
. Fortunately, there are some fairly good hash functions that are rather easy to implement. My go-to algorithm for simple hashing is the Fowler-Noll-Vo hash function. You can implement it in five lines of code; see the linked Wikipedia article.
如果您想花一点点时间,请考虑以下实现.首先,我定义一个通用的template
,可以将其专用于FNV-1a哈希函数的任何版本.
If you want to get a bit fancy, consider the following implementation. First, I define a generic template
that can be specialized for any version of the FNV-1a hash function.
template <typename ResultT, ResultT OffsetBasis, ResultT Prime>
class basic_fnv1a final
{
static_assert(std::is_unsigned<ResultT>::value, "need unsigned integer");
public:
using result_type = ResultT;
private:
result_type state_ {};
public:
constexpr
basic_fnv1a() noexcept : state_ {OffsetBasis}
{
}
constexpr void
update(const void *const data, const std::size_t size) noexcept
{
const auto cdata = static_cast<const unsigned char *>(data);
auto acc = this->state_;
for (auto i = std::size_t {}; i < size; ++i)
{
const auto next = std::size_t {cdata[i]};
acc = (acc ^ next) * Prime;
}
this->state_ = acc;
}
constexpr result_type
digest() const noexcept
{
return this->state_;
}
};
接下来,我提供32位和64位版本的别名.参数来自 Landon Curt Noll的网站.
Next, I provide aliases for the 32 and 64 bit versions. The parameters were taken from Landon Curt Noll's website.
using fnv1a_32 = basic_fnv1a<std::uint32_t,
UINT32_C(2166136261),
UINT32_C(16777619)>;
using fnv1a_64 = basic_fnv1a<std::uint64_t,
UINT64_C(14695981039346656037),
UINT64_C(1099511628211)>;
最后,在需要的位数的情况下,我提供类型元函数来选择算法的版本.
Finally, I provide type meta-functions to select a version of the algorithm given the wanted number of bits.
template <std::size_t Bits>
struct fnv1a;
template <>
struct fnv1a<32>
{
using type = fnv1a_32;
};
template <>
struct fnv1a<64>
{
using type = fnv1a_64;
};
template <std::size_t Bits>
using fnv1a_t = typename fnv1a<Bits>::type;
有了这个,我们很高兴.
And with that, we're good to go.
constexpr std::size_t
hash_bytes(const void *const data, const std::size_t size) noexcept
{
auto hashfn = fnv1a_t<CHAR_BIT * sizeof(std::size_t)> {};
hashfn.update(data, size);
return hashfn.digest();
}
请注意,此代码如何自动适应std::size_t
为32或64位宽的平台.
Note how this code automatically adapts to platforms where std::size_t
is 32 or 64 bits wide.
这篇关于在char *值而不是在内存地址上的std :: hash值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!