将多个非数字插入到std :: unordered_set< double>中。 [英] Inserting multiple not-a-numbers into a std::unordered_set<double>
问题描述
IEEE 754标准的后果之一是 std :: unordered_set< double>
的非直观行为,当元素数量不是( NAN
s)。
One of consequences of the IEEE 754 standard is the non-intuitive behavior of std::unordered_set<double>
, when not-a-number elements (NAN
s) are inserted.
由于 NAN!= NAN
的事实,遵循以下顺序:
Due to the fact that NAN!=NAN
, after the following sequence:
#include <iostream>
#include <cmath>
#include <unordered_set>
int main(){
std::unordered_set<double> set;
set.insert(NAN);
set.insert(NAN);
std::cout<<"Number of elements "<<set.size()<<"\n"; //there are 2 elements!
}
集中有两个元素
(实时查看): NAN
和 NAN
!
我的主要问题是,当 N
NAN
被插入到哈希集中,它们都命中相同的哈希值,并且表现为 N
插入哈希集会退化为最坏的运行时间- O(N ^ 2)
。
Mine main issue with this is, that when N
NAN
s are inserted into the hash-set, they all hit the same hash-bucket and the performance of N
inserts into the hash-set degenerates to the worst-case running time - O(N^2)
.
例如,请参阅问题末尾的清单或在此处-插入 NAN
比普通浮点数要花更多的时间。
For an example, see the listing at the end of the question or here live - inserting NAN
takes some order of magnitude more time than a "normal" floating number.
我的问题:是否可以(如果是,如何)以这种方式调整 std :: unordered_set< double>
不论插入的 NAN
s( NAN
,- NAN
等)?
My question: is it possible (and if yes - how) to tweak std::unordered_set<double>
in such a way, that there is at most one NAN
-element in the set, no matter the flavor of inserted NAN
s (NAN
, -NAN
and so on)?
列表:
#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>
constexpr int N=5000;
void test_insert(double value)
{
std::unordered_set<double> s;
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < N; i++) {
s.insert(value);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
std::cout << "Number of elements: "<<s.size()<<"\n";
}
int main(){
std::cout << "Not NAN\n";
test_insert(1.0); //takes 0.0001 s
std::cout << "NAN\n";
test_insert(NAN); //takes 0.2 s
}
推荐答案
从您在安德鲁斯答案中的评论,
From your comment in Andrews answer,
我认为此解决方案存在问题:-NAN的散列值不同于NAN,但对于散列-函数h必须成立:如果x == y,那么h(x)== h(y)
I think the problem with this solution: -NAN will have a different hash-value then NAN but for hash-function h must hold: if x==y then also h(x)==h(y)
,因此如果您想 h(-NAN)== h(NAN)
...
This does hash differently, so you need to also define your own hash function if you want h(-NAN) == h(NAN)
...
(来自@Andrew回答)
(augmented from @Andrew answer)
#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>
struct EqualPred
{
constexpr bool operator()(const double& lhs, const double& rhs) const
{
if (std::isnan(lhs) && std::isnan(rhs)) return true;
return lhs == rhs;
}
};
template <typename T>
struct Hash
{
size_t operator()(const T& value) const
{
return std::hash<T>()( std::isnan(value) ? NAN : value);
}
};
std::unordered_set<double, Hash<double>, EqualPred> s;
constexpr int N=5000;
void test_insert(double value)
{
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < N; i++) {
s.insert(value);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
std::cout << "Number of elements: "<<s.size()<<"\n";
}
int main(){
std::cout << "Not NAN\n";
test_insert(1.0); //takes 0.0001 s
std::cout << "NAN\n";
test_insert(NAN);
test_insert(-NAN);
std::cout << s.size() << std::endl;
//takes 0.2 s
}
这篇关于将多个非数字插入到std :: unordered_set< double>中。的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!