hash_set与Java的性能 [英] Performance of hash_set vs. Java

查看:71
本文介绍了hash_set与Java的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

你好,


为了提高性能,我花了很多时间将我编写的Java程序转换为C ++,以提高性能,并且发现了它不一定比b $ b更快。


具体来说,我正在写一个电路模拟器,所以我必须首先解析

网表文件。该程序通过一个输入文件,并生成一个hash_set
的唯一节点(std :: string')。然后对单词进行排序和编号,将
复制到一个向量和排序中。然后,我在向量上运行一系列

二进制搜索,为每个

元素在矩阵中分配行。


我已经将STL'的lower_bound()修改为我的getIndex(),它的功能类似于

Java'相当于当元素不是时返回负索引

找到了。


此外,Java提供了一个HashSet类,它可以散列它可能包含的任何元素。因为,STL只散列选定的项目,我必须将我的

std :: strings转换为const char *来进行散列。


除此之外,我在Java和C ++中的代码非常相似。但是,对于Java来说,100k行文件的解析需要大约一秒钟,但对于C ++来说更像是17s

。实际上,使用std :: set而不是hash_set实际上在C ++中将性能提高到了大约15秒。


任何想法为什么会出现这样的差异?特别是,我怀疑

我的哈希函数从调用c_str()开始可能会很慢。这是否有

分配新内存或只是指向对象内的某个地方?我很好奇Java实际上用它的哈希值。任何一般的

对我的计划的批评/其他批评当然是受欢迎的。


从好的方面来看,有一个很好的直接稀疏线性求解器用于C.我'$

只能找到Java的迭代稀疏求解器。这使得整个C ++代码总体上比Java快得多,但我怀疑解析阶段

至少应该运行得快或者更快。

谢谢,


Alex Gerdemann

伊利诺伊大学厄巴纳 - 香槟分校


供参考,我已经包含了代码草图


所以我做了类似下面的事情:


struct hashSet {

__gnu_css :: hash< const char *> h;


bool operator()(std :: string& s)const {

return h(s.ctr());

}

};


inline int getIndex(const std :: vector< std :: string>& list,const std :: string&

item){

std :: vector< std :: string> :: const_iterator i =

std :: lower_bound(list .begin(),list.end(),item);

if(i == list.end()){

return - (static_cast< int>( i-list.begin())+ 1);

}否则if(* i!= item){

return - (static_cast< int>(i-list) .begin())+ 1);

}

返回static_cast< int>(i-list.begin());

}


__gnu_cxx :: hash_set< std :: string,hashString> nodeSet;

std :: vector< std :: string> nodeVector;


while(!done){

//读取一行文件

//将行拆分为单词并推入向量

//检查语法错误

nextNode =行中的某个单词;

nodeSet.insert(nextNode);

}

std :: copy(nodeSet.begin(),node.end(),std :: back_ins erter(nodeVector));

std :: sort(nodeVector.begin(),nodeVector.end());

while(!done){

//从先前存储的单词读取行

//更多错误检查

thisNode =行中的某个单词

getIndex(nodeVector,thisNode);

/ /分配一个表示包含正确索引的元素的对象

代表节点

//将元素推送到向量中

}

Hello,

I have spent a bunch of time converting a Java program I wrote to C++ in
order to improve performance, and have found that it is not necessarily
faster.

Specifically, I''m writing a circuit simulator, so I must first parse the
netlist file. The program goes through an input file, and makes a hash_set
of unique nodes (std::string''s). The words are then sorted and numbered by
copying the hash_set into a vector and sorting. Then, I run a series of
binary searches on the vector to allocate lines in a matrix for each
element.

I''ve modified STL''s lower_bound() to my getIndex() which functions like
Java''s equivalent by returning negative indices when the element is not
found.

Also, Java provides a HashSet class that can hash any kind of element it may
contain. Since, the STL only hashes selected items I had to convert my
std::strings to const char*''s for hashing.

Other than that, my code in the Java and C++ is quite similar. However, the
parse of a 100k line file took around a second for Java, but more like 17s
for C++ . In fact, using std::set instead of hash_set actually improved
performance to about 15s in C++.

Any idea why there would be such a difference? In particular, I suspect
that my hash function may be slow from the call to c_str(). Does this have
to allocate new memory or just point to somewhere inside the object? I''m
kind of curious what Java actually uses for its hash. Any general
performace/other criticisms on my scheme are certainly welcome.

On the plus side there is a great direct sparse linear solver for C. I''ve
only been able to find a iterative sparse solver for Java. This makes the
C++ code overall much faster than Java, but I suspect that the parse stage
should run at least as fast or faster.

Thanks,

Alex Gerdemann
University of Illinois at Urbana-Champaign

For reference, I''ve included a sketch of the code

So I did something like the following:

struct hashSet {
__gnu_css::hash<const char*> h;

bool operator()(std::string& s) const {
return h(s.ctr());
}
};

inline int getIndex(const std::vector<std::string>& list, const std::string&
item) {
std::vector<std::string>::const_iterator i =
std::lower_bound(list.begin(), list.end(), item);
if(i==list.end()) {
return -(static_cast<int>(i-list.begin())+1);
} else if(*i != item) {
return -(static_cast<int>(i-list.begin())+1);
}
return static_cast<int>(i-list.begin());
}

__gnu_cxx::hash_set<std::string, hashString> nodeSet;
std::vector<std::string> nodeVector;

while(!done) {
//read a line of file
//split the line into words and push into a vector
//check for syntax errors
nextNode = some word in line;
nodeSet.insert(nextNode);
}
std::copy(nodeSet.begin(),node.end(),std::back_ins erter(nodeVector));
std::sort(nodeVector.begin(),nodeVector.end());
while(!done) {
//read line from previously stored words
//more error checking
thisNode = some word in line
getIndex(nodeVector, thisNode);
//allocate an object representing element containing proper indices
representing nodes
//push pointer to element into a vector
}

推荐答案

Alex Gerdemann写道:
Alex Gerdemann wrote:
你好,
struct hashSet {
__gnu_css :: hash< const char *> ; h;

bool operator()(std :: string& s)const {
return h(s.ctr());
}
};

inline int getIndex(const std :: vector< std :: string>& list,const std :: string&
item){
std :: vector< std: :string> :: const_iterator i =
std :: lower_bound(list.begin(),list.end(),item);
if(i == list.end()){
return - (static_cast< int>(i-list.begin())+ 1);
} if if(* i!= item){
return - (static_cast< int>(i) -list.begin())+ 1);
}
返回static_cast< int>(i-list.begin());
}

__gnu_cxx: :hash_set< std :: string,hashString> nodeSet;
std :: vector< std :: string> nodeVector;
Hello,
struct hashSet {
__gnu_css::hash<const char*> h;

bool operator()(std::string& s) const {
return h(s.ctr());
}
};

inline int getIndex(const std::vector<std::string>& list, const std::string&
item) {
std::vector<std::string>::const_iterator i =
std::lower_bound(list.begin(), list.end(), item);
if(i==list.end()) {
return -(static_cast<int>(i-list.begin())+1);
} else if(*i != item) {
return -(static_cast<int>(i-list.begin())+1);
}
return static_cast<int>(i-list.begin());
}

__gnu_cxx::hash_set<std::string, hashString> nodeSet;
std::vector<std::string> nodeVector;



当你使用std :: string时,你将有*很多*字符串复制

正在进行中。 Java将只传递其refrece/指针。沿着。

尝试转换列表等来保存std :: string *(指针)而不是


When you use std::string , you will have *alot* of string copying
going on. Java will just pass its "refrece"/"pointer" along.
Try convert lists, etc. to hold std::string* (pointers )rather


11月11日2004年10月11:56:38 GMT,Alex Gerdemann

< nu ******* @ hotmail.com>写道:
On Mon, 11 Oct 2004 11:56:38 GMT, "Alex Gerdemann"
<nu*******@hotmail.com> wrote:
你好,
我花了很多时间将我编写的Java程序转换为C ++,以提高性能,已经发现它不一定更快。


不,除非你使用比

std :: string更好的类来保存字符串,否则它可能不会。 java.lang.String有几种方式

比std :: string的典型实现更有效。

具体来说,我正在写一个电路模拟器,所以我必须首先解析
网表文件。该程序通过一个输入文件,并生成一个hash_set
的唯一节点(std :: string')。然后通过将hash_set复制到向量和排序中对单词进行排序和编号。然后,我在向量上运行一系列的二进制搜索,为每个
元素在矩阵中分配行。

我修改了STL的lower_bound()到我的getIndex()函数就像
Java那样,当找不到元素时返回负索引。

此外,Java提供了一个可以哈希的HashSet类它可能含有的任何元素。因为,STL只散列选定的项目,我必须将我的
std :: strings转换为const char *来进行散列。


Java在散列中的主要好处是Strings缓存它们的

哈希码,所以它们只需要计算一次。 std :: string知道

没什么关于哈希的,所以没有执行这个

优化。

除此之外,我的代码在Java和C ++非常相似。但是,对于Java来说,100k行文件的解析花了大约一秒钟,但对于C ++来说更像17s
。事实上,使用std :: set而不是hash_set实际上在C ++中将性能提升到了大约15秒。


只需阅读文件多长时间?我不太了解

GCC'的hash_set,但我认为它可以配置为桶大小等。

您可能想要调整它。您使用的是什么版本的GCC?

任何想法为什么会有这样的差异?特别是,我怀疑从调用c_str()开始我的哈希函数可能会很慢。这有没有分配新的内存或只是指向对象内的某个地方?


由于引用计数问题,有时需要做前者,

但通常只返回指向存储的指针。跟踪使用

调试器会告诉你哈希调用中发生了什么。


我很好奇Java实际使用的是什么它的哈希。对我的计划有任何一般的表现/其他批评当然是受欢迎的。


Java只迭代整个String,生成哈希值,

,然后将其缓存以备将来调用。

开对于C,有一个很好的直接稀疏线性求解器。我只能找到Java的迭代稀疏求解器。这使得整个C ++代码总体上比Java快得多,但我怀疑解析阶段应该至少运行得快或者更快。

谢谢,
< Alex Gerdemann
伊利诺伊大学厄巴纳 - 香槟分校

作为参考,我已经包含了代码草图

所以我做了类似的事情以下内容:

struct hashSet {
__gnu_css :: hash< const char *> h;

bool operator()(std :: string& s)const {
return h(s.ctr());
}


我怀疑这是你的瓶颈,但如果字符串很长,那么

Java的哈希码缓存可能会给它带来一个很大的优势。

内联int getIndex(const std :: vector< std :: string>& list,const std :: string&
item){
std :: vector<}

; std :: string> :: const_iterator i =
std :: lower_bound(list.begin(),list.end(),item);
if(i == list.end()) {
返回 - (static_cast< int>(i-list.begin())+ 1);
} if if(* i!= item){
return - (static_cast< int> ;(i-list.begin())+ 1);
}
返回static_cast< int>(i-list.begin());
}
__ gnu_cxx :: hash_set< std :: string,hashString> nodeSet;
std :: vector< std :: string> nodeVector;

while(!done){
//读取一行文件
//将行拆分为单词并推入矢量
//检查用于语法错误
nextNode =行中的某个单词;
nodeSet.insert(nextNode);


以上代码可能是您的主要瓶颈所在。你如何阅读

这行,然后把它分成单词?如果你要创建一些

临时字符串,那将会减慢你的速度。您可以考虑使用一两个固定的向量并重复使用它们,以避免临时性。

}


这里你绝对想要:

nodeVector.reserve(nodeSet.size());

std :: copy(nodeSet.begin(),node.end(),std :: back_in serter (nodeVector));
std :: sort(nodeVector.begin(),nodeVector.end());


如果您使用的是参考

计算的字符串类,这两项操作都应该很快。我相信GCC 3+使用这样的野兽。

while(!done){
//从先前存储的单词读取行
//更多错误检查
thisNode =行中的某个单词
getIndex(nodeVector,thisNode);
//分配一个表示包含正确索引的元素的对象
表示节点
//将指向元素的指针推送到向量
}
Hello,

I have spent a bunch of time converting a Java program I wrote to C++ in
order to improve performance, and have found that it is not necessarily
faster.
No, it probably won''t be unless you use a better class than
std::string to hold the strings. java.lang.String is in several ways
more efficient than typical implementations of std::string.
Specifically, I''m writing a circuit simulator, so I must first parse the
netlist file. The program goes through an input file, and makes a hash_set
of unique nodes (std::string''s). The words are then sorted and numbered by
copying the hash_set into a vector and sorting. Then, I run a series of
binary searches on the vector to allocate lines in a matrix for each
element.

I''ve modified STL''s lower_bound() to my getIndex() which functions like
Java''s equivalent by returning negative indices when the element is not
found.

Also, Java provides a HashSet class that can hash any kind of element it may
contain. Since, the STL only hashes selected items I had to convert my
std::strings to const char*''s for hashing.
The main benefit Java has in hashing is that Strings cache their
hashcodes, so they need only be calculated once. std::string knows
nothing about hashing, and therefore doesn''t perform this
optimization.
Other than that, my code in the Java and C++ is quite similar. However, the
parse of a 100k line file took around a second for Java, but more like 17s
for C++ . In fact, using std::set instead of hash_set actually improved
performance to about 15s in C++.
How long did it take just to read in the file? I don''t know much about
GCC''s hash_set, but I assume it is configurable for bucket size, etc.
You might want to tweak this. What version of GCC are you using?
Any idea why there would be such a difference? In particular, I suspect
that my hash function may be slow from the call to c_str(). Does this have
to allocate new memory or just point to somewhere inside the object?
Sometimes is has to do the former, due to reference counting problems,
but usually it just returns a pointer to the storage. Tracing in using
the debugger would tell you what''s going on in the hash calls.

I''mkind of curious what Java actually uses for its hash. Any general
performace/other criticisms on my scheme are certainly welcome.
Java just iterates over the whole String, generating the hash value,
and then caches it for future calls.
On the plus side there is a great direct sparse linear solver for C. I''ve
only been able to find a iterative sparse solver for Java. This makes the
C++ code overall much faster than Java, but I suspect that the parse stage
should run at least as fast or faster.

Thanks,

Alex Gerdemann
University of Illinois at Urbana-Champaign

For reference, I''ve included a sketch of the code

So I did something like the following:

struct hashSet {
__gnu_css::hash<const char*> h;

bool operator()(std::string& s) const {
return h(s.ctr());
}
I doubt that is your bottleneck, but if the strings are long, then
Java''s hashcode caching might be giving it a major advantage.
};

inline int getIndex(const std::vector<std::string>& list, const std::string&
item) {
std::vector<std::string>::const_iterator i =
std::lower_bound(list.begin(), list.end(), item);
if(i==list.end()) {
return -(static_cast<int>(i-list.begin())+1);
} else if(*i != item) {
return -(static_cast<int>(i-list.begin())+1);
}
return static_cast<int>(i-list.begin());
}

__gnu_cxx::hash_set<std::string, hashString> nodeSet;
std::vector<std::string> nodeVector;

while(!done) {
//read a line of file
//split the line into words and push into a vector
//check for syntax errors
nextNode = some word in line;
nodeSet.insert(nextNode);
The above code may be where your main bottleneck is. How do you read
the line and then split it into words? If you are creating a number of
temporary strings, that will be slowing you down. You might consider
using a fixed vector or two and reusing them, to avoid temporaries.
}
Here you definitely want:
nodeVector.reserve(nodeSet.size());
std::copy(nodeSet.begin(),node.end(),std::back_in serter(nodeVector));
std::sort(nodeVector.begin(),nodeVector.end());
Both of those operations should be fast if you are using a reference
counted string class. I believe GCC 3+ uses such a beast.
while(!done) {
//read line from previously stored words
//more error checking
thisNode = some word in line
getIndex(nodeVector, thisNode);
//allocate an object representing element containing proper indices
representing nodes
//push pointer to element into a vector
}




总的来说,我认为你最好的选择是对代码的较小位进行基准测试,或者通过分析器进行测试(-gprof IIRC,我可能不会这么做),找出瓶颈究竟在哪里。


Tom



Overall, I think your best bet would be to benchmark smaller bits of
the code or put it through a profiler (-gprof IIRC, which I probably
don''t), to find out where the bottleneck actually lies.

Tom


" Alex Gerdemann" < NU ******* @ hotmail.com>在消息中写道

新闻:Wxuad.369933
"Alex Gerdemann" <nu*******@hotmail.com> wrote in message
news:Wxuad.369933


这篇关于hash_set与Java的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆