为什么Python设置交集比Rust HashSet交集快? [英] Why is Python set intersection faster than Rust HashSet intersection?

查看:98
本文介绍了为什么Python设置交集比Rust HashSet交集快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的Python代码:

Here is my Python code:

len_sums = 0
for i in xrange(100000):
    set_1 = set(xrange(1000))
    set_2 = set(xrange(500, 1500))
    intersection_len = len(set_1.intersection(set_2))
    len_sums += intersection_len
print len_sums

这是我的Rust代码:

Here is my Rust code:

use std::collections::HashSet;

fn main() {
    let mut len_sums = 0;
    for _ in 0..100000 {
        let set_1: HashSet<i32> = (0..1000).collect();
        let set_2: HashSet<i32> = (500..1500).collect();
        let intersection_len = set_1.intersection(&set_2).count();
        len_sums += intersection_len;
    }
    println!("{}", len_sums);
}

我相信这些大致相同.我得到以下性能结果:

I believe these are roughly equivalent. I get the following performance results:

time python set_performance.py
50000000

real    0m11.757s
user    0m11.736s
sys 0m0.012s

rustc set_performance.rs -O       
time ./set_performance 50000000

real    0m17.580s
user    0m17.533s
sys 0m0.032s

使用cargo--release生成的结果相同.

Building with cargo and --release give the same result.

我意识到Python的set是用C实现的,因此预计会很快,但是我没想到它会比Rust快.它不需要做Rust不会做的额外类型检查吗?

I realize that Python's set is implemented in C, and so is expected to be fast, but I did not expect it to be faster than Rust. Wouldn't it have to do extra type checking that Rust would not?

也许我在编译Rust程序时缺少一些东西,还有其他我应该使用的优化标志吗?

Perhaps I'm missing something in the way I compile my Rust program, are there any other optimizations flags that I should be using?

另一种可能性是代码不是真正等效的,并且Rust正在做不必要的额外工作,我有什么遗漏吗?

Another possibility is that the code is not really equivalent, and Rust is doing unnecessary extra work, am I missing anything?

Python版本:

In [3]: import sys

In [4]: sys.version
Out[4]: '2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'

锈版本

$ rustc --version
rustc 1.5.0 (3d7cd77e4 2015-12-04)

我正在使用Ubuntu 14.04,我的系统架构是x86_64.

I am using Ubuntu 14.04 and my system architecture is x86_64.

推荐答案

当我将集合构建移出循环并仅重复交集时,对于这两种情况,Rust都比Python 2.7快.

When I move the set-building out of the loop and only repeat the intersection, for both cases of course, Rust is faster than Python 2.7.

我只读过Python 3 (setobject.c) ,但是Python的实现有很多用处.

I've only been reading Python 3 (setobject.c), but Python's implementation has some things going for it.

它使用了两个Python集合对象都使用相同的哈希函数的事实,因此它不会重新计算哈希. Rust HashSet具有用于其哈希函数的实例唯一键,因此在交集期间,它们必须将一个集中的键与另一组的哈希函数重新哈希.

It uses the fact that both Python set objects use the same hash function, so it does not recompute the hash. Rust HashSets have instance-unique keys for their hash functions, so during intersection they must rehash keys from one set with the other set's hash function.

另一方面,Python必须为每个匹配的哈希调用动态键比较函数,例如PyObject_RichCompareBool,而Rust代码使用泛型,并将专门化i32的哈希函数和比较代码.在Rust中对i32进行哈希处理的代码看起来相对便宜,并且删除了大部分哈希算法(处理输入时间超过4个字节).

On the other hand, Python must call out to a dynamic key comparison function like PyObject_RichCompareBool for each matching hash, while the Rust code uses generics and will specialize the hash function and comparison code for i32. The code for hashing an i32 in Rust looks relatively cheap, and much of the hashing algorithm (handling longer input than 4 bytes) is removed.

似乎是Python和Rust分开的集合的构造.实际上,不仅是构造,还有一些重要的代码正在运行以破坏Rust HashSet. (可以对此进行改进,在此处提交错误:#31711 )

It appears it's the construction of the sets that sets Python and Rust apart. And in fact not just construction, there's some significant code running to destruct the Rust HashSets as well. (This can be improved, filed bug here: #31711)

这篇关于为什么Python设置交集比Rust HashSet交集快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆