Why does converting a list into a set in Cython take so much time?


Question

The huge_list parameter looks something like [[12,12,14],[43,356,23]]. My code to convert each inner list to a set is:

cpdef list_to_set(list huge_list):
    cdef list ids
    cdef list final_ids=[]
    for ids in huge_list:
        final_ids.append(set(ids))

    return final_ids

I have 2,800 list elements, each with 30,000 IDs. The conversion takes around 19 seconds. How can I improve performance?


EDIT 1:
Instead of set, I used numpy.unique as shown below, which makes it roughly 7 seconds faster:

df['ids'] = df['ids'].apply(lambda x: numpy.unique(x))

It now takes 14 seconds (previously it was ~20 seconds). I don't think this time is acceptable yet. :|

Recommended answer

Cython cannot speed anything up here. Most of the time is spent building the sets, that is, calculating the hash value of each element and storing it in a hash table. This work is already done in C, so no speedup is possible. The pure Python version:

final_ids = [set(ids) for ids in huge_list]

will lead to the same result.
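One way to check this claim is a rough timing sketch that runs the explicit loop and the comprehension on the same data in plain Python. The helper make_data, the function names, and the scaled-down sizes (100 lists of 3,000 IDs) are assumptions for illustration only and are not from the original question; actual timings depend on the machine and the data.

```python
import random
import timeit


def list_to_set_loop(huge_list):
    """Explicit loop, mirroring the structure of the Cython function."""
    final_ids = []
    for ids in huge_list:
        final_ids.append(set(ids))
    return final_ids


def list_to_set_comprehension(huge_list):
    """The one-line pure Python version from the answer."""
    return [set(ids) for ids in huge_list]


def make_data(n_lists=100, n_ids=3000, seed=0):
    """Hypothetical test data: n_lists lists of n_ids random integer IDs."""
    rng = random.Random(seed)
    return [[rng.randrange(n_ids * 10) for _ in range(n_ids)]
            for _ in range(n_lists)]


if __name__ == "__main__":
    data = make_data()
    # Time both variants on identical input; set(ids) does the real work in both.
    for func in (list_to_set_loop, list_to_set_comprehension):
        seconds = timeit.timeit(lambda: func(data), number=10)
        print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```

If the two timings come out close to each other, that is consistent with the answer: the loop overhead that Cython could remove is small compared with the cost of set(ids) itself.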
