Most efficient way for a lookup/search in a huge list (Python)

Question

I just parsed a big file and created a list containing 42,000 strings/words. I want to query this list to check whether a given word/string belongs to it. So my question is:

What is the most efficient way to do such a lookup?

A first approach is to sort the list (list.sort()) and then just use

if word in lst: print(word)

which is really trivial, and I am sure there is a better way to do it. My goal is a fast lookup that tells me whether a given string is in this list or not. Ideas for other data structures are welcome, although for now I want to avoid more sophisticated ones such as tries. I am interested in ideas (or tricks) for fast lookups, or any Python library methods that might search faster than the simple "in".

I also want to know the index of the item I searched for.

Recommended answer

Don't create a list; create a set. Set membership tests take constant time on average, whereas "in" on a list scans the elements one by one, whether or not the list is sorted.
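
A minimal sketch of the set approach follows. The names words, word_set, first_index, and lookup are illustrative and do not come from the original post. Because a set does not record positions, the sketch also builds a dict from each word to its first index in the original list, which is one possible way to cover the index requirement from the question.

```python
# Hypothetical sample data standing in for the 42,000 parsed words.
words = ["pear", "apple", "fig", "apple", "kiwi"]

# Build once: O(n). Each membership test afterwards is O(1) on average.
word_set = set(words)

# A set has no positions, so keep a word -> first index mapping if the
# index is needed. setdefault keeps the first occurrence of duplicates.
first_index = {}
for i, w in enumerate(words):
    first_index.setdefault(w, i)


def lookup(word):
    """Return the first index of word in words, or None if absent."""
    return first_index.get(word)
```

If only membership matters, word_set alone is enough; the dict costs extra memory but answers both "is it there?" and "where?" in a single lookup.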

If you don't want the memory overhead of a set, keep a sorted list and search through it with the bisect module.

from bisect import bisect_left

def bi_contains(lst, item):
    """Efficient `item in lst` for sorted lists."""
    # bisect_left returns the leftmost index at which item could be inserted
    # while keeping lst sorted. If item is in the list, it is at that index.
    # If item is larger than every element (or lst is empty), the index is
    # len(lst), so check the bound before indexing.
    i = bisect_left(lst, item)
    return i < len(lst) and lst[i] == item
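
The same bisect_left call also yields the position, which would cover the index part of the question. The following is a sketch under the assumption that the list is already sorted; bi_index is a hypothetical helper name, and the index it returns refers to the sorted list, not to the order in which the words were parsed.

```python
from bisect import bisect_left


def bi_index(sorted_words, item):
    """Return the index of item in sorted_words, or -1 if it is absent."""
    # Leftmost insertion point; equals the item's index when it is present.
    i = bisect_left(sorted_words, item)
    if i < len(sorted_words) and sorted_words[i] == item:
        return i
    return -1


# Hypothetical sample data; sorting is a one-time O(n log n) cost.
sorted_words = sorted(["pear", "apple", "fig", "kiwi"])
```

Each call does a binary search, so it takes O(log n) comparisons instead of the O(n) scan that list.index performs.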
