Two lists, faster comparison in Python

Question

I'm writing a Python (2.7) script to compare two lists. The lists are created by reading the contents of files. The files are plain text files, not binary. File 1 contains only hashes (MD5 sums of some plaintext words); File 2 contains hash:plain lines. The lists have different lengths (logically, I can have fewer 'cracked' entries than hashes), and neither can be sorted because I have to preserve their order, but that is the next step of what I'm trying to achieve. So far my simple code looks like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os

def ifexists(fname):
    if not os.path.isfile(fname):
        print('[-] %s must exist' % fname)
        sys.exit(1)

if len(sys.argv) < 3:
    print('[-] please provide CRACKED and HASHES files')
    sys.exit(1)

CRACKED=sys.argv[1]
HASHES=sys.argv[2]

ifexists(CRACKED)
ifexists(HASHES)

with open(CRACKED) as cracked, open(HASHES) as hashes:
    hashdata=hashes.readlines()
    crackdata=cracked.readlines()
    for c in crackdata:
        for z in hashdata:
            if c.strip().split(':', 1)[0] in z:
                print('found: ', c.strip().split(':', 1))

Basically, I have to replace each hash found in the HASHES list with the matching hash:plain line from the CRACKED list. I'm iterating through CRACKED because it will always be the shorter one. My problem is that the above code is very slow for longer lists: for example, processing two text files with 60k lines takes up to 15 minutes. What would you suggest to speed it up?

Answer

Store one of these files in a dictionary or set; that takes out a full loop and lookups are O(1) constant time on average.
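To illustrate why this helps, here is a minimal sketch in Python 3 that works on in-memory lists instead of files. The function names `matches_nested` and `matches_set` are illustrative, not from the question, and it compares whole hashes with `==` rather than the question's substring `in` test. The nested version does roughly len(cracked) × len(hashes) comparisons (if both files had 60k lines, that would be about 3.6 × 10^9); the set version makes one pass over each list.

```python
# Minimal sketch (Python 3): the same matching done two ways on in-memory lists.
# Inputs are assumed to be lists of lines like those returned by readlines().

def matches_nested(cracked_lines, hash_lines):
    """O(len(cracked) * len(hashes)): rescans every hash line for each cracked entry."""
    result = []
    for c in cracked_lines:
        h = c.strip().split(':', 1)[0]
        for z in hash_lines:
            if h == z.strip():
                result.append(h)
    return result


def matches_set(cracked_lines, hash_lines):
    """O(len(cracked) + len(hashes)): one pass to build a set, one pass to probe it."""
    hashes = {z.strip() for z in hash_lines}
    result = []
    for c in cracked_lines:
        h = c.strip().split(':', 1)[0]
        if h in hashes:  # average O(1) hash-table lookup
            result.append(h)
    return result


cracked = ["5f4dcc3b5aa765d61d8327deb882cf99:password\n"]
hashes = ["e10adc3949ba59abbe56e057f20f883e\n", "5f4dcc3b5aa765d61d8327deb882cf99\n"]
print(matches_nested(cracked, hashes) == matches_set(cracked, hashes))  # True
```

One difference to keep in mind: if the same hash appears more than once in the hash list, the nested version reports it once per occurrence, while the set version reports it once.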

For example, it looks like the crackdata file can easily be converted to a dictionary:

with open(CRACKED) as crackedfile:
    cracked = dict(map(str.strip, line.split(':', 1)) for line in crackedfile if ':' in line)
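Plaintext passwords can themselves contain colons. With a plain `split(':')`, a line such as `aaa:pa:ss` yields three fields and `dict()` raises a `ValueError`; splitting on the first colon only avoids that. A slightly more explicit sketch, assuming the `hash:plain` format from the question (`parse_cracked` is a hypothetical helper name, not part of any library):

```python
import io


def parse_cracked(lines):
    """Build {hash: plaintext} from 'hash:plain' lines.

    Splits on the first colon only, so a plaintext such as 'pa:ss' stays intact.
    Lines without a colon (e.g. blank lines) are skipped.
    """
    cracked = {}
    for line in lines:
        if ':' not in line:
            continue
        # Remove only the line ending; spaces may be part of the password.
        h, plain = line.rstrip('\r\n').split(':', 1)
        cracked[h.strip()] = plain  # if a hash repeats, the last line wins
    return cracked


sample = io.StringIO("aaa:pa:ss\nbbb:secret\n\n")
print(parse_cracked(sample))  # {'aaa': 'pa:ss', 'bbb': 'secret'}
```

Unlike the one-liner, this keeps leading and trailing spaces in the plaintext, on the assumption that they may be significant; use `plain.strip()` if they are not.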

Now you only have to loop over the other file once:

with open(HASHES) as hashes:
    for line in hashes:
        h = line.strip()
        if h in cracked:
            print('Found:', h, 'which maps to', cracked[h])
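The question's stated goal is to replace each hash in HASHES with its `hash:plain` line while keeping HASHES in its original order. Because the loop above walks HASHES sequentially, the order is preserved without any sorting. A minimal sketch, assuming `cracked` is the hash-to-plaintext dict built from the CRACKED file and that blank lines in HASHES can be dropped (`merge_cracked` is a hypothetical name):

```python
import io


def merge_cracked(hash_lines, cracked):
    """Return hashes in their original order, cracked ones as 'hash:plain'.

    `cracked` maps hash -> plaintext, as built from the CRACKED file.
    """
    out = []
    for line in hash_lines:
        h = line.strip()
        if not h:
            continue  # assumption: blank lines in HASHES are dropped
        if h in cracked:
            out.append('%s:%s' % (h, cracked[h]))  # one dict lookup per line
        else:
            out.append(h)  # not cracked yet: keep the bare hash
    return out


cracked = {'bbb': 'secret'}
hashes = io.StringIO("aaa\nbbb\nccc\n")
for line in merge_cracked(hashes, cracked):
    print(line)
# aaa
# bbb:secret
# ccc
```

To save the result, something like `out.write('\n'.join(merge_cracked(hashes, cracked)) + '\n')` on an open output file would work; the whole run is then a single pass over each input file.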