Why is Python slower than grep?

Question

Why is Python slower than grep when looking for a string inside a file? All I want is to find the string and see the entire line the string's located on.

My code (I edited a few of the loops so that it reports the full run time):

from __future__ import division
import argparse
import time
start_time = time.time()

# Argparse module
parser = argparse.ArgumentParser()
parser.add_argument("--verbose", "-v",
                    help="Print output.",
                    action="store_true")
parser.add_argument("--table", "--input", "-t", "-i",
                    help="The rainbow table containing hashes. (e.g., \"passwords.txt\")")
parser.add_argument("--hash",
                    help="The hash to be decrypted. (e.g., \"1bc29b36f623ba82aaf6724fd3b16718\")")
parser.add_argument("--hashlist", "--hl",
                    help="The list of hashes to be decrypted. (e.g., \"password_list.txt\")")
parser.add_argument("--output", "-o",
                    help="The path of the output file. (e.g., \"output.txt\")")
args = parser.parse_args()

# Functions
def log(path, data):
    f = open(path, 'a')
    f.write(data+"\n")
    if args.verbose:
        print(data)
    f.close()

def unhash(table, hash, output, round=0, rounds=0):
    with open(table) as f:
        for i, l in enumerate(f):
            pass
        lines = i + 1
    i = 0
    with open(table) as f:
        for line in f:
            if args.verbose:
                print(str(format((i/lines*100), '.5f'))+"%"),
            if hash in line:
                if args.verbose:
                    print("--HIT--: "),
                log(output, line)
                # break
            else:
                if args.verbose:
                    print("ROUND " + str(round) + " OUT OF " + str(rounds) + ".....")
            i = i + 1

def parse_list(fname):
    with open(fname) as f:
        content = f.read().splitlines()
    return content

output = args.output if args.output else "log.txt"
input = args.table if args.table else raw_input("Input: ")

if args.hashlist:
    hashes = parse_list(args.hashlist)
    for i in range(0, len(hashes)):
        if args.verbose:
            print("-----------------")
        unhash(input, hashes[i], output, i, len(hashes))
else:
    hash = args.hash if args.hash else raw_input("Hash: ")
    unhash(input, hash, output)
print("Finished in " + str(time.time() - start_time) + " seconds.")

grep code: grep -nw 'md5/1.txt' -e "password"

Python code time: Finished in 648.409528017 seconds.

Grep time (measured using time command):


  • real 0m0.334s
  • user 0m0.259s
  • sys 0m0.071s

The file is 345.5 MB.

Recommended answer

What the heck is all that Python code for?

Also, what is the -r doing on the grep if the target is not a directory? If you just want to make sure the filename is included in the output, you should use -H.

A Python equivalent of grep -Hnw password md5/1.txt looks more like this:

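import re
filename = 'md5/1.txt'
# \b...\b mimics grep -w (whole-word match)
pattern = re.compile(r'\bpassword\b')
with open(filename, 'r') as f:
    # start counting at 1 to mimic grep -n line numbers
    for n, line in enumerate(f, 1):
        # search() scans the whole line; match() would only test its start
        if pattern.search(line):
            print(filename + ":" + str(n) + ":" + line.rstrip())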

And on my machine that's 4x slower than the grep, but that's still only 18ms vs 4ms.

The main reason for the difference is that grep is a precompiled program, whereas Python is interpreted on the fly. Even if it's technically just-in-time precompiled to byte-code, that's still extra work that has to happen before the actual program can start running.
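
One rough way to check how much of that gap is startup rather than the search itself is to time only the search loop from inside Python and compare it with the wall-clock time of the whole run. The following is a minimal sketch, assuming Python 3; the helper name timed_word_search and the throwaway sample file are made up for illustration and are not part of the original post.

```python
import os
import re
import tempfile
import time


def timed_word_search(path, word):
    """Return (hits, seconds) for a whole-word search over the file at path.

    hits is a list of (line_number, line) tuples, similar to grep -nw output.
    Only the search loop is timed; interpreter startup is not included.
    """
    # re.escape keeps the word literal; \b gives grep -w style word matching
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    hits = []
    started = time.perf_counter()
    with open(path, 'r') as f:
        for n, line in enumerate(f, 1):
            if pattern.search(line):
                hits.append((n, line.rstrip('\n')))
    return hits, time.perf_counter() - started


if __name__ == '__main__':
    # Tiny throwaway file standing in for md5/1.txt (hypothetical contents).
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write('abc123:hunter2\n5f4dcc3b:password\nxyz:passwords\n')
    hits, seconds = timed_word_search(tmp.name, 'password')
    print(hits)
    print("search loop took " + str(seconds) + " seconds")
    os.remove(tmp.name)
```

Running the script under the time command and subtracting the reported loop time gives a rough estimate of the startup and compilation overhead described above.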
