Why is Python slower than grep?
Question
Why is Python slower than grep when looking for a string inside a file? All I want is to find the string and see the entire line it appears on.
My code (I changed a few things in the loop so that the timing reflects a full pass over the file):
```python
from __future__ import division
import argparse
import time

start_time = time.time()

# Argparse module
parser = argparse.ArgumentParser()
parser.add_argument("--verbose", "-v",
                    help="Print output.",
                    action="store_true")
parser.add_argument("--table", "--input", "-t", "-i",
                    help="The rainbow table containing hashes. (e.g., \"passwords.txt\")")
parser.add_argument("--hash",
                    help="The hash to be decrypted. (e.g., \"1bc29b36f623ba82aaf6724fd3b16718\")")
parser.add_argument("--hashlist", "--hl",
                    help="The list of hashes to be decrypted. (e.g., \"password_list.txt\")")
parser.add_argument("--output", "-o",
                    help="The path of the output file. (e.g., \"output.txt\")")
args = parser.parse_args()

# Functions
def log(path, data):
    f = open(path, 'a')
    f.write(data+"\n")
    if args.verbose:
        print(data)
    f.close()

def unhash(table, hash, output, round=0, rounds=0):
    # First pass: count the lines (used for the progress percentage)
    with open(table) as f:
        for i, l in enumerate(f):
            pass
    lines = i + 1
    i = 0
    # Second pass: look for the hash in every line
    with open(table) as f:
        for line in f:
            if args.verbose:
                print(str(format((i/lines*100), '.5f'))+"%"),
            if hash in line:
                if args.verbose:
                    print("--HIT--: "),
                log(output, line)
                # break
            else:
                if args.verbose:
                    print("ROUND " + str(round) + " OUT OF " + str(rounds) + ".....")
            i = i + 1

def parse_list(fname):
    with open(fname) as f:
        content = f.read().splitlines()
    return content

output = args.output if args.output else "log.txt"
input = args.table if args.table else raw_input("Input: ")
if args.hashlist:
    hashes = parse_list(args.hashlist)
    for i in range(0, len(hashes)):
        if args.verbose:
            print("-----------------")
        unhash(input, hashes[i], output, i, len(hashes))
else:
    hash = args.hash if args.hash else raw_input("Hash: ")
    unhash(input, hash, output)
print("Finished in " + str(time.time() - start_time) + " seconds.")
```
grep command: grep -nw 'md5/1.txt' -e "password"
Python code time: Finished in 648.409528017 seconds.
grep time (measured using the time command):
- real 0m0.334s
- user 0m0.259s
- sys 0m0.071s
The file is 345.5 MB.
Answer
What the heck is all that Python code for?
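Stripped down to the task described in the question (write out every line of the table that contains the hash), the search needs only one pass over the file. The following is a minimal sketch assuming Python 3; the function name `find_lines` and its parameters are hypothetical. Unlike the question's `unhash`, it does not read the file once just to count lines, does not print a progress percentage for every line, and opens the output file once rather than once per hit.

```python
def find_lines(table_path, needle, output_path):
    """Append every line of table_path that contains needle to output_path.

    Hypothetical helper: one pass over the input, and the output file is
    opened once for the whole search instead of once per hit.
    Returns the number of matching lines.
    """
    hits = 0
    with open(table_path) as table, open(output_path, "a") as out:
        for line in table:
            # Plain substring test, same as `hash in line` in the question.
            if needle in line:
                # Write exactly one trailing newline per matching line.
                out.write(line if line.endswith("\n") else line + "\n")
                hits += 1
    return hits
```

For example, `find_lines('md5/1.txt', '1bc29b36f623ba82aaf6724fd3b16718', 'log.txt')` appends the matching lines to log.txt and returns how many there were.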
Also, what is the -r doing on the grep if the target is not a directory? If you just want to make sure the filename is included in the output, you should use -H.
A Python equivalent of grep -Hnw password md5/1.txt looks more like this:
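```python
import re

filename = 'md5/1.txt'
pattern = re.compile(r'\bpassword\b')

# enumerate(f, 1) counts lines from 1, like grep -n
with open(filename, 'r') as f:
    for n, line in enumerate(f, 1):
        # search() finds the word anywhere in the line; match() would only check the start
        if pattern.search(line):
            print(filename + ":" + str(n) + ":" + line.rstrip())
```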
On my machine that's about 4x slower than grep, but it's still only 18 ms vs. 4 ms.
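To reproduce a comparison like this on the Python side, the search loop can be timed with the standard `time.perf_counter()`. This is a sketch; the helpers `grep_word` and `timed` are hypothetical names, and the regex word boundaries only approximate grep's `-w` rule rather than replicate it exactly.

```python
import re
import time


def grep_word(filename, word):
    """Return (line_number, line) pairs where word occurs as a whole word.

    Hypothetical helper that roughly mimics `grep -nw word filename`;
    regex word boundaries approximate grep's -w rule.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    hits = []
    with open(filename) as f:
        # enumerate(f, 1) numbers lines from 1, like grep -n.
        for n, line in enumerate(f, 1):
            # search() looks anywhere in the line; match() only at its start.
            if pattern.search(line):
                hits.append((n, line.rstrip("\n")))
    return hits


def timed(func, *args):
    """Call func(*args); return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

For example, `hits, secs = timed(grep_word, 'md5/1.txt', 'password')`. This measures only the search itself; running the whole script under `time` (as was done for grep) would also count interpreter startup.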
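The main reason for the difference is that grep is a precompiled native program, whereas Python is interpreted at run time. Even though CPython first compiles the script to bytecode, that compilation, along with interpreter startup, is extra work that has to happen before the actual program can start running, and the resulting bytecode is then executed by the interpreter rather than directly by the CPU.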
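The extra step CPython performs can be observed directly: the built-in `compile()` turns source text into a code object, and the standard `dis` module lists the bytecode instructions that the interpreter then executes one at a time. A small illustration; the snippet being compiled is arbitrary, and the exact `dis` output differs between Python versions.

```python
import dis

source = "n = 0\nfor line in ['a', 'password', 'b']:\n    n += 'password' in line\n"

# Step 1: compile the source text into a code object (bytecode + constants).
code = compile(source, "<example>", "exec")

# Step 2: inspect the bytecode the interpreter will run.
dis.dis(code)

# Step 3: let the interpreter execute that bytecode.
namespace = {}
exec(code, namespace)
print(namespace["n"])  # → 1
```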