Python Parallel hash bruteforce

Problem description

I have this script, modified from a script written by another user on this site. I'm trying to understand it the best I can, but am having difficulty. When I attempt to crack a hash using a charset of only lowercase letters, it works just fine. But when I try cracking a hash that contains both numbers and letters, it won't work unless I increase the "spc" variable from 1000000 to 100000000.

import itertools
import math
import string
import multiprocessing
import hashlib
import traceback
import sys

def hashstring(string, algorithm):
    return hashlib.new(algorithm, string).hexdigest()

def gen_product(prefix, charset, length):
    for string in itertools.product(charset, repeat=length):
        yield prefix + "".join(string)

def string_generator(prefix, hash, suffix_length, length, charset, hashalg):
    num_done = 0
    if length <= suffix_length:
        assert prefix == ""
        for possible in gen_product("", charset, length):
            if hashstring(possible, hashalg) == hash:
                return possible
    else:
        assert len(prefix) + suffix_length == length
        for possible in gen_product(prefix, charset, suffix_length):
            if hashstring(possible, hashalg) == hash:
                return possible

    return None

def run_string_generator(*args):
    try:
        return string_generator(*args)
    except:
        raise Exception("".join(traceback.format_exception(*sys.exc_info())))

def do_work(pool, hash, charset, length, hashalg, spc=100000000):
    n = len(charset)
    suffix_len = int(math.ceil(math.log(spc) / math.log(n)) - 1)

    max_short_len = min(suffix_len, length)
    for length in range(1, max_short_len + 1):
        result = pool.apply_async(run_string_generator, args = ("", hash, suffix_len, length, charset, hashalg))
        if result.get() != None:
            return result.get()
    for length in range(max_short_len + 1, length + 1):
        for prefix in gen_product("", charset, length - suffix_len):
            result = pool.apply_async(run_string_generator, args = (prefix, hash, suffix_len, length, charset, hashalg))    
            if result.get() != None:
                return result.get()

    return None


def parallel_bruteforce(hash, charset, length, hashalg="md5", spc=1000000, cores=None):
    pool = multiprocessing.Pool(cores)
    result = do_work(pool, hash, charset, length, hashalg, spc)

    pool.close()
    pool.join()

    return result

if __name__ == "__main__":
    print "Starting..."
    #The hash is an md5 encryption of "test1"
    print parallel_bruteforce("5a105e8b9d40e1329780d62ea2265d8a", string.ascii_lowercase +  string.digits, 5, spc=100000000)
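For reference, the split between prefix and suffix is driven by the suffix_len formula in do_work(), which depends on both spc and the charset size, so switching from 26 to 36 characters changes how the work is divided. Below is a minimal sketch (not part of the original question) that reuses the same formula; worker_suffix_len is just an illustrative name:

import math

def worker_suffix_len(spc, n):
    # same computation as in do_work() above
    return int(math.ceil(math.log(spc) / math.log(n)) - 1)

print(worker_suffix_len(1000000, 26))    # lowercase only           -> 4
print(worker_suffix_len(1000000, 36))    # lowercase + digits       -> 3
print(worker_suffix_len(100000000, 36))  # the increased spc value  -> 5

With spc=100000000 and this 36-character charset the formula yields 5, i.e. the full target length, so all five positions are handled by the first loop in do_work().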

The link to the other post with the original code is https://stackoverflow.com/a/20135250/1769995

Recommended answer

Sorry in advance that I don't have time now to explain this. It's an edit of my earlier answer that retains the parallelism and stops all the workers "early" if the hash is cracked. In general, you do not want to pass arguments that never vary across invocations, so I do much more here at module level than you do. The attached code displays:

workers will cycle through the last 3 chars
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[etc]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
the plaintext is 'test1'

Here it is. Note that it plays another low-level efficiency trick, by materializing PAIRS, a list of all 2-character strings from the alphabet. That again saves countless billions of redundant joins over the course of a long run.

import string
import hashlib
from itertools import product

CHARSET = string.ascii_lowercase +  string.digits
MAX_LENGTH = 5
NUM_PROCESSES = None # defaults to all available cores

HASHALG = "md5"
HASH = "5a105e8b9d40e1329780d62ea2265d8a"

PAIRS = ["".join(t) for t in product(CHARSET, repeat=2)]

def make_bases(count):
    bases = [PAIRS] * (count // 2)
    if count & 1:
        bases.insert(0, CHARSET)
    return bases

# string_gen is what the workers run.  Everything else
# runs in the main program.
def string_gen(prefix, suffix_len, length):
    # Generate all strings of length `length` starting with `prefix`.
    # If length > suffix_len, only the last suffix_len characters
    # need to be generated.
    if length <= suffix_len:
        assert prefix == ""
        bases = make_bases(length)
    else:
        assert len(prefix) + suffix_len == length
        bases = make_bases(suffix_len)
    for t in product(*bases):
        result = prefix + "".join(t)
        # do something with result
        if hashlib.new(HASHALG, result).hexdigest() == HASH:
            return result

def record_done(result):
    global all_done, the_secret
    print ".",
    if result is not None:
        print
        the_secret = result
        all_done = True
        pool.close()
        pool.terminate() # stop all workers! we're done

def do_work(pool, strings_per_chunk=1000000):
    global all_done, the_secret
    all_done = False
    the_secret = None
    # What's the most chars we can cycle through without
    # exceeding strings_per_chunk?
    N = len(CHARSET)
    suffix_len = 1
    while N**suffix_len <= strings_per_chunk:
        suffix_len += 1
    suffix_len -= 1
    print "workers will cycle through the last", suffix_len, "chars"

    # There's no point to splitting up very short strings.
    max_short_len = min(suffix_len, MAX_LENGTH)
    for length in range(1, max_short_len + 1):
        pool.apply_async(string_gen, args=("", suffix_len, length),
                         callback=record_done)
        if all_done:
            return
    # And now the longer strings.
    for length in range(max_short_len + 1, MAX_LENGTH + 1):
        for t in product(*make_bases(length - suffix_len)):
            prefix = "".join(t)
            pool.apply_async(string_gen, args=(prefix, suffix_len, length),
                             callback=record_done)
            if all_done:
                return

if __name__ == "__main__":
    import multiprocessing
    pool = multiprocessing.Pool(NUM_PROCESSES)
    do_work(pool)
    pool.close()
    pool.join()
    if the_secret is None:
        print "didn't crack it!"
    else:
        print "the plaintext is", repr(the_secret)

CAUTION: As written, that code is "too parallel" for larger problem sizes and/or very small strings_per_chunk. The main program can fire off apply_async() calls far faster than the worker processes can deal with them, so the multiprocessing machinery can end up trying to queue billions of work items. Then you can run out of RAM, or other system resources. Of course that's fixable too ;-)

multiprocessing doesn't expose any ways to throttle its internal queues, so "a natural" solution is to add a layer with our own queue. This keeps up to 3 pending tasks per processor in multiprocessing's internal task queue, but blocks the main program from generating more prefixes so long as its own queue gets longer than that. Also fiddled the logic to raise an EarlyExit exception when the hash has been cracked; this is easier and cleaner than mucking with global flags. What follows is meant to replace everything above from record_done() down:

class EarlyExit(Exception):
    def __init__(self, result):
        Exception.__init__(self)
        self.result = result

class CustomDispatcher:
    def __init__(self, pool):
        from collections import deque
        self.pool = pool
        self.q = deque()

    def queue_work(self, *args):
        while len(self.q) > NUM_PROCESSES * 3:
            # provided the workers have significant work to do,
            # it will "take a long time" to finish the work
            # already queued.  Rather than swamp the mp machinery
            # with even more pending tasks, wait for some to
            # finish first.
            self.unqueue()
        self.q.append(self.pool.apply_async(string_gen, args))

    def unqueue(self):
        if self.q:
            # note:  the main program spends most of its time
            # blocked on the .get(); that's because it can
            # generate prefixes far faster than workers can
            # process them
            result = self.q.popleft().get()
            print ".",
            if result is not None:
                print
                raise EarlyExit(result)

    def drain(self):
        while self.q:
            self.unqueue()

def do_work(dispatch, strings_per_chunk=10000000):
    # What's the most chars we can cycle through without
    # exceeding strings_per_chunk?
    N = len(CHARSET)
    suffix_len = 1
    while N**suffix_len <= strings_per_chunk:
        suffix_len += 1
    suffix_len -= 1
    print "workers will cycle through the last", suffix_len, "chars"
    print "so each dot represents", \
          format(len(CHARSET)**suffix_len, ","), "strings"

    # There's no point to splitting up very short strings.
    max_short_len = min(suffix_len, MAX_LENGTH)
    for length in range(1, max_short_len + 1):
        dispatch.queue_work("", suffix_len, length)
    # And now the longer strings.
    for length in range(max_short_len + 1, MAX_LENGTH + 1):
        for t in product(*make_bases(length - suffix_len)):
            dispatch.queue_work("".join(t), suffix_len, length)
    dispatch.drain()  # check remaining tasks for a winner

if __name__ == "__main__":
    import multiprocessing
    pool = multiprocessing.Pool(NUM_PROCESSES)
    dispatch = CustomDispatcher(pool)
    try:
        do_work(dispatch)
    except EarlyExit as e:
        print "the plaintext is", repr(e.result)
    else:
        print "didn't crack it!"
    pool.close()
    pool.terminate() # stop all workers! we're done
    pool.join()

The combinatorial explosion of possibilities as the size of the alphabet and/or size of the longest string generated increase may mean you'll wait forever for a result, but at least with this change you won't run out of RAM - and you'll utilize all your cores at close to 100% capacity.
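For a rough sense of that explosion (simple back-of-the-envelope arithmetic, not from the original answer), the 36-character alphabet used here multiplies the candidate count by 36 for every extra position:

CHARSET_SIZE = 36  # lowercase letters + digits, as above
for max_len in (5, 6, 7, 8):
    # count all candidate strings of length 1 .. max_len
    total = sum(CHARSET_SIZE ** n for n in range(1, max_len + 1))
    print("up to length %d: %s candidates" % (max_len, format(total, ",")))

At MAX_LENGTH = 5 that is roughly 62 million hashes in the worst case; each additional character multiplies the wait by 36.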
