200kB文件中搜索8个! (40320排列)在Python或IDA中 [英] 200kB file to search for 8! (40320 permutations) in Python or IDA

查看:118
本文介绍了200kB文件中搜索8个! (40320排列)在Python或IDA中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在拆卸固件(Siemens C165处理器-

但是,它什么也没找到

我认为序列会彼此相邻,但也许不会.

只想确保程序正确.

实际上,0-7应该是个半字节,这是否意味着我需要搜索,例如,作为一个组合:

0x01, 0x23, 0x45, 0x67 

以上是字节.

有人可以确认这一点并建议如何搜索吗?

更新1:

尝试过第二个变种

from itertools import permutations 
l = list(permutations(range(0,8))) 
print(len(l))

with open("firm.ori", 'rb') as f:
  s = f.read()
for item in l:
  str1 = ''.join(str(e) for e in item)
  n = 2
  out = [str1[i:i+n] for i in range(0, len(str1), n)]

  hexstr = '\\x'.join(e for e in out)
  hexstrfinal  = "\\x" + hexstr 
  #print hexstrfinal
  if(s.find(b'hexstrfinal')>0):
    print "found"

但也没有成功...

有什么主意我做错了吗?

解决方案

您的代码中存在一些误解,并且效率很低.让我们从误解开始.

由于firm.ori以二进制模式(rb)打开,因此s = f.read()的结果是bytes对象.尽管有与字符串相似的方法,但这不是字符串!它包含字节,而不是字符.当显示它时,\x...输出并不表示bytes对象包含ASCII反斜杠和xes.相反,每个\x...是一个转义序列,用于表示给定字节的十六进制值,该十六进制值与ASCII可打印字符不对应.

在循环中,您只处理字符串:hexstr = '\\x'.join('{:02x}'.format(x) for x in i)进行排列并将其格式化为bytes对象的字符串表示形式.希望您从上一段中了解到为什么这行不通.

s.find(b'hexstrfinal')搜索文本ASCII数组b'hexstrfinal',而不搜索名为hexstrfinal的变量.后者当然不起作用,因为hexstrfinal具有类型str,而不是bytes.如果使用简单的hexstrfinal.encode('ascii')将其转换为bytes,则会得到b'\\x...',这根本不是您想要的.正确的方法是

s.find(hexstrfinal.encode('ascii').decode('unicode-escape').encode('latin1'))

希望您能看到为什么将字符串转换三次以获得所需的bytes不必要地效率低下.每当您开始使用字符串作为处理数字的拐杖时,都是评估您的方法的好时机.这就开始了我们对您代码效率低下的讨论.

您当前正在尝试遍历0-7的所有可能排列,而不是查找实际存在的排列.鉴于该文件只有200KB的大小,期望所有甚至大多数排列出现在其中都是不合理的.此外,您正在整个文件中搜索每个可能的排列.对于文件大小为NK的排列,您的代码在O(N * K)时间内运行,而可以一次通过文件或O(N)来执行此操作.有了适当的数据结构,即使使用普通Python编写的循环也可能比当前代码的优化版本运行得更快.

策略很简单.遍历s.如果当前字符和后面的七个字符组成有效的排列,请开始一个聚会.否则,请继续寻找:

N = 8
allowed = set(range(N))
for index, b in enumerate(s):
    if b in allowed and set(s[index:index + N]) == allowed:
        print(f'Found sequence {s[index:index + N]} at offset {index}')

这里有很多可能的优化方法,使用numpy或scipy可以更有效地完成整个过程.

如果您允许序列中的重复,事情也会变得更加复杂.在这种情况下,您必须对序列进行排序:

allowed = sorted(...)
N = len(allowed)
for index, b in enumerate(s):
    if b in allowed and sorted(s[index:index + N]) == allowed:
        print(f'Found sequence {s[index:index + N]} at offset {index}')

如果您要搜索半字节,事情会变得更加复杂.我会完全放弃支票b in allowed,只写一个可以在每半步应用的自定义支票:

N = 8

def build_set(items):
    output = set()
    for b in items:
        output.add(b & 0xF)
        output.add((b >> 4) & 0xF)
    return output

def matches(seq):
    if len(seq) == N // 2:
        return build_set(seq) == allowed
    elif len(seq) == N // 2 + 1:
        check = build_set(seq[1:-1])
        check.add(seq[0] & 0xF)
        check.add((seq[-1] >> 4) & 0xF)
        return check == allowed
    else:
        return False

allowed = set(range())
for index, b in enumerate(s):
    if matches(s[index:index + N // 2]):
        print(f'Found sequence {s[index:index + N // 2]} at offset {index}.0')
     if matches(s[index:index + N // 2 + 1]):
        print(f'Found sequence {s[index:index + N // 2 + 1]]} at offset {index}.5')

在这里,build_set只是将半字节分成一组. matches检查按字节对齐的8个半字节数组(4个元素),或偏移半字节的8个半字节的数组(5个元素).两种情况均独立报告.

I am working on disassembling a firmware (Siemens C165 processor - https://www.infineon.com/dgdl/Infineon-C165-DS-v02_00-en%5B8%5D.pdf?fileId=db3a304412b407950112b43a49a66fd7) in IDA.

I have the firmware, so I can also read it via Python.

I need to find a string being permutation of

0, 1, 2, 3, 4, 5, 6, 7 (0-7)

Wrote this simple program:

from itertools import permutations 
l = list(permutations(range(0,8))) 
print(len(l))

with open("firm.ori", 'rb') as f:
    s = f.read()
for i in l:
    hexstr = '\\x'.join('{:02x}'.format(x) for x in i)
    hexstrfinal = "\\0x" + hexstr
    #print hexstrfinal
    if(s.find(b'hexstrfinal')>0):
        print "found"

However, it does not find anything

I thought the sequence will be next to each other, but maybe not.

Want to just be sure that the program is correct.

Well actually, 0-7 should be nibbles, so does it mean I need to search for, for example as a one combination:

0x01, 0x23, 0x45, 0x67 

Above is bytes.

Can somebody confirm this and advise how to search for this?

Update 1:

Tried 2nd variant

from itertools import permutations 
l = list(permutations(range(0,8))) 
print(len(l))

with open("firm.ori", 'rb') as f:
  s = f.read()
for item in l:
  str1 = ''.join(str(e) for e in item)
  n = 2
  out = [str1[i:i+n] for i in range(0, len(str1), n)]

  hexstr = '\\x'.join(e for e in out)
  hexstrfinal  = "\\x" + hexstr 
  #print hexstrfinal
  if(s.find(b'hexstrfinal')>0):
    print "found"

But also no hits ...

Any ideas what I do wrong?

解决方案

There are a few misunderstandings in your code, and a major inefficiency. Let's start with the misunderstandings.

Since firm.ori is opened in binary mode (rb), the result of s = f.read() is a bytes object. Despite having similar methods to a string, this is not a string! It contains bytes, not characters. When you display it, the \x... outputs do not indicate that the bytes object containing ASCII backslashes and xes. Instead, each \x... is an escape sequence used to represent the hex value of a given byte that does not correspond to an ASCII printable character.

Inside your loop, you deal exclusively with strings: hexstr = '\\x'.join('{:02x}'.format(x) for x in i) takes your permutation and formats it to look like the string representation of a bytes object. Hopefully you understand from the previous paragraph why this won't work.

s.find(b'hexstrfinal') searches for the literal ASCII array b'hexstrfinal', not for the variable named hexstrfinal. The latter wouldn't work of course, because hexstrfinal has type str, not bytes. If you were to convert it to bytes using a simple hexstrfinal.encode('ascii'), you'd get b'\\x...', which is not at all what you want. The proper way would be

s.find(hexstrfinal.encode('ascii').decode('unicode-escape').encode('latin1'))

Hopefully you can see why it is unnecessarily inefficient to convert a string three times to get the bytes you want. Any time you start using strings as a crutch to manipulate numbers is a good time to evaluate your approach. That begins our discussion of the inefficiencies in your code.

You are currently attempting to iterate through all the possible permutations of 0-7 rather than looking for the permutations that are actually there. Given that the file is only 200KB in size, it would be unreasonable to expect all or even most permutations to appear in it. Moreover, you are searching the entire file for each possible permutation. For a file size N and K permutations, your code runs in O(N * K) time, while it's possible to do this in a single pass through the file, or O(N). With the appropriate data structures, even a loop written in plain Python will likely run faster than an optimized version of the current code.

The strategy is simple. Iterate through s. If the current character and the following seven make up a valid permutation, start a party. Otherwise, keep looking:

N = 8
allowed = set(range(N))
for index, b in enumerate(s):
    if b in allowed and set(s[index:index + N]) == allowed:
        print(f'Found sequence {s[index:index + N]} at offset {index}')

There are a host of possible optimizations possible here, and you could probably do the whole thing way more efficiently with numpy or scipy.

Things would also get more complicated if you allowed repeats in the sequence. In that case, you'd have to sort the sequences:

allowed = sorted(...)
N = len(allowed)
for index, b in enumerate(s):
    if b in allowed and sorted(s[index:index + N]) == allowed:
        print(f'Found sequence {s[index:index + N]} at offset {index}')

If you are going to search for nibbles, things get more complicated still. I would drop entirely the check b in allowed, and just write a custom check that could be applied at every half-step:

N = 8

def build_set(items):
    output = set()
    for b in items:
        output.add(b & 0xF)
        output.add((b >> 4) & 0xF)
    return output

def matches(seq):
    if len(seq) == N // 2:
        return build_set(seq) == allowed
    elif len(seq) == N // 2 + 1:
        check = build_set(seq[1:-1])
        check.add(seq[0] & 0xF)
        check.add((seq[-1] >> 4) & 0xF)
        return check == allowed
    else:
        return False

allowed = set(range())
for index, b in enumerate(s):
    if matches(s[index:index + N // 2]):
        print(f'Found sequence {s[index:index + N // 2]} at offset {index}.0')
     if matches(s[index:index + N // 2 + 1]):
        print(f'Found sequence {s[index:index + N // 2 + 1]]} at offset {index}.5')

Here, build_set just splits the nibbles into a set. matches checks either an array of 8 nibbles aligned on a byte (4 elements), or an array of 8 nibbles offset by half a byte (5 elements). Both cases are reported independently.

这篇关于200kB文件中搜索8个! (40320排列)在Python或IDA中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆