200kB文件中搜索8个! (40320排列)在Python或IDA中 [英] 200kB file to search for 8! (40320 permutations) in Python or IDA
问题描述
我正在拆卸固件(Siemens C165处理器- 但是,它什么也没找到 我认为序列会彼此相邻,但也许不会. 只想确保程序正确. 实际上,0-7应该是个半字节,这是否意味着我需要搜索,例如,作为一个组合: 以上是字节. 有人可以确认这一点并建议如何搜索吗? 更新1: 尝试过第二个变种 但也没有成功... 有什么主意我做错了吗? 您的代码中存在一些误解,并且效率很低.让我们从误解开始. 由于 在循环中,您只处理字符串: 希望您能看到为什么将字符串转换三次以获得所需的 您当前正在尝试遍历0-7的所有可能排列,而不是查找实际存在的排列.鉴于该文件只有200KB的大小,期望所有甚至大多数排列出现在其中都是不合理的.此外,您正在整个文件中搜索每个可能的排列.对于文件大小为 策略很简单.遍历 这里有很多可能的优化方法,使用numpy或scipy可以更有效地完成整个过程. 如果您允许序列中的重复,事情也会变得更加复杂.在这种情况下,您必须对序列进行排序: 如果您要搜索半字节,事情会变得更加复杂.我会完全放弃支票 在这里, I am working on disassembling a firmware (Siemens C165 processor - https://www.infineon.com/dgdl/Infineon-C165-DS-v02_00-en%5B8%5D.pdf?fileId=db3a304412b407950112b43a49a66fd7) in IDA. I have the firmware, so I can also read it via Python. I need to find a string being permutation of Wrote this simple program: However, it does not find anything I thought the sequence will be next to each other, but maybe not. Want to just be sure that the program is correct. Well actually, 0-7 should be nibbles, so does it mean I need to search for, for example as a one combination: Above is bytes. Can somebody confirm this and advise how to search for this? Update 1: Tried 2nd variant But also no hits ... Any ideas what I do wrong? There are a few misunderstandings in your code, and a major inefficiency. Let's start with the misunderstandings. Since Inside your loop, you deal exclusively with strings: Hopefully you can see why it is unnecessarily inefficient to convert a string three times to get the You are currently attempting to iterate through all the possible permutations of 0-7 rather than looking for the permutations that are actually there. Given that the file is only 200KB in size, it would be unreasonable to expect all or even most permutations to appear in it. Moreover, you are searching the entire file for each possible permutation. For a file size The strategy is simple. Iterate through There are a host of possible optimizations possible here, and you could probably do the whole thing way more efficiently with numpy or scipy. Things would also get more complicated if you allowed repeats in the sequence. In that case, you'd have to sort the sequences: If you are going to search for nibbles, things get more complicated still. I would drop entirely the check Here, 这篇关于200kB文件中搜索8个! (40320排列)在Python或IDA中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!0x01, 0x23, 0x45, 0x67
from itertools import permutations
l = list(permutations(range(0,8)))
print(len(l))
with open("firm.ori", 'rb') as f:
s = f.read()
for item in l:
str1 = ''.join(str(e) for e in item)
n = 2
out = [str1[i:i+n] for i in range(0, len(str1), n)]
hexstr = '\\x'.join(e for e in out)
hexstrfinal = "\\x" + hexstr
#print hexstrfinal
if(s.find(b'hexstrfinal')>0):
print "found"
firm.ori
以二进制模式(rb
)打开,因此s = f.read()
的结果是bytes
对象.尽管有与字符串相似的方法,但这不是字符串!它包含字节,而不是字符.当显示它时,\x...
输出并不表示bytes
对象包含ASCII反斜杠和xes.相反,每个\x...
是一个转义序列,用于表示给定字节的十六进制值,该十六进制值与ASCII可打印字符不对应.hexstr = '\\x'.join('{:02x}'.format(x) for x in i)
进行排列并将其格式化为bytes
对象的字符串表示形式.希望您从上一段中了解到为什么这行不通.s.find(b'hexstrfinal')
搜索文本ASCII数组b'hexstrfinal'
,而不搜索名为hexstrfinal
的变量.后者当然不起作用,因为hexstrfinal
具有类型str
,而不是bytes
.如果使用简单的hexstrfinal.encode('ascii')
将其转换为bytes
,则会得到b'\\x...'
,这根本不是您想要的.正确的方法是s.find(hexstrfinal.encode('ascii').decode('unicode-escape').encode('latin1'))
bytes
不必要地效率低下.每当您开始使用字符串作为处理数字的拐杖时,都是评估您的方法的好时机.这就开始了我们对您代码效率低下的讨论.N
和K
的排列,您的代码在O(N * K)
时间内运行,而可以一次通过文件或O(N)
来执行此操作.有了适当的数据结构,即使使用普通Python编写的循环也可能比当前代码的优化版本运行得更快.s
.如果当前字符和后面的七个字符组成有效的排列,请开始一个聚会.否则,请继续寻找:N = 8
allowed = set(range(N))
for index, b in enumerate(s):
if b in allowed and set(s[index:index + N]) == allowed:
print(f'Found sequence {s[index:index + N]} at offset {index}')
allowed = sorted(...)
N = len(allowed)
for index, b in enumerate(s):
if b in allowed and sorted(s[index:index + N]) == allowed:
print(f'Found sequence {s[index:index + N]} at offset {index}')
b in allowed
,只写一个可以在每半步应用的自定义支票:N = 8
def build_set(items):
output = set()
for b in items:
output.add(b & 0xF)
output.add((b >> 4) & 0xF)
return output
def matches(seq):
if len(seq) == N // 2:
return build_set(seq) == allowed
elif len(seq) == N // 2 + 1:
check = build_set(seq[1:-1])
check.add(seq[0] & 0xF)
check.add((seq[-1] >> 4) & 0xF)
return check == allowed
else:
return False
allowed = set(range())
for index, b in enumerate(s):
if matches(s[index:index + N // 2]):
print(f'Found sequence {s[index:index + N // 2]} at offset {index}.0')
if matches(s[index:index + N // 2 + 1]):
print(f'Found sequence {s[index:index + N // 2 + 1]]} at offset {index}.5')
build_set
只是将半字节分成一组. matches
检查按字节对齐的8个半字节数组(4个元素),或偏移半字节的8个半字节的数组(5个元素).两种情况均独立报告.0, 1, 2, 3, 4, 5, 6, 7 (0-7)
from itertools import permutations
l = list(permutations(range(0,8)))
print(len(l))
with open("firm.ori", 'rb') as f:
s = f.read()
for i in l:
hexstr = '\\x'.join('{:02x}'.format(x) for x in i)
hexstrfinal = "\\0x" + hexstr
#print hexstrfinal
if(s.find(b'hexstrfinal')>0):
print "found"
0x01, 0x23, 0x45, 0x67
from itertools import permutations
l = list(permutations(range(0,8)))
print(len(l))
with open("firm.ori", 'rb') as f:
s = f.read()
for item in l:
str1 = ''.join(str(e) for e in item)
n = 2
out = [str1[i:i+n] for i in range(0, len(str1), n)]
hexstr = '\\x'.join(e for e in out)
hexstrfinal = "\\x" + hexstr
#print hexstrfinal
if(s.find(b'hexstrfinal')>0):
print "found"
firm.ori
is opened in binary mode (rb
), the result of s = f.read()
is a bytes
object. Despite having similar methods to a string, this is not a string! It contains bytes, not characters. When you display it, the \x...
outputs do not indicate that the bytes
object containing ASCII backslashes and xes. Instead, each \x...
is an escape sequence used to represent the hex value of a given byte that does not correspond to an ASCII printable character.hexstr = '\\x'.join('{:02x}'.format(x) for x in i)
takes your permutation and formats it to look like the string representation of a bytes
object. Hopefully you understand from the previous paragraph why this won't work.s.find(b'hexstrfinal')
searches for the literal ASCII array b'hexstrfinal'
, not for the variable named hexstrfinal
. The latter wouldn't work of course, because hexstrfinal
has type str
, not bytes
. If you were to convert it to bytes
using a simple hexstrfinal.encode('ascii')
, you'd get b'\\x...'
, which is not at all what you want. The proper way would bes.find(hexstrfinal.encode('ascii').decode('unicode-escape').encode('latin1'))
bytes
you want. Any time you start using strings as a crutch to manipulate numbers is a good time to evaluate your approach. That begins our discussion of the inefficiencies in your code.N
and K
permutations, your code runs in O(N * K)
time, while it's possible to do this in a single pass through the file, or O(N)
. With the appropriate data structures, even a loop written in plain Python will likely run faster than an optimized version of the current code.s
. If the current character and the following seven make up a valid permutation, start a party. Otherwise, keep looking:N = 8
allowed = set(range(N))
for index, b in enumerate(s):
if b in allowed and set(s[index:index + N]) == allowed:
print(f'Found sequence {s[index:index + N]} at offset {index}')
allowed = sorted(...)
N = len(allowed)
for index, b in enumerate(s):
if b in allowed and sorted(s[index:index + N]) == allowed:
print(f'Found sequence {s[index:index + N]} at offset {index}')
b in allowed
, and just write a custom check that could be applied at every half-step:N = 8
def build_set(items):
output = set()
for b in items:
output.add(b & 0xF)
output.add((b >> 4) & 0xF)
return output
def matches(seq):
if len(seq) == N // 2:
return build_set(seq) == allowed
elif len(seq) == N // 2 + 1:
check = build_set(seq[1:-1])
check.add(seq[0] & 0xF)
check.add((seq[-1] >> 4) & 0xF)
return check == allowed
else:
return False
allowed = set(range())
for index, b in enumerate(s):
if matches(s[index:index + N // 2]):
print(f'Found sequence {s[index:index + N // 2]} at offset {index}.0')
if matches(s[index:index + N // 2 + 1]):
print(f'Found sequence {s[index:index + N // 2 + 1]]} at offset {index}.5')
build_set
just splits the nibbles into a set. matches
checks either an array of 8 nibbles aligned on a byte (4 elements), or an array of 8 nibbles offset by half a byte (5 elements). Both cases are reported independently.