Python regex search for hexadecimal bytes

Question

I'm trying to search a binary file for a series of hexadecimal values, but I've run into a few issues that I can't quite solve. (1) I'm not sure how to search the entire file and return all the matches. Currently I have f.seek going only as far as where I think the value might be, which is no good. (2) I'd like to return the offset of each match in either decimal or hex, but I get 0 every time, so I'm not sure what I did wrong.

example.bin

AA BB CC DD EE FF AB AC AD AE AF BA BB BC BD BE
BF CA CB CC CD CE CF DA DB DC DD DE DF EA EB EC

Code:

# coding: utf-8
import struct
import re

with open("example.bin", "rb") as f:
    f.seek(30)
    num, = struct.unpack(">H", f.read(2))
hexaPattern = re.compile(r'(0xebec)?')
m = re.search(hexaPattern, hex(num))
if m:
   print "found a match:", m.group(1)
   print " match offset:", m.start()

Maybe there's a better way to do all this?

Recommended answer

  1. I'm not sure how to search the entire file and return all the matches.
  2. I'd like to return the offset in either decimal or hex.

import re

f = open('data.txt', 'wb')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.close()

f = open('data.txt', 'rb')
data = f.read()
f.close()

pattern = "\xEB\xEC"
regex = re.compile(pattern)

for match_obj in regex.finditer(data):
    offset = match_obj.start()
    print "decimal: {}".format(offset)
    print "hex(): " + hex(offset)
    print 'formatted hex: {:02X} \n'.format(offset)

--output:--
decimal: 2
hex(): 0x2
formatted hex: 02 

decimal: 6
hex(): 0x6
formatted hex: 06 

decimal: 10
hex(): 0xa
formatted hex: 0A 

decimal: 14
hex(): 0xe
formatted hex: 0E 

decimal: 18
hex(): 0x12
formatted hex: 12 

decimal: 22
hex(): 0x16
formatted hex: 16 

decimal: 26
hex(): 0x1a
formatted hex: 1A 

Positions in the file use 0-based indexing, like a list.
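The answer's code is written for Python 2. As a rough sketch of the same idea under Python 3 (an assumption, since the question does not say which version is in use), the pattern and the data both have to be bytes objects. The helper name find_offsets is hypothetical, and re.escape is added so that a search value containing regex metacharacter bytes is still matched literally.

```python
import re

def find_offsets(data, needle):
    """Return the 0-based start offset of every non-overlapping match of needle in data."""
    # re.escape guards against needle bytes that happen to be regex metacharacters
    pattern = re.compile(re.escape(needle))
    return [m.start() for m in pattern.finditer(data)]

# Same layout as the answer's data file: seven copies of AA BB EB EC
data = b"\xAA\xBB\xEB\xEC" * 7
for offset in find_offsets(data, b"\xEB\xEC"):
    print("decimal: {}  hex: {:#04x}".format(offset, offset))
```

To search a real file, data would come from open(path, "rb").read(), as in the answer. For a plain byte sequence with no wildcards, a loop around bytes.find would work as well; the regex route mainly pays off once the pattern needs alternation or character classes.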

re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found.
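The "non-overlapping" part matters when the byte sequence can overlap itself. The small sketch below (Python 3 syntax, which is an assumption) contrasts a plain pattern with a zero-width lookahead. The lookahead variant is not part of the answer; it is shown only as one common way to report every starting position.

```python
import re

data = b"\xAA\xAA\xAA\xAA"

# Non-overlapping: after matching bytes 0-1, scanning resumes at byte 2
plain = [m.start() for m in re.finditer(b"\xAA\xAA", data)]

# Zero-width lookahead consumes nothing, so every starting position is tried
overlapping = [m.start() for m in re.finditer(b"(?=\xAA\xAA)", data)]

print(plain)        # [0, 2]
print(overlapping)  # [0, 1, 2]
```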

Match objects support the following methods and attributes:
start([group])
end([group])
Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring).
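To illustrate the group argument, here is a minimal sketch (Python 3 syntax assumed; the sample bytes are made up for the example) in which only part of the matched sequence is captured. The end index is exclusive, like a slice bound.

```python
import re

data = b"\x00\x01\xAA\xBB\xEB\xEC\xFF"

# Group 1 captures only the trailing EB EC; group 0 is the whole AA BB EB EC match
m = re.search(b"\xAA\xBB(\xEB\xEC)", data)

whole = (m.start(), m.end())      # span of group 0
inner = (m.start(1), m.end(1))    # span of group 1
print(whole, inner)
```

This is handy when the offset of interest lies inside a longer signature: match the full signature for context, then read the offset from the captured group.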

https://docs.python.org/2/library/re.html
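As for why the question's code reported offset 0 every time, the likely explanation is that the regex was run against the short string returned by hex(num) and not against the file contents, so m.start() is an index into that string. The sketch below (Python 3 syntax assumed) reproduces this, and also shows that the trailing ? makes the group optional, so the pattern matches emptily at position 0 even when the value is absent.

```python
import re

num = 0xEBEC                 # the value struct.unpack(">H", ...) yields for bytes EB EC
text = hex(num)              # '0xebec': a 6-character string, not the file contents

m = re.search(r"(0xebec)?", text)
print(m.start())             # 0, because the match begins at the start of that string

# The trailing '?' makes the group optional, so unrelated text still "matches" at 0
empty = re.search(r"(0xebec)?", "no hex here")
print(empty.start(), empty.group(1))
```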
