How do I read specific hex values in binary files


Question

Hello.

I'd like to search for specific hexadecimal numbers/values in a binary file. For example I want to search program.exe for "5c 6d 69 6e 67 77 33 32 5c" (just an example). If these values in the exact order they're in now are found in the binary, I want them put in a string so I can display them.

I currently don't have any code because it's quite confusing for me. I apologize for that.

Thanks in advance!

Recommended answer

Yes, you are confusing some of the basics. No need to apologize; this has happened to many people in the past. Still, it is a really big misconception, and you cannot seriously move forward in computing without clearing it up.

First of all, numbers cannot be hexadecimal or decimal. That is a property of the strings representing numbers, not of the numbers themselves. The numbers are all "binary". More exactly, most technology that works with numbers is kept agnostic to the exact computer representation of numbers, and this is really good, because it makes software more portable. Each time you assume a particular representation of numbers, you compromise portability.

Nevertheless, there are many cases when you need to assume a particular binary representation of numbers. This is how integer values are represented (on almost all systems): http://en.wikipedia.org/wiki/Two%27s_complement (probably not as simple as you thought, but it makes deep practical sense).
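To make the two's complement layout concrete, here is a minimal Python sketch. The helper name `to_twos_complement` and the choice of a 4-byte, big-endian layout are assumptions made for illustration; they are not part of the original answer.

```python
def to_twos_complement(value, size=4):
    """Return the two's complement bytes of a signed integer (big-endian)."""
    return value.to_bytes(size, "big", signed=True)


# Small positive and negative values side by side.
for value in (1, -1, -2):
    print(value, to_twos_complement(value).hex())

# The same four bytes read back as an unsigned number give a different value.
unsigned_view = int.from_bytes(to_twos_complement(-1), "big", signed=False)
print(unsigned_view)
```

The bytes on disk are identical in both readings; only the interpretation (signed or unsigned) changes.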

See also my past answer: "what is different between signed and unsigned in disassembly code?".

The representation of floating-point numbers is defined in the IEEE 754 standard:
http://en.wikipedia.org/wiki/Floating_point
http://en.wikipedia.org/wiki/IEEE_floating_point
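As a rough illustration of the IEEE 754 layout, the following Python sketch packs two values as single-precision and double-precision floats. The helper names and the big-endian byte order are assumptions chosen so that the sign and exponent bits come first when printed.

```python
import struct


def float32_bytes(x):
    """Pack x as an IEEE 754 single-precision value, big-endian."""
    return struct.pack(">f", x)


def float64_bytes(x):
    """Pack x as an IEEE 754 double-precision value, big-endian."""
    return struct.pack(">d", x)


print(float32_bytes(1.0).hex())   # sign 0, biased exponent 127, zero fraction
print(float32_bytes(-2.0).hex())  # sign 1, biased exponent 128, zero fraction
print(float64_bytes(1.0).hex())
```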

Now, you probably have the wrong idea of binary files. Some people think that, if the concept of a "binary file" exists, some other concepts should exist too, such as "text files", and some people I knew even fantasized about the existence of "decimal files" or "hexadecimal files". It could hardly be more wrong than that. Essentially, all files are "binary". There are no "non-binary" files. When someone says "binary file", it is not used as a strictly defined notion, but to convey the idea that the string representation of numbers is not used. Typically, "binary file" means "not very readable by a human using just a text editor". Such files are written and read by programs other than customary text editors.

In other words, "hexadecimal numbers in a binary file" is absurd. If the file is meant to be "binary", you are supposed to write numbers bit by bit, without converting them to characters or strings.

Let's consider one example. If you use a 32-bit int type, every object of this type occupies exactly 32 bits. Take some number, for example 1234567890. In memory, it will fill the following bits: 01001001100101100000001011010010
(don't mix it up with the string "01001001100101100000001011010010"; understand it as bits in memory, with the least significant bit on the right and the most significant bit on the left). Its hexadecimal representation is "499602d2". If you consider this representation as a string, then, reading from the right, it consists of bytes equal to 50 ('2'), then 100 ('d') or 68 ('D'), then 50 again, and so on. Note that the number of characters required for the string representation of a number may depend on the value, not only on the type: written in decimal, the code 100 of 'd' takes three characters and the code 50 of '2' takes two. Hexadecimal always uses exactly two digits per byte, so in 1-byte-per-character encodings the text takes twice as much room as the raw bytes. In other words, the string representation of numbers takes considerably more memory.
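The difference between the number and its text form can be checked with a few lines of Python. This is only a sketch; the variable names are mine, and the big-endian order is chosen so the raw bytes read in the same order as the hexadecimal digits.

```python
n = 1234567890

bits = format(n, "032b")       # the 32 bits, most significant bit first
hex_text = format(n, "08x")    # the hexadecimal *string* for the same number
raw = n.to_bytes(4, "big")     # the 4 actual bytes of the 32-bit value
text_codes = list(hex_text.encode("ascii"))  # the 8 bytes of the string

print(bits)
print(hex_text)
print(raw.hex(), len(raw))
print(text_codes, len(text_codes))
```

The raw value occupies 4 bytes, while its hexadecimal text occupies 8 bytes whose values are character codes, not the number's own bytes.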

The representation of characters was defined a long time ago by the ASCII standard, which is now mapped to a part of Unicode: http://en.wikipedia.org/wiki/ASCII.

Note that text files can use the ASCII or UTF-8 encodings, which take 2 bytes for each byte written as hexadecimal digits and up to 3 bytes for each byte written as decimal digits. If UTF-16 is used (the internal representation of all characters in memory used by .NET is UTF-16LE), each digit character occupies 2 bytes, which makes it up to 6 bytes per byte represented as a string.
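These sizes are easy to verify. The sketch below takes one byte with the value 0xd2 (210 in decimal) and measures its text forms in different encodings; the dictionary name `sizes` and the chosen byte value are just for illustration.

```python
value = 0xD2                      # one byte of data
hex_text = format(value, "02x")   # "d2"
dec_text = str(value)             # "210"

sizes = {
    "hex, ASCII": len(hex_text.encode("ascii")),
    "hex, UTF-8": len(hex_text.encode("utf-8")),
    "hex, UTF-16LE": len(hex_text.encode("utf-16-le")),
    "decimal, ASCII": len(dec_text.encode("ascii")),
    "decimal, UTF-16LE": len(dec_text.encode("utf-16-le")),
}

for label, size in sizes.items():
    print(label, size)
```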

See also:
http://en.wikipedia.org/wiki/Unicode,
http://www.unicode.org,
https://msdn.microsoft.com/en-us/library/9b1s4yhz%28v=vs.90%29.aspx.

To search for a number in a file, you need to know the size of the number. You need to convert the number into a sequence of bytes and then search for this sequence in the file. You need to understand that those bytes are not the code points of decimal or hexadecimal digits; they are the actual bytes of the binary representation of the number. You can perform this serialization using the class System.BitConverter:
https://msdn.microsoft.com/en-us/library/system.bitconverter%28v=vs.110%29.aspx
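The search itself can be sketched as follows. This is Python rather than .NET, so `int.to_bytes` stands in for `System.BitConverter.GetBytes`; the function names `find_all`, `search_file` and `as_hex_string` are hypothetical, and the little-endian byte order for the number is an assumption (it is what typical x86/x64 machines use, but the file format decides). The whole file is read into memory, which is fine for a modest executable but not for very large files.

```python
def find_all(data, pattern):
    """Return every offset at which the byte pattern occurs in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def search_file(path, pattern):
    """Read the file as raw bytes and return the offsets of the pattern."""
    with open(path, "rb") as f:
        return find_all(f.read(), pattern)


def as_hex_string(chunk):
    """Format bytes for display, e.g. '5c 6d 69'."""
    return " ".join(f"{b:02x}" for b in chunk)


# The byte sequence from the question, written as hexadecimal text.
pattern = bytes.fromhex("5c 6d 69 6e 67 77 33 32 5c")

# A 32-bit number serialized to its actual bytes (little-endian assumed).
number_le = (1234567890).to_bytes(4, "little")

# In-memory sample standing in for the contents of program.exe.
sample = b"\x00\x01C:\\mingw32\\bin" + number_le + b"\\mingw32\\"
for offset in find_all(sample, pattern):
    found = sample[offset:offset + len(pattern)]
    print(offset, as_hex_string(found), found.decode("ascii"))
```

For a real file, `search_file("program.exe", pattern)` would return the same kind of offset list; the found bytes can then be shown either as a hexadecimal string or, if they happen to be printable, decoded as text.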

See also my past answer on binary I/O: "vb.net binary file handling".

—SA

