Searching for a byte string in a binary file


Problem description


I took a C course some time ago, but I'm only now beginning to use it,
for a personal pet project. My current stumbling-block is finding an
efficient way to find a match between the beginning of a "counted"
string and data in a binary file. Given...

#include <stdio.h>

int main(int argc, char *argv[])
{
    char bstring[255];
    int bstring_length, match_length;
    long match_location;
    FILE *input_file, *output_file;

    if(argc != 3){
        printf(" Correct usage:\n %s input_filename output_filename\n", argv[0]);
        return 1;
    }
    if((input_file = fopen(argv[1], "rb")) == NULL){
        printf("Error opening %s for input\n", argv[1]);
        return 1;
    }
    if((output_file = fopen(argv[2], "wb")) == NULL){
        printf("Error opening %s for output\n", argv[2]);
        return 1;
    }
    ...

Later on, bstring and bstring_length get initialized. bstring may
contain \0's, so the "logical length" (which will not exceed 255) is
actually stored in bstring_length. What I'm trying to do is to find the
location and length of the longest match from the left side of bstring.
If that's not clear, here's an example...

bstring[0] = 'a';
bstring[1] = '\0';
bstring[2] = 'b';
bstring[3] = 'c';
bstring[4] = 'd';
bstring_length = 5;

Assume that the full sequence is not matched anywhere in the file. However,
assume that the sequence 'a', '\0', 'b', 'c' does exist in the file.
What's the most efficient search that returns match_length ( 4 ) and
ftell() of the beginning of the match? If it's a function, should I
declare in main() like so?

int bstring_length, *match_length;
long *match_location;

I assume that the function prototype would be something like...

int findmatch( bstring[255], bstring_length, match_length, match_location);

By passing pointers, I assume that both match_location and
match_length will be modified and available to main(). Or is my
understanding of pointers lacking? findmatch() should return 0 if a
match is found, and 1 if no match. Would memory-mapped files help here?

--
Walter Dnes; my email address is *ALMOST* like wz*******@waltdnes.org
Delete the "z" to get my real address. If that gets blocked, follow
the instructions at the end of the 550 message.

Recommended answer

"Walter Dnes (delete the 'z' to get my real address)" <wz*******@waltdnes.org> wrote:
> I took a C course some time ago, but I'm only now beginning to use it,
> for a personal pet project. My current stumbling-block is finding an
> efficient way to find a match between the beginning of a "counted"
> string and data in a binary file. Given...
>
> #include <stdio.h>

Also include <stdlib.h>, see below.

> int main(int argc, char *argv[])
> {
> char bstring[255];
> int bstring_length, match_length;
> long match_location;
> FILE *input_file, *output_file;
>
> if(argc != 3){
> printf(" Correct usage:\n %s input_filename output_filename\n", argv[0]);
> return 1;

Make that

return EXIT_FAILURE;

Not every system treats a return value of 1 as failure. 0 and
EXIT_SUCCESS always report success, but the only portable failure
status is EXIT_FAILURE. With the macros EXIT_FAILURE and EXIT_SUCCESS,
defined in <stdlib.h>, you're always on the safe side.

Another common convention is to print error messages to stderr and not
mix them with the normal output of your program. So better use

fprintf( stderr, "Correct usage:\n %s input_filename output_filename\n",
argv[0]);
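
The two suggestions above (EXIT_FAILURE instead of 1, and diagnostics on stderr) could be combined roughly as follows. This is only a sketch: the helper name open_files and its parameters are made up for illustration and do not appear in the original program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not in the original post): checks the argument
 * count and opens both files. Diagnostics go to stderr; the return value
 * is EXIT_SUCCESS or EXIT_FAILURE so main() can return it directly. */
int open_files(int argc, char *argv[], FILE **in, FILE **out)
{
    if (argc != 3) {
        fprintf(stderr, "Correct usage:\n %s input_filename output_filename\n",
                argc > 0 ? argv[0] : "program");
        return EXIT_FAILURE;
    }
    if ((*in = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "Error opening %s for input\n", argv[1]);
        return EXIT_FAILURE;
    }
    if ((*out = fopen(argv[2], "wb")) == NULL) {
        fprintf(stderr, "Error opening %s for output\n", argv[2]);
        fclose(*in);              /* do not leak the input stream */
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

main() could then declare FILE *input_file, *output_file; and return EXIT_FAILURE whenever open_files(argc, argv, &input_file, &output_file) does not yield EXIT_SUCCESS.
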

> }
> if((input_file = fopen(argv[1], "rb")) == NULL){
> printf("Error opening %s for input\n", argv[1]);
> return 1;
> }
> if((output_file = fopen(argv[2], "wb")) == NULL){
> printf("Error opening %s for output\n", argv[2]);
> return 1;
> }
> ...
>
> Later on, bstring and bstring_length get initialized. bstring may
> contain \0's, so the "logical length" (which will not exceed 255) is
> actually stored in bstring_length. What I'm trying to do is to find the
> location and length of the longest match from the left side of bstring.
> If that's not clear, here's an example...
>
> bstring[0] = 'a';
> bstring[1] = '\0';
> bstring[2] = 'b';
> bstring[3] = 'c';
> bstring[4] = 'd';
> bstring_length = 5;
>
> Assume that the sequence is not matched anywhere in the file. However,
> assume that the sequence 'a', '\0', 'b', 'c' does exist in the file.
> What's the most efficient search that returns match_length ( 4 ) and
> ftell() of the beginning of the match? If it's a function, should I
> declare in main() like so?
>
> int bstring_length, *match_length;
> long *match_location;
>
> I assume that the function prototype would be something like...
>
> int findmatch( bstring[255], bstring_length, match_length, match_location);

Sorry, that's not a prototype: a prototype has to give a type for every
parameter.

> By passing pointers, I assume that both match_location and
> match_length will be modified and available to main(). Or is my
> understanding of pointers lacking. findmatch() should return 0 if a
> match is found, and 1 if no match.

When you want to set 'match_length' and 'match_location' within
find_match() so that the values are visible in main(), you need to
change that a bit. You must define them in main() as plain variables,
not as pointers, i.e. have there

int match_length;
long match_location;

and then pass pointers to those variables to the function. So your
prototype for the function would look like

int find_match( char buf[ 255 ], int blen, int *mlen, long *loc );

(I intentionally changed the names to be used within the function
from the ones you defined in main() to make it more obvious what
belongs to main() and what belongs to find_match().) If you don't
intend to change the char array within the function it probably
would be better to make the first argument "const char buf[ 255 ]".

You now would call that function like

find_match( bstring, bstring_length, &match_length, &match_location );

(please note the '&' in front of 'match_length' and 'match_location';
it tells that you pass the address of each variable and not its value).
Within find_match() you would then assign the length and the position
of the match to '*mlen' and '*loc', i.e. you write them into the memory
locations 'mlen' and 'loc' point to.
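
As a rough illustration of returning two results through pointer arguments, here is an in-memory variant of such a function. It is a sketch under assumptions, not the function from the thread: the extra parameters data and dlen (the bytes to search and their count) are invented for this example, and the search itself is the plain brute-force approach.

```c
/* Hypothetical in-memory variant, for illustration only: look for the
 * longest prefix of buf[0..blen) anywhere inside data[0..dlen).
 * The results are written through the pointers mlen and loc, so the
 * caller's variables are modified. Returns 0 if a match was found,
 * 1 otherwise (the convention asked for in the question). */
int find_match(const char *buf, int blen,
               const char *data, long dlen,
               int *mlen, long *loc)
{
    long i;
    int n;

    *mlen = 0;
    *loc = -1L;                      /* -1 means "no match recorded" */
    for (i = 0; i < dlen; i++) {
        /* count how many leading bytes of buf match at offset i */
        for (n = 0; n < blen && i + n < dlen && data[i + n] == buf[n]; n++)
            ;
        if (n > *mlen) {             /* strictly longer: first longest wins */
            *mlen = n;
            *loc = i;
        }
    }
    return *mlen > 0 ? 0 : 1;
}
```

The caller keeps ordinary variables (int match_length; long match_location;) and passes &match_length and &match_location, exactly as described above.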

By the way, using names like 'bstring' and 'bstring_length' is a bit
misleading since 'bstring' isn't a string (a real string can't have
embedded '\0' characters), it's just an array of chars. So it might
be prudent to give it a name that doesn't make people assume that
it's going to be a string.

> Would memory-mapped files help here?





The rest of your questions on how to do the matching in the fastest,
most effective way are not really C questions but more about what
kind of algorithm to use. That's better discussed in groups like
comp.programming since it really isn't about C. And a Google
search for e.g. "Boyer-Moore algorithm" (just to name one) will
probably come up with a lot of interesting links (and show you
how many possible algorithms have been developed for that problem
over the years). E.g.

http://www-igm.univ-mlv.fr/~lecroq/string/

looks rather interesting. Finally, questions about memory-mapping
files are off-topic here because that can only be done with
system-specific extensions to C.

Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de
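
For reference, the longest-prefix search from the original question can be done in standard C without memory mapping. The sketch below is a brute-force sliding-window scan, not Boyer-Moore or any of the faster algorithms mentioned above. The function name find_match_in_file, the MAX_PATTERN macro and the extra return code 2 are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATTERN 255   /* same limit as the 255-byte buffer in the question */

/* Sketch only: scan fp from its current position and report the longest
 * prefix of pat[0..plen) found anywhere in the rest of the stream.
 * Returns 0 if at least one byte matched, 1 if nothing matched,
 * 2 if the arguments are unusable. */
int find_match_in_file(FILE *fp, const unsigned char *pat, size_t plen,
                       size_t *mlen, long *loc)
{
    unsigned char win[MAX_PATTERN];  /* sliding window over the file */
    size_t have = 0, best = 0, n;
    long pos, best_pos = -1L;
    int c;

    if (fp == NULL || pat == NULL || mlen == NULL || loc == NULL ||
        plen == 0 || plen > MAX_PATTERN)
        return 2;

    pos = ftell(fp);                 /* file offset of win[0] */
    if (pos < 0)
        pos = 0;                     /* position unknown: count from here */

    /* fill the window with up to plen bytes */
    while (have < plen && (c = getc(fp)) != EOF)
        win[have++] = (unsigned char)c;

    while (have > 0) {
        for (n = 0; n < have && win[n] == pat[n]; n++)
            ;                        /* length of the common prefix */
        if (n > best) {
            best = n;
            best_pos = pos;
            if (best == plen)
                break;               /* cannot do better than a full match */
        }
        memmove(win, win + 1, --have);   /* slide the window by one byte */
        pos++;
        if ((c = getc(fp)) != EOF)
            win[have++] = (unsigned char)c;
    }

    *mlen = best;
    *loc = best_pos;
    return best > 0 ? 0 : 1;
}
```

The window never holds more than plen bytes, so memory use is constant, at the cost of roughly (file size) x (pattern length) byte comparisons in the worst case. When several positions tie for the longest match, the first one is reported.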


On 16 May 2004 11:39:57 GMT, <Je***********@physik.fu-berlin.de> wrote:
> E.g. http://www-igm.univ-mlv.fr/~lecroq/string/ looks rather interesting).

Thanks. That appears to be exactly what I was looking for.

> Make that
>
> return EXIT_FAILURE;
>
> not all OSes return 1 for failure and 0 for success, and with the macros
> EXIT_FAILURE and EXIT_SUCCESS, defined in <stdlib.h>, you're always on
> the safe side.
>
> Another common convention is to print error messages to stderr and not
> mix them with the normal output of your program. So better use
>
> fprintf( stderr, "Correct usage:\n %s input_filename output_filename\n",
> argv[0]);







Thanks for the tips. Note that I've changed the subject. Another
question: is this the correct way to define TRUE/FALSE?

const int TRUE = (1==1);
const int FALSE = (1!=1);

--
Walter Dnes; my email address is *ALMOST* like wz*******@waltdnes.org
Delete the "z" to get my real address. If that gets blocked, follow
the instructions at the end of the 550 message.


"Walter Dnes (delete the 'z' to get my real address)" <wz*******@waltdnes.org> writes:
> Thanks for the tips. Note that I've changed the subject. Another
> question; is this the correct way to define TRUE/FALSE?
>
> const int TRUE = (1==1);
> const int FALSE = (1!=1);







It is correct, but indicates a lack of understanding. Read the
FAQ. In addition to the FAQ commentary on this issue, it should
be noted that the value of variables, even those defined as
constant, cannot be used in constant expressions.
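
A small sketch of what that remark means in practice. The identifiers TRUE_VAR, TRUE_ENUM, FALSE_ENUM and truth_name are invented for this example. In standard C a const-qualified int is still a variable, so it is not an integer constant expression and cannot portably appear in a case label, whereas an enumeration constant (or a #define) can. Some compilers accept the const form as an extension.

```c
/* A read-only variable: usable at run time, but not a constant expression. */
const int TRUE_VAR = (1 == 1);

/* Enumeration constants are integer constant expressions. */
enum { FALSE_ENUM = 0, TRUE_ENUM = 1 };

/* Returns a label for a truth value. Case labels must be integer constant
 * expressions, so the enumeration constants are used here. */
const char *truth_name(int v)
{
    switch (v) {
    /* case TRUE_VAR:   not an integer constant expression in standard C */
    case TRUE_ENUM:
        return "true";
    case FALSE_ENUM:
        return "false";
    default:
        return "other nonzero value";
    }
}
```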

9.1: What is the right type to use for Boolean values in C? Why
isn't it a standard type? Should I use #defines or enums for
the true and false values?

A: C does not provide a standard Boolean type, in part because
picking one involves a space/time tradeoff which can best be
decided by the programmer. (Using an int may be faster, while
using char may save data space. Smaller types may make the
generated code bigger or slower, though, if they require lots of
conversions to and from int.)

The choice between #defines and enumeration constants for the
true/false values is arbitrary and not terribly interesting (see
also questions 2.22 and 17.10). Use any of

#define TRUE  1             #define YES 1
#define FALSE 0             #define NO  0

enum bool {false, true};    enum bool {no, yes};

or use raw 1 and 0, as long as you are consistent within one
program or project. (An enumeration may be preferable if your
debugger shows the names of enumeration constants when examining
variables.)

Some people prefer variants like

#define TRUE (1==1)
#define FALSE (!TRUE)

or define "helper" macros such as

#define Istrue(e) ((e) != 0)

These don't buy anything (see question 9.2 below; see also
questions 5.12 and 10.2).

9.2: Isn't #defining TRUE to be 1 dangerous, since any nonzero value
is considered "true" in C? What if a built-in logical or
relational operator "returns" something other than 1?

A: It is true (sic) that any nonzero value is considered true in C,
but this applies only "on input", i.e. where a Boolean value is
expected. When a Boolean value is generated by a built-in
operator, it is guaranteed to be 1 or 0. Therefore, the test

if((a == b) == TRUE)

would work as expected (as long as TRUE is 1), but it is
obviously silly. In fact, explicit tests against TRUE and
FALSE are generally inappropriate, because some library
functions (notably isupper(), isalpha(), etc.) return,
on success, a nonzero value which is not necessarily 1.
(Besides, if you believe that "if((a == b) == TRUE)" is
an improvement over "if(a == b)", why stop there? Why not
use "if(((a == b) == TRUE) == TRUE)"?) A good rule of thumb
is to use TRUE and FALSE (or the like) only for assignment
to a Boolean variable or function parameter, or as the return
value from a Boolean function, but never in a comparison.
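
The point about isalpha() can be illustrated with a short sketch. The helper name is_letter is made up for this example. It never compares the library result with TRUE; it tests against zero, which yields exactly 1 or 0.

```c
#include <ctype.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical helper: collapse any nonzero "true" value to exactly 1. */
int is_letter(int ch)
{
    /* isalpha() returns some nonzero value for letters, not necessarily 1,
     * so a test such as isalpha(ch) == TRUE would be unreliable.
     * The != operator is guaranteed to produce 1 or 0. */
    return isalpha((unsigned char)ch) != 0;
}
```

With this, TRUE and FALSE are only ever produced by a built-in operator and used as return values, in line with the rule of thumb above.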

The preprocessor macros TRUE and FALSE (and, of course, NULL)
are used for code readability, not because the underlying values
might ever change. (See also questions 5.3 and 5.10.)

Although the use of macros like TRUE and FALSE (or YES
and NO) seems clearer, Boolean values and definitions can
be sufficiently confusing in C that some programmers feel
that TRUE and FALSE macros only compound the confusion, and
prefer to use raw 1 and 0 instead. (See also question 5.9.)
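
Note that the FAQ text quoted here predates C99. Compilers that support C99 or later provide a Boolean type through <stdbool.h>, which makes hand-written TRUE/FALSE definitions unnecessary there. A minimal sketch, with the function names is_even and to_bool chosen purely for illustration:

```c
#include <stdbool.h>   /* C99 and later: bool, true, false */

/* Hypothetical example: reports whether n is even. */
bool is_even(int n)
{
    return n % 2 == 0;
}

/* Converting any scalar to bool yields 0 for zero and 1 for anything else. */
bool to_bool(int v)
{
    return v;
}
```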

References: K&R1 Sec. 2.6 p. 39, Sec. 2.7 p. 41; K&R2 Sec. 2.6
p. 42, Sec. 2.7 p. 44, Sec. A7.4.7 p. 204, Sec. A7.9 p. 206; ISO
Sec. 6.3.3.3, Sec. 6.3.8, Sec. 6.3.9, Sec. 6.3.13, Sec. 6.3.14,
Sec. 6.3.15, Sec. 6.6.4.1, Sec. 6.6.5; H&S Sec. 7.5.4 pp. 196-7,
Sec. 7.6.4 pp. 207-8, Sec. 7.6.5 pp. 208-9, Sec. 7.7 pp. 217-8,
Sec. 7.8 pp. 218-9, Sec. 8.5 pp. 238-9, Sec. 8.6 pp. 241-4;
"What the Tortoise Said to Achilles".

--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

