fgets() and embedded null characters


Problem description

Every so often one of my fgets() based programs encounters
an input file containing embedded nulls. fgets is happy to
read these but the embedded nulls subsequently cause problems
elsewhere in the program. Since fgets() doesn't return
the number of characters read it is pretty tough to handle
the embedded nulls once they are in the buffer.
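
To make the difficulty concrete, here is a minimal sketch (not part of the original post). The helper name fgets_with_length is hypothetical. It relies on an assumption the C Standard does not spell out, namely that fgets() leaves the bytes beyond its terminating '\0' untouched. Under that assumption, pre-filling the buffer with a non-zero marker lets the caller find the terminator by scanning backwards, even when strlen() stops early at an embedded null.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: behaves like fgets(), but also reports how many
   characters were stored, even when some of them are '\0'.
   ASSUMPTION: fgets() does not modify bytes beyond its terminator. */
char *fgets_with_length(char *buf, size_t size, FILE *stream, size_t *len)
{
    size_t i;

    *len = 0;
    if (size == 0) {
        return NULL;
    }
    memset(buf, 0x01, size);              /* fill with a non-zero marker */
    if (fgets(buf, (int)size, stream) == NULL) {
        return NULL;                      /* EOF with nothing read, or error */
    }

    /* Scanning from the end, the first '\0' found is the terminator
       written by fgets(); any '\0' before it came from the input. */
    i = size;
    while (i > 0 && buf[i - 1] != '\0') {
        i--;
    }
    if (i > 0) {
        *len = i - 1;
    }
    return buf;
}
```

For an input line consisting of 'a', 'b', '\0', 'c', 'd', '\n', strlen() on the buffer gives 2, while the helper reports the 6 characters that were actually stored.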

So two questions:

1. Why did the folks who wrote fgets() have a successful
read return a pointer to the storage buffer (which the
calling routine already knew in any case) instead of the
number of characters read (which often cannot be determined at
all after the fact if there are embedded nulls in the input)?

2. Can somebody please supply a pointer to a function
written in ANSI C that:

A) reads from a stream (like fgets)
B) stores to a preallocated buffer (like fgets)
C) accepts the size of the buffer (like fgets)
D) returns the number of characters read (unlike fgets)
E) sets read status, ideally in an integer combining
status bits more or less like these:
1 EOF
2 LINETOOBIG (instead of having to check the last byte)
4 READERROR (any other kind of READ error)
(read status = 1 with a nonzero returned length would
not be an error, it just indicates that all input data
has been consumed.)

If need be I can roll my own from fgetc, but I'd rather not reinvent
this wheel.

Thanks,

David Mathog
ma****@caltech.edu

Solutions



David Mathog wrote:

> Every so often one of my fgets() based programs encounters
> an input file containing embedded nulls. fgets is happy to
> read these but the embedded nulls subsequently cause problems
> elsewhere in the program. Since fgets() doesn't return
> the number of characters read it is pretty tough to handle
> the embedded nulls once they are in the buffer.

As an aside, a file containing '\0' characters is not
suitable for reading with a text stream. Section 7.19.2
paragraph 2 describes the "expected form" of a text stream:
printing characters and a small group of control characters,
plus a few other conventions. If you write a '\0' to a
text stream it's not guaranteed that you can read it back,
not even if you use getc().

If the data can include '\0' (more generally, if it
doesn't follow the expected conventions for text), you can
use a binary stream. But then one must question the wisdom
of using fgets(), which is specifically designed for textual
input in units of lines. The Standard doesn't prohibit using
fgets() with a binary stream, but fread() might be better.
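
As an illustration of the fread() suggestion (a sketch, not code from the thread): fread() returns an exact count, so embedded '\0' bytes cannot hide the end of the data. The helper name read_block_count_nuls is hypothetical, and the stream is assumed to have been opened in binary mode ("rb").

```c
#include <stdio.h>

/* Hypothetical helper: read up to `size` bytes from a binary stream and
   count how many of them are '\0'. Returns the number of bytes read. */
size_t read_block_count_nuls(FILE *stream, unsigned char *buf, size_t size,
                             size_t *nuls)
{
    /* The return value counts every byte stored, nulls included. */
    size_t n = fread(buf, 1, size, stream);
    size_t i;

    *nuls = 0;
    for (i = 0; i < n; i++) {
        if (buf[i] == '\0') {
            (*nuls)++;
        }
    }
    /* A short count means EOF or an error; feof()/ferror() tell which. */
    return n;
}
```

The price is that fread() knows nothing about lines, so the caller has to find the '\n' (or "\r\n") boundaries itself.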

> So two questions:
>
> 1. Why did the folks who wrote fgets() have a successful
> read return a pointer to the storage buffer (which the
> calling routine already knew in any case) instead of the
> number of characters read (which often cannot be determined at
> all after the fact if there are embedded nulls in the input)?

"It was just one of those things,
Just one of those crazy flings,
One weird design to raise Hell with strings,
Just one of those things."

... and there are plenty of other examples of library functions
that echo back what you already know instead of telling you
something useful. The folks who invented fgets() (and gets(),
and strcat(), and ...) lacked our twenty-twenty hindsight.

> 2. Can somebody please supply a pointer to a function
> written in ANSI C that:
>
> A) reads from a stream (like fgets)
> B) stores to a preallocated buffer (like fgets)
> C) accepts the size of the buffer (like fgets)
> D) returns the number of characters read (unlike fgets)
> E) sets read status, ideally in an integer combining
> status bits more or less like these:
> 1 EOF
> 2 LINETOOBIG (instead of having to check the last byte)
> 4 READERROR (any other kind of READ error)
> (read status = 1 with a nonzero returned length would
> not be an error, it just indicates that all input data
> has been consumed.)
>
> If need be I can roll my own from fgetc, but I'd rather not reinvent
> this wheel.

I don't know of a function with quite this specification,
although somebody may have written one (it seems everybody
eventually writes himself an fgets() replacement). If you
wind up rolling your own, I'd suggest getc() instead of fgetc().

Also, while conditions A-D seem entirely reasonable, point E
seems more involved than it needs to be: it would seem that
most calls would need to be accompanied by a bunch of bit-testing,
increasing the "clunkiness" of the interface. Note that the
feof() and ferror() functions can already discriminate cases
E1 and E4; is it really worthwhile to call out E2 separately?
Absent dynamic allocation you need *some* way of discriminating
between "line too long" and "line that just fits but ends with
EOF instead of newline," but perhaps a simple convention about
using or not using the last spot in the buffer might handle it
with a slimmer interface.
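
One possible reading of the "last spot in the buffer" convention, sketched here with plain fgets() (this is not code from the thread). The helper name fgets_fit and the FIT_* constants are hypothetical. The sketch assumes fgets() only writes to the final slot when it has stored size-1 characters.

```c
#include <stdio.h>

#define FIT_NONE      0   /* nothing read: EOF or error */
#define FIT_OK        1   /* a line was read and it fit */
#define FIT_TRUNCATED 2   /* buffer filled before a newline was seen */

/* Hypothetical wrapper around fgets() that plants a sentinel in the last
   slot so the caller can tell a full buffer from a short line. */
int fgets_fit(char *buf, size_t size, FILE *stream)
{
    if (size < 2) {
        return FIT_NONE;
    }
    buf[size - 1] = 'x';                  /* sentinel: any non-zero value */
    if (fgets(buf, (int)size, stream) == NULL) {
        return FIT_NONE;
    }
    /* The sentinel is overwritten with '\0' only when size-1 characters
       were stored. If the last of them is not '\n', the line did not fit
       (or it is a final line without a newline that exactly fills the
       buffer; this sketch does not tell those two cases apart). */
    if (buf[size - 1] == '\0' && buf[size - 2] != '\n') {
        return FIT_TRUNCATED;
    }
    return FIT_OK;
}
```

Because the check looks at fixed positions rather than calling strlen(), embedded nulls in the input do not confuse it.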

--
Er*********@sun.com


David Mathog wrote:


> Every so often one of my fgets() based programs encounters
> an input file containing embedded nulls. fgets is happy to
> read these but the embedded nulls subsequently cause problems
> elsewhere in the program. Since fgets() doesn't return
> the number of characters read it is pretty tough to handle
> the embedded nulls once they are in the buffer.
>
> So two questions:
>
> 1. Why did the folks who wrote fgets() have a successful
> read return a pointer to the storage buffer (which the
> calling routine already knew in any case) instead of the
> number of characters read (which often cannot be determined at
> all after the fact if there are embedded nulls in the input)?

Because somebody wrote it that way about 30 years ago, and a change
would break all sorts of existing code.

> 2. Can somebody please supply a pointer to a function
> written in ANSI C that:
>
> A) reads from a stream (like fgets)
> B) stores to a preallocated buffer (like fgets)
> C) accepts the size of the buffer (like fgets)
> D) returns the number of characters read (unlike fgets)
> E) sets read status, ideally in an integer combining
> status bits more or less like these:
> 1 EOF
> 2 LINETOOBIG (instead of having to check the last byte)
> 4 READERROR (any other kind of READ error)
> (read status = 1 with a nonzero returned length would
> not be an error, it just indicates that all input data
> has been consumed.)
>
> If need be I can roll my own from fgetc, but I'd rather not
> reinvent this wheel.

Bingo. Except you would be well advised to use getc rather than
fgetc. BTW, if your files have '\0' (nul, not null) chars in them,
they are not textfiles, and you will need to face the non-portable
treatment of line endings.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson


Eric Sosman wrote:

> As an aside, a file containing '\0' characters is not
> suitable for reading with a text stream. Section 7.19.2
> paragraph 2 describes the "expected form" of a text stream:
> printing characters and a small group of control characters,
> plus a few other conventions. If you write a '\0' to a
> text stream it's not guaranteed that you can read it back,
> not even if you use getc().



Sure. Unfortunately in the real world I sometimes encounter
files that do contain embedded null characters but are
otherwise normal text files.

Both responses so far said to use getc instead of fgetc,
is that for speed?
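
A likely reason for the advice, offered here as background rather than as something stated in the thread: the C Standard allows getc() to be implemented as a macro, which can avoid a function call per character. The cost is that such a macro may evaluate its stream argument more than once, so the argument should not have side effects. Whether this is actually faster depends on the implementation. A minimal sketch of a getc() loop, with the hypothetical helper name count_bytes:

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in a stream with getc(). */
long count_bytes(FILE *stream)
{
    long n = 0;
    int c;              /* int, not char, so EOF stays distinguishable */

    /* `stream` is a plain variable, so it is safe even if getc is a
       macro that evaluates its argument more than once. */
    while ((c = getc(stream)) != EOF) {
        n++;            /* '\0' bytes are counted like any other byte */
    }
    return n;
}
```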

Here's a first pass at this function. Before everybody jumps
on the name please note that super_fgets()
doesn't imply that it is better than fgets(), just that it does more.
And no, I have not tested it very thoroughly yet.

Ideally it would read at an even lower level than (f)getc so that the
secondary tests for EOF vs. read error wouldn't be necessary.
It has two warning fields: SFG_CRLF, indicating the presence of
a CRLF (vs. a LF), and SFG_EMBEDDED_NULL. It does not correct these,
just warns that they exist. The test for the trailing \r is nearly
free but the test for embedded nulls will slow things down a bit.
I think it costs less, however, than testing for the embedded null
characters after this routine is called, since the character will
already be loaded in a CPU register.
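
For comparison, the after-the-fact test mentioned above could be sketched as follows, given the length that super_fgets() reports through cterm. The helper name has_embedded_nul is hypothetical. It costs one extra pass over the buffer per line, which is the overhead the in-loop test is meant to avoid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any of the first `len` bytes of
   `buf` is '\0', and 0 otherwise. `len` is the count of characters
   read, not including the terminator. */
int has_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```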

/* super_fgets() status bits, put in a header file */
#include <stdio.h>

#define SFG_EOF              1  /* input terminated by End of File */
#define SFG_EOL              2  /* input terminated by End of line (\n) */
#define SFG_CRLF             4  /* input terminated by CRLF (\r\n), \r remains! */
#define SFG_EMBEDDED_NULL    8  /* embedded null characters are present */
#define SFG_BUFFER_OVERFLOW 16  /* input buffer full */
#define SFG_READERROR       32  /* unrecoverable read error */

/* super_fgets is implemented at the getc level. It does the following:
   A: reads from a stream (like fgets)
   B: accepts a preallocated buffer (like fgets)
   C: accepts the size of that preallocated buffer (like fgets)
   D: terminates the characters read with a '\0' in all cases
      (unlike fgets on a read that won't fit into the buffer)
      Input is terminated by either EOL (\n) or EOF.
   E: sets the position of the terminating null =
      number of characters read (size_t)
   F: sets a status integer where the bits are as
      defined in the table above (SFG_*)
   Limitations: not a drop in fgets() replacement!
*/

void super_fgets(char *string, size_t size, FILE *stream,
                 size_t *cterm, unsigned int *status){

    size_t icterm;         /* internal cterm value */
    unsigned int istatus;  /* internal status value */
    size_t lastslot;       /* the last character cell in the buffer */
    int readthis;          /* the character which was read */

    /* a zero-sized buffer has no room even for the terminator, and
       size-1 would wrap around to a huge value */
    if(size == 0){
        *status = SFG_BUFFER_OVERFLOW;
        *cterm = 0;
        return;
    }

    icterm = 0;
    istatus = 0;
    lastslot = size-1;

    while(1){

        if(icterm == lastslot){
            istatus |= SFG_BUFFER_OVERFLOW;
            break;
        }

        readthis = fgetc(stream);

        if(readthis == EOF){
            /* either the end of the file or a
               read error, figure out which */
            if(feof(stream)){ istatus |= SFG_EOF; }
            else { istatus |= SFG_READERROR; }
            break;
        }

        if(readthis == '\n'){
            /* LF is a line terminator, return what has been read so far,
               NOTE, the \n is NOT returned!!! On \r\n terminated input
               files the trailing \r may be present, check and
               signal that too. */

            istatus |= SFG_EOL;
            if( (icterm>0) && (string[icterm-1]=='\r')) istatus |= SFG_CRLF;
            break;
        }

        /* warn about embedded null characters */
        if(readthis == '\0') istatus |= SFG_EMBEDDED_NULL;

        string[icterm] = (char)readthis;
        icterm++;

    }
    string[icterm] = '\0';
    *status = istatus;
    *cterm = icterm;
    return;

}

Regards,

David Mathog
ma****@caltech.edu

