How to determine whether a file is ASCII or binary?


Question


Hi, all

Since the OS treats both ASCII and binary files as a sequence of bytes, is
there any way to determine the file type other than by checking the extension?

Thank you!

Recommended answer

<posted & mailed>

There's no "ASCII" in C. There is a somewhat artificial distinction between
"text" and "binary", with "text" being a special case of a binary file whereby
the operating system might do something to the data as it is written to the
disk to make it compatible with applications that operate on text.

Since there's no definition of what that magic might be, there's likewise no
way to distinguish a "text" file from a "binary" file. All text files are
binary files. The only way to recognize a text file would be to check if
the file matches the local environment's criteria for a "text" file (and
most environments don't have the concept of a "text" file at all).

The canonical example is CP/M (and Microsoft's products, which hearken back
to it). There, if you open a file for writing as a "text" file, every "\n"
that is written becomes "\r\n" on disk, and when you close the file, "\032"
is appended to the end of the file. When you read from the text file, the
reverse operations occur. Windows still does the newline translation, and its
C runtime treats "\032" as end-of-file when reading in text mode. The only way
you could differentiate between a text file and a binary file would be to be
armed with this information, then open the target file in binary mode and
check that every byte in the file returns true for isprint() or isspace()
except the last byte in the file, which must equal '\032'. If so, you know
the file is a text file. You don't need to test if the file is a binary
file, since all files are.
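
The check described above could be sketched roughly as follows. This is only an illustration of the CP/M-style rule from this post, not a general text detector: the function names looks_like_cpm_text and file_looks_like_cpm_text are made up for the example, and isprint()/isspace() depend on the current locale (in the default "C" locale, bytes above 127 are rejected).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if every byte except the last is printable
   or whitespace and the last byte is Ctrl-Z ('\032'), otherwise 0. */
int looks_like_cpm_text(const unsigned char *buf, size_t len)
{
    if (len == 0 || buf[len - 1] != '\032')
        return 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (!isprint(buf[i]) && !isspace(buf[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical wrapper: applies the same rule to a file opened in binary
   mode, one byte at a time. Returns 1 or 0, or -1 on an I/O error. */
int file_looks_like_cpm_text(const char *path)
{
    FILE *fp = fopen(path, "rb");
    int prev = EOF; /* most recently read byte, checked once its successor arrives */
    int c;
    int ok = 1;

    if (fp == NULL)
        return -1;
    while (ok && (c = getc(fp)) != EOF) {
        if (prev != EOF && !isprint(prev) && !isspace(prev))
            ok = 0;
        prev = c;
    }
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return ok && prev == '\032';
}
```

Note that the rule is strict: a text file that does not end in '\032' is reported as not matching, which is exactly the limitation the post is pointing at.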

It gets more complicated in modern days where multiple character sets and
various encodings are used for text... In that case, the encoding needs to
be indicated within the file somehow and that frequently presumes multibyte
character sets, etc., which already preclude them from being treated as
simple text files in the first place.
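
One common way an encoding is "indicated within the file" is a Unicode byte order mark (BOM) at the very start. As a rough sketch only, and assuming the caller has already read the first few bytes in binary mode, a check might look like this (bom_encoding is a hypothetical name, and many text files carry no BOM at all, so a NULL result proves nothing):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the name of the encoding suggested by a
   leading byte order mark, or NULL if the buffer does not start with one.
   The UTF-32 checks come first because the UTF-32LE BOM begins with the
   same two bytes as the UTF-16LE BOM. */
const char *bom_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 4 && memcmp(buf, "\xFF\xFE\x00\x00", 4) == 0)
        return "UTF-32LE";
    if (len >= 4 && memcmp(buf, "\x00\x00\xFE\xFF", 4) == 0)
        return "UTF-32BE";
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)
        return "UTF-16LE";
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)
        return "UTF-16BE";
    return NULL;
}
```

A UTF-16LE file whose first character after the BOM is U+0000 would be misreported as UTF-32LE here, which is one more reminder that this kind of test is a guess rather than a proof.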
--
remove .spam from address to reply by e-mail.


On Fri, 9 Apr 2004 21:46:18 +0800, "Sunner Sun" <su********@163.com> wrote:
> Hi, all
>
> Since the OS look both ASCII and binary file as a sequence of bytes, is
> there any way to determine the file type except to judge the extension?

Portably, in C? Nah, because a "binary" file can simply mimic an ASCII file
and no one, or no /thing/, could possibly tell you whether the data was
written in binary mode or text mode.

The best you can do is take the approach of the Unix "file" command. Here's
a sample output I just got from running it under Cygwin on a text file:

[/home/leor]


file s2
s2: ASCII English text, with CRLF line terminators

It looks at the first few bytes (along with perhaps platform-specific inode
info in this case) and "takes its best shot".
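
In the same "best shot" spirit, a very rough sketch of a content-based guess might look like the following. This is not how the real file command is implemented; the names guess_type and guess_file, the enum values, and the 512-byte sample size are all assumptions made for the illustration, and the rule used is simply "only printable ASCII plus tab, LF, CR and form feed".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for the guess. */
enum guess { GUESS_EMPTY, GUESS_ASCII_TEXT, GUESS_ASCII_TEXT_CRLF, GUESS_DATA };

/* Hypothetical helper: classifies a buffer as ASCII text (optionally noting
   CRLF line terminators) or as opaque data. */
enum guess guess_type(const unsigned char *buf, size_t len)
{
    int saw_crlf = 0;

    if (len == 0)
        return GUESS_EMPTY;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        int is_text = (c >= 0x20 && c < 0x7f) ||
                      c == '\t' || c == '\n' || c == '\r' || c == '\f';
        if (!is_text)
            return GUESS_DATA;
        if (c == '\r' && i + 1 < len && buf[i + 1] == '\n')
            saw_crlf = 1;
    }
    return saw_crlf ? GUESS_ASCII_TEXT_CRLF : GUESS_ASCII_TEXT;
}

/* Hypothetical wrapper: samples the first 512 bytes of a file opened in
   binary mode. Returns 0 and stores the guess in *out, or -1 on error. */
int guess_file(const char *path, enum guess *out)
{
    unsigned char buf[512];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *out = guess_type(buf, n);
    return 0;
}
```

Because only a prefix is sampled, a file that starts with text and continues with arbitrary bytes would still be reported as text; that trade-off is inherent to this kind of guess.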
-leor




--
Leor Zolman --- BD Software --- www.bdsoft.com
On-site training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

