C binary file versus text file efficiency

Question

I'm quite new to C and would like some help. Say I need to store only 6-digit numbers in a file (assume the size of int is 4 bytes). Which would be more efficient in terms of space, a text file or a binary file? I'm not really sure how to approach this problem; any help is welcome.

Recommended answer

Most people classify files into two categories: binary files and ASCII (text) files. You've actually worked with both. The source of any program you write (C/C++/Perl/HTML) is almost surely an ASCII file.

An ASCII file is defined as a file that consists of ASCII characters. It's usually created by using a text editor like emacs, pico, vi, Notepad, etc. There are fancier editors out there for writing code, but they may not always save it as ASCII. ASCII is an international standard.

Computer science is all about creating good abstractions. Sometimes it succeeds and sometimes it doesn't. Good abstractions are all about presenting a view of the world that the user can use. One of the most successful abstractions is the text editor.

When you're writing a program, and typing in comments, it's hard to imagine that this information is not being stored as characters. ASCII/text files are really stored as 0's and 1's.

Files are stored on disks, and disks have some way to represent 1's and 0's. We merely call them 1's and 0's because that's also an abstraction. Whatever way is used to store the 0's and 1's on a disk, we don't care, provided we can think of them that way.

In effect, ASCII files are basically binary files, because they store binary numbers. That is, ASCII files store 0's and 1's.

The difference between ASCII and binary files

An ASCII file is a binary file that stores ASCII codes. Recall that an ASCII code is a 7-bit code stored in a byte. To be more specific, there are 128 different ASCII codes, which means that only 7 bits are needed to represent an ASCII character.

However, since the minimum workable size is 1 byte, those 7 bits are the low 7 bits of any byte. The most significant bit is 0. That means, in any ASCII file, you're wasting 1/8 of the bits. In particular, the most significant bit of each byte is not being used.
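
As a rough illustration of the unused top bit, the sketch below checks whether a buffer could be plain 7-bit ASCII by testing the most significant bit of every byte. The function name `is_seven_bit_ascii` is made up for this example and is not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte in buf has its most
   significant bit clear (so the data could be 7-bit ASCII), else 0. */
int is_seven_bit_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] & 0x80u) {
            return 0; /* top bit set: not a 7-bit ASCII code */
        }
    }
    return 1;
}
```

A buffer holding the letters of "cat" passes this check, while a buffer containing a byte such as 0xa0 does not.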

Although ASCII files are binary files, some people treat them as different kinds of files. I like to think of ASCII files as special kinds of binary files. They're binary files where each byte is written in ASCII code.

A full, general binary file has no such restrictions. Any of the 256 bit patterns can be used in any byte of a binary file.

We work with binary files all the time. Executables, object files, image files, sound files, and many file formats are binary files. What makes them binary is merely the fact that each byte of a binary file can be one of 256 bit patterns. They're not restricted to the ASCII codes.

Example of ASCII files

Suppose you're editing a text file with a text editor. Because you're using a text editor, you're pretty much editing an ASCII file. In this brand new file, you type in "cat". That is, the letters 'c', then 'a', then 't'. Then, you save the file and quit.

What happens? For the time being, we won't worry about the mechanism of what it means to open a file, modify it, and close it. Instead, we're concerned with the ASCII encoding.

If you look up an ASCII table, you will discover that the ASCII codes for 'c', 'a', and 't' are 0x63, 0x61, and 0x74 (the 0x merely indicates that the values are in hexadecimal, instead of decimal/base 10).

Here's how it looks:
ASCII   'c'        'a'          't'
Hex     63          61          74
Binary  0110 0011   0110 0001   0111 0100
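
The table can be reproduced with a few lines of C. This is only a sketch: `byte_to_bits` and `dump_text` are names invented for the example, and the output matches the table only on a system whose character set is ASCII-compatible.

```c
#include <stdio.h>

/* Write the 8 bits of one byte into out as '0'/'1' characters,
   most significant bit first, followed by a terminating NUL. */
void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (byte & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Print each character of text with its hex code and bit pattern. */
void dump_text(const char *text)
{
    char bits[9];
    for (const char *p = text; *p != '\0'; p++) {
        byte_to_bits((unsigned char)*p, bits);
        printf("'%c'  0x%02x  %s\n", *p, (unsigned char)*p, bits);
    }
}
```

Calling `dump_text("cat")` prints one line per character, each showing the character, its code in hex, and the eight bits that end up in the file.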

Each time you type in an ASCII character and save it, an entire byte is written which corresponds to that character. This includes punctuation, spaces, and so forth.

Thus, when you type a 'c', it's being saved as 0110 0011 to a file.

Now sometimes a text editor throws in characters you may not expect. For example, some editors "insist" that each line end with a newline character.

The only place a file can be missing a newline at the end of the line is the very last line. Some editors allow the very last line to end in something besides a newline character. Some editors add a newline at the end of every file.

Unfortunately, even the newline character is not that universally standard. It's common to use newline characters on UNIX files, but in Windows, it's common to use two characters to end each line (carriage return, newline, which is \r and \n, I believe). Why two characters when only one is necessary?

This dates back to printers. In the old days, the time it took for a printer to return back to the beginning of a line was equal to the time it took to type two characters. So, two characters were placed in the file to give the printer time to move the printer ball back to the beginning of the line.
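
To make the two-character line ending concrete, here is a small sketch that rewrites Windows-style line endings (\r\n) as UNIX-style ones (\n) inside a memory buffer. The name `crlf_to_lf` is hypothetical, and the choice to leave a lone \r untouched is an assumption of this example rather than any standard behaviour.

```c
#include <stddef.h>

/* Hypothetical helper: convert every "\r\n" pair in buf to "\n" in place.
   A '\r' that is not followed by '\n' is kept as-is.
   Returns the new length of the data in buf. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') {
            continue; /* drop the CR; the LF is copied on the next pass */
        }
        buf[out++] = buf[i];
    }
    return out;
}
```

Each converted line becomes one byte shorter, which is why the same text file can have different sizes on the two systems.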

This fact isn't all that important. It's mostly trivia. The reason I bring it up is just in case you've wondered why transferring files to UNIX from Windows sometimes generates funny characters.

Editing Binary Files

Now that you know that each character typed in an ASCII file corresponds to one byte in a file, you might understand why it's difficult to edit a binary file.

If you want to edit a binary file, you really would like to edit individual bits. For example, suppose you want to write the binary pattern 1100 0011. How would you do this?

You might be naive, and type in the following in a file:

11000011

But you should know, by now, that this is not editing individual bits of a file. If you type in '1' and '0', you are really entering 0x31 and 0x30 (49 and 48 in decimal). That is, you're entering 0011 0001 and 0011 0000 into the file. You're actually (indirectly) typing 8 bits at a time.
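
If the goal really is the single byte 1100 0011, the eight typed characters have to be packed into one byte by a program. The following is a minimal sketch of that step; `pack_bits` is a name invented here, and it assumes the input is exactly eight '0'/'1' characters.

```c
#include <stddef.h>

/* Hypothetical helper: pack a string of exactly eight '0'/'1' characters
   into one byte, most significant bit first.
   Returns 0 on success and -1 if the string is malformed. */
int pack_bits(const char *digits, unsigned char *out)
{
    unsigned char value = 0;
    for (size_t i = 0; i < 8; i++) {
        if (digits[i] != '0' && digits[i] != '1') {
            return -1; /* too short or not a binary digit */
        }
        value = (unsigned char)((value << 1) | (digits[i] - '0'));
    }
    if (digits[8] != '\0') {
        return -1; /* more than eight characters */
    }
    *out = value;
    return 0;
}
```

With the input "11000011" the result is the byte 0xc3, so eight bytes of text turn into one byte of binary data.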

There are some programs that allow you to type in 49 and have it translated to a single byte, 0100 1001 (hex 49), instead of the ASCII codes for '4' and '9'. You can call these programs hex editors. Unfortunately, these may not be so readily available. It's not too hard to write a program that reads in an ASCII file that looks like hex pairs, but then converts it to a true binary file with the corresponding bit patterns.

That is, it takes a file that looks like:

63 a0 de

and converts this ASCII file to a binary file that begins 0110 0011 (which is hex 63 in binary). Notice that this file is ASCII, which means what's really stored is the ASCII code for '6', '3', ' ' (space), 'a', '0', and so forth. A program can read this ASCII file, then generate the appropriate binary code and write that to a file.

Thus, the ASCII file might contain 8 bytes (6 for the characters, 2 for the spaces), and the output binary file would contain 3 bytes, one byte per hex pair.
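
A converter of this kind might look roughly like the sketch below. It works on a string in memory rather than on files to keep it short, and `hex_value` and `hex_text_to_bytes` are names made up for the example. Treating any whitespace as a separator and rejecting odd-length pairs are assumptions of this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical converter: turn text such as "63 a0 de" into raw bytes.
   Returns the number of bytes written to out, or -1 if the text is
   malformed or out (capacity cap) is too small. */
long hex_text_to_bytes(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*text != '\0') {
        if (isspace((unsigned char)*text)) {
            text++; /* separators carry no data */
            continue;
        }
        int hi = hex_value(text[0]);
        int lo = (hi < 0) ? -1 : hex_value(text[1]);
        if (hi < 0 || lo < 0 || n >= cap) {
            return -1;
        }
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

For the 8-byte text "63 a0 de" it produces the 3 bytes 0x63, 0xa0, 0xde, which could then be written to a file opened in binary mode.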

Writing Binary Files

Why do people use binary files anyway? One reason is compactness. For example, suppose you wanted to write the number 100000. If you type it in ASCII, this would take 6 characters (which is 6 bytes). However, if you represent it as unsigned binary, you can write it out using 4 bytes.
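
The size comparison can be checked with a short sketch. The helper names are invented for this example, and a fixed-width `uint32_t` stands in for the 4-byte int assumed in the question.

```c
#include <stdint.h>
#include <stdio.h>

/* Bytes needed to store value as decimal text, without any separator. */
int text_bytes(uint32_t value)
{
    /* snprintf with a NULL buffer only counts the characters. */
    return snprintf(NULL, 0, "%lu", (unsigned long)value);
}

/* Bytes needed to store a value as a raw fixed-width 32-bit integer. */
int binary_bytes(void)
{
    return (int)sizeof(uint32_t);
}

/* Print the size comparison for one value. */
void compare_sizes(uint32_t value)
{
    printf("%lu: %d bytes as text, %d bytes as binary\n",
           (unsigned long)value, text_bytes(value), binary_bytes());
}
```

For 100000 this gives 6 bytes as text against 4 as binary. A text file holding many numbers usually also needs a separator (a space or newline) after each one, so a six-digit number tends to cost 7 bytes there. For values below 1000 the text form is the shorter one.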

ASCII is convenient, because it tends to be human-readable, but it can use up a lot of space. You can represent information more compactly by using binary files.

For example, one thing you can do is to save an object to a file. This is a kind of serialization. To dump it to a file, you use a write() method. Usually, you pass in a pointer to the object and the number of bytes used to represent the object (use the sizeof operator to determine this) to the write() method. The method then dumps out the bytes as they appear in memory into a file.

You can then recover the information from the file and place it into the object by using a corresponding read() method which typically takes a pointer to an object (and it should point to an object that has memory allocated, whether it be statically or dynamically allocated) and the number of bytes for the object, and copies the bytes from the file into the object.
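
In standard C, one way to do this kind of raw dump and restore is with `fwrite` and `fread`. The sketch below assumes a stream opened in binary mode; `struct Record`, `save_record`, and `load_record` are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Hypothetical record type, used only for this illustration.
   It contains no pointers, so a raw byte copy is meaningful. */
struct Record {
    int id;
    double score;
};

/* Dump the in-memory bytes of *rec to the stream. Returns 0 on success. */
int save_record(FILE *f, const struct Record *rec)
{
    return fwrite(rec, sizeof *rec, 1, f) == 1 ? 0 : -1;
}

/* Copy sizeof(struct Record) bytes from the stream into *rec.
   The caller must pass a pointer to storage that already exists.
   Returns 0 on success. */
int load_record(FILE *f, struct Record *rec)
{
    return fread(rec, sizeof *rec, 1, f) == 1 ? 0 : -1;
}
```

A typical use is to open a file with `fopen(path, "wb")`, call `save_record`, close it, and later reopen it with `"rb"` and call `load_record`. The file then holds exactly `sizeof(struct Record)` bytes, including any padding the compiler inserted.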

Of course, you must be careful. If you use two different compilers, or transfer the file from one kind of machine to another, this process may not work. In particular, the object may be laid out differently. This can be as simple as endianness, or there may be issues with padding.

This way of saving objects to a file is nice and simple, but it may not be all that portable. Furthermore, it does the equivalent of a shallow copy. If your object contains pointers, it will write out the addresses to the file. Those addresses are likely to be totally meaningless. Addresses may make sense at the time a program is running, but if you quit and restart, those addresses may change.

This is why some people invent their own format for storing objects: to increase portability.

But if you know you aren't storing objects that contain pointers, and you are reading the file in on the same kind of computer system you wrote it on, and you're using the same compiler, it should work.

This is one reason people sometimes prefer to write out ints, chars, etc. instead of entire objects. They tend to be somewhat more portable.
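
Writing individual integers still leaves the byte-order question open. One common way around it, shown here as a sketch rather than as the method the answer has in mind, is to pick a fixed byte order for the file and convert explicitly. The names `put_u32_be` and `get_u32_be` are invented for this example, and big-endian is an arbitrary choice.

```c
#include <stdint.h>

/* Store value into out as 4 bytes, most significant byte first,
   regardless of the byte order of the machine running the code. */
void put_u32_be(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Rebuild the value from 4 bytes stored most significant byte first. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

The 4-byte buffer can be written with `fwrite` and read back with `fread` on any machine. The number 100000 is stored as the bytes 00 01 86 a0 in this layout.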

An ASCII file is a binary file that consists of ASCII characters. ASCII characters are 7-bit encodings stored in a byte. Thus, each byte of an ASCII file has its most significant bit set to 0. Think of an ASCII file as a special kind of binary file.

A generic binary file uses all 8 bits. Each byte of a binary file can have any of the full 256 bitstring patterns (as opposed to an ASCII file, which only has 128 bitstring patterns).

There may be a time when Unicode text files become more prevalent. But for now, ASCII files are the standard format for text files.
