Inserters and Extractors reading/writing binary data vs text

Question

I've been trying to read up on iostreams and understand them better. Occasionally I find it stressed that inserters (<<) and extractors (>>) are meant to be used in textual serialization. I've only seen this in a few places, but this article is a good example:

http://spec.winprog.org/streams/

Outside of the <iostream> universe, there are cases where the << and >> are used in a stream-like way yet do not obey any textual convention. For instance, they write binary encoded data when used by Qt's QDataStream:

http://doc.qt.nokia.com/latest/qdatastream.html#details

At the language level, the << and >> operators belong to your project to overload (hence what QDataStream does is clearly acceptable). My question is whether it is considered bad practice for those using <iostream> to use the << and >> operators to implement binary encodings and decodings. Is there, for instance, any expectation that a file written to disk this way should be viewable and editable with a text editor?

Should one always be using other method names and base them on read() and write()? Or should textual encodings be considered merely a default behavior that classes integrating with the standard library iostream can elect to ignore?
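For comparison, the read()/write() route could look like the minimal sketch below. The names write_u32_be and read_u32_be are hypothetical, and the format (4 bytes, most significant first) is an assumption chosen for illustration; <sstream> is included only so the helpers can be tried in memory.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, for an in-memory round trip

// Hypothetical named helpers built on unformatted I/O (write/read) instead of
// operator<< / operator>>. Assumed format: 4 bytes, most significant first.
void write_u32_be(std::ostream& out, std::uint32_t value) {
    const unsigned char bytes[4] = {
        static_cast<unsigned char>(value >> 24),
        static_cast<unsigned char>(value >> 16),
        static_cast<unsigned char>(value >> 8),
        static_cast<unsigned char>(value),
    };
    // Unformatted output: no locale, field width or fill is involved.
    out.write(reinterpret_cast<const char*>(bytes), 4);
}

// Returns false (leaving `value` untouched) if fewer than 4 bytes are available.
bool read_u32_be(std::istream& in, std::uint32_t& value) {
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), 4)) {
        return false;
    }
    value = (std::uint32_t{bytes[0]} << 24) | (std::uint32_t{bytes[1]} << 16) |
            (std::uint32_t{bytes[2]} << 8) | std::uint32_t{bytes[3]};
    return true;
}
```

A caller then writes write_u32_be(file, n) rather than file << n, which leaves << with its usual "formatted text" reading. File streams used this way would normally be opened with std::ios::binary.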


UPDATE: A key terminology issue here seems to be the distinction between "formatted" and "unformatted" I/O (as opposed to "textual" vs. "binary"). I found this question:

writing binary data (std::string) to an std::ofstream?

It has a comment from @TomalakGeret'kal saying "I'd not want to use << for binary data anyway, as my brain reads it as "formatted output" which is not what you're doing. Again, it's perfectly valid, but I just would not confuse my brain like that."

The accepted answer to the question says it's fine as long as you use ios::binary. That seems to bolster the "there's nothing wrong with it" side of the debate...but I still don't see any authoritative source on the issue.
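As a small sketch of what ios::binary actually changes, the hypothetical helpers below write and read a std::string holding arbitrary bytes (including '\0' and '\n') through file streams opened in binary mode. Without std::ios::binary, a platform that translates line endings (Windows is the usual example) may turn '\n' into "\r\n" on output. The flag affects that translation, not what operator<< itself does.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes every byte of `data` unchanged. std::ios::binary
// turns off any newline translation the platform would otherwise apply.
bool save_bytes(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.close();  // flush now so a write failure shows up in the stream state
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the whole file back, byte for byte.
std::string load_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Replacing the write call with out << data would emit the same bytes here (as long as no field width is set), since operator<< for std::string writes all of its characters, embedded '\0' included. That is consistent with the accepted answer; the comment's objection is about how << reads, not about which bytes end up in the file.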

Solution

Actually the operators << and >> are bit shift operators; using them for I/O is, strictly speaking, already a misuse. However, that misuse is about as old as operator overloading itself, and I/O is today their most common use, so they are widely regarded as I/O insertion/extraction operators. I'm pretty sure that without the precedent of iostreams, nobody would use those operators for I/O (especially since C++11, whose variadic templates solve the main problem these operators solved for iostreams, in a much cleaner way). On the other hand, from the language point of view, overloaded operator<< and operator>> can mean whatever you want them to mean.
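To illustrate the variadic-template point, here is a minimal sketch of a single call that accepts any number of arguments of mixed types with full type safety, which is the main job chaining operator<< does for iostreams. The names print_all and format_all are hypothetical. This version uses a C++17 fold expression (a C++11 version would recurse over the parameter pack instead), and for brevity it still formats each individual argument with the existing operator<<.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical: writes all arguments to `out` in order. The binary left fold
// expands to (((out << a1) << a2) << ...), so each argument keeps its own type.
template <typename... Args>
std::ostream& print_all(std::ostream& out, const Args&... args) {
    return (out << ... << args);
}

// Hypothetical convenience wrapper that collects the output in a string.
template <typename... Args>
std::string format_all(const Args&... args) {
    std::ostringstream oss;
    print_all(oss, args...);
    return oss.str();
}
```

For example, format_all("x=", 42, ' ', 1.5) yields "x=42 1.5", and print_all(std::cout, "answer: ", 42, '\n') writes directly to a stream.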

So the question boils down to what would be an acceptable use of those operators. For this, I think one has to distinguish two cases: First, new overloads working on iostream classes, and second, new overloads working on other classes, possibly designed to work like iostreams.

Let's first consider new operators on iostream classes. Let me start with the observation that the iostream classes are all about formatting (and the reverse process, which could be called "deformatting"; "lexing" IMHO wouldn't be quite the right term here, because the extractors don't determine the type but only try to interpret the data according to the type given). The classes responsible for the actual I/O of raw data are the streambufs. Note, however, that a proper binary file is not a file where you just dump internal raw data. Just like a text file (actually even more so), a binary file should have a well-specified encoding of the data it contains, especially if the files are expected to be read on different systems. Therefore the concept of formatted output makes perfect sense for binary files too; only the formatting is different (e.g. writing a pre-determined number of bytes, most significant first, for an integer value).
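As a concrete instance of "formatting" for a binary file, the sketch below defines one possible encoding for a 32-bit signed integer: two's complement, exactly 4 bytes, most significant byte first. The function names are hypothetical and the format is just one example choice. Unlike dumping the object's memory with something like write(reinterpret_cast<const char*>(&x), sizeof x), the resulting bytes do not depend on the host's endianness.

```cpp
#include <array>
#include <cstdint>

using Bytes4 = std::array<unsigned char, 4>;

// Hypothetical encoder: two's complement, 4 bytes, most significant byte first.
Bytes4 encode_i32_be(std::int32_t value) {
    const auto u = static_cast<std::uint32_t>(value);  // conversion to unsigned is modular
    return {{static_cast<unsigned char>(u >> 24), static_cast<unsigned char>(u >> 16),
             static_cast<unsigned char>(u >> 8), static_cast<unsigned char>(u)}};
}

// Hypothetical decoder for the same format. Maps back to signed without relying
// on the implementation-defined unsigned-to-signed conversion of pre-C++20.
std::int32_t decode_i32_be(const Bytes4& b) {
    const std::uint32_t u = (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
                            (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
    if (u <= 0x7FFFFFFFu) {
        return static_cast<std::int32_t>(u);
    }
    return -static_cast<std::int32_t>(~u) - 1;  // e.g. 0xFFFFFFFE -> -2
}
```

With this format, -2 is always stored as the bytes FF FF FF FE, whichever machine writes or reads the file.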

The iostreams themselves are classes intended to work on text files, that is, on files whose content is interpreted as a textual representation of data. A lot of built-in behaviour is optimized for that and may cause problems if used on binary files. An obvious example is that, by default, whitespace is skipped before any input is attempted. For a binary file, this would clearly be the wrong behaviour. Also, the use of locales doesn't make sense for binary files (one might argue that there could be a "binary locale", but I don't think locales as defined for iostreams provide a suitable interface for that). Therefore I'd say writing binary operator<< or operator>> for iostream classes would be wrong.
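The whitespace-skipping default is easy to observe. In the sketch below (the helper name is hypothetical), input beginning with '\n' and '\t', bytes that could just as well be binary data, loses them when read with operator>> unless std::noskipws is set. Which characters count as whitespace is itself decided by the stream's locale (its ctype facet), which ties into the locale point above.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: extracts a single char from `input` with operator>>,
// optionally clearing the default skipws flag first. Returns '?' if nothing
// could be extracted (on failure, operator>> leaves `c` unchanged).
char first_char_extracted(const std::string& input, bool skip_whitespace) {
    std::istringstream in(input);
    if (!skip_whitespace) {
        in >> std::noskipws;
    }
    char c = '?';
    in >> c;  // formatted input: the sentry skips leading whitespace if skipws is set
    return c;
}
```

Even with noskipws, operator>> on an int still parses decimal text, so raw bytes have to be read with get() or read() instead.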

The other case is where you define a separate class for binary input/output (possibly reusing the streambuf layer for the actual I/O). Since we are now talking about different classes, the argument above no longer applies. So the question now is: should operator<< and operator>> on I/O be regarded as "text insertion/extraction operators" or, more generally, as "formatted data insertion/extraction operators"? The standard classes only use them for text, but then, there are no standard classes for binary I/O insertion/extraction at all, so the standard usage cannot distinguish between the two.
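A minimal sketch of such a separate class, under stated assumptions: BinaryWriter and BinaryReader are hypothetical names, they deliberately do not derive from std::ostream/std::istream (so there is no skipws, locale, width or fill), and they reuse std::streambuf only to move bytes via sputn and sgetn. Only std::uint32_t is supported, encoded as 4 bytes, most significant first.

```cpp
#include <cstdint>
#include <sstream>  // std::stringbuf, an in-memory streambuf for trying the classes
#include <streambuf>

// Hypothetical binary output class: reuses the streambuf layer, not std::ostream.
class BinaryWriter {
public:
    explicit BinaryWriter(std::streambuf* sb) : sb_(sb) {}
    explicit operator bool() const { return ok_; }

    // Format: 4 bytes, most significant byte first.
    BinaryWriter& operator<<(std::uint32_t v) {
        const unsigned char b[4] = {static_cast<unsigned char>(v >> 24),
                                    static_cast<unsigned char>(v >> 16),
                                    static_cast<unsigned char>(v >> 8),
                                    static_cast<unsigned char>(v)};
        ok_ = ok_ && sb_->sputn(reinterpret_cast<const char*>(b), 4) == 4;
        return *this;
    }

private:
    std::streambuf* sb_;
    bool ok_ = true;
};

// Hypothetical matching input class: no whitespace skipping, no locale.
class BinaryReader {
public:
    explicit BinaryReader(std::streambuf* sb) : sb_(sb) {}
    explicit operator bool() const { return ok_; }

    BinaryReader& operator>>(std::uint32_t& v) {
        unsigned char b[4];
        if (ok_ && sb_->sgetn(reinterpret_cast<char*>(b), 4) == 4) {
            v = (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
                (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
        } else {
            ok_ = false;  // sticky failure, similar in spirit to failbit
        }
        return *this;
    }

private:
    std::streambuf* sb_;
    bool ok_ = true;
};
```

Because the classes only see a std::streambuf*, the same code can write to the std::filebuf returned by a std::ofstream's rdbuf() (opened with std::ios::binary) as well as to an in-memory std::stringbuf.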

I personally would say that binary insertion/extraction is close enough to textual insertion/extraction that this usage is justified. Note that you could also make meaningful binary I/O manipulators, e.g. bigendian, littleendian and intwidth(n), to determine the format in which integers are output.
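The manipulators named here can be sketched as follows, under explicit assumptions: BinaryOut is a hypothetical class that collects bytes in a std::vector rather than a streambuf (to keep the example short), bigendian and littleendian are plain functions used through a function-pointer overload of operator<< (the same mechanism std::hex relies on), and intwidth(n) returns a small state object, in the way std::setw does. Integers are taken as std::uint64_t and only the low n bytes are written.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class ByteOrder { big, little };
struct IntWidth { std::size_t bytes; };

// Hypothetical binary output that keeps formatting state (byte order, integer
// width), much as std::ostream keeps its number base and field width.
class BinaryOut {
public:
    // Function-pointer manipulators such as bigendian / littleendian.
    BinaryOut& operator<<(BinaryOut& (*manip)(BinaryOut&)) { return manip(*this); }
    BinaryOut& operator<<(IntWidth w) { width_ = w.bytes; return *this; }

    // Writes the low width_ bytes of v in the current byte order.
    BinaryOut& operator<<(std::uint64_t v) {
        for (std::size_t i = 0; i < width_; ++i) {
            const std::size_t byte_index = (order_ == ByteOrder::big) ? width_ - 1 - i : i;
            bytes_.push_back(static_cast<unsigned char>(v >> (8 * byte_index)));
        }
        return *this;
    }

    void set_order(ByteOrder o) { order_ = o; }
    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
    ByteOrder order_ = ByteOrder::big;
    std::size_t width_ = 4;
};

// Hypothetical manipulators, named after the suggestions in the answer.
BinaryOut& bigendian(BinaryOut& out) { out.set_order(ByteOrder::big); return out; }
BinaryOut& littleendian(BinaryOut& out) { out.set_order(ByteOrder::little); return out; }

// intwidth(n): n must be 1..8 so the shifts above stay within 64 bits.
IntWidth intwidth(std::size_t n) {
    if (n < 1 || n > 8) throw std::out_of_range("intwidth: expected 1..8 bytes");
    return IntWidth{n};
}
```

For example, out << std::uint64_t{0x0102} << littleendian << intwidth(2) << std::uint64_t{0x0102}; produces the bytes 00 00 01 02 followed by 02 01. Passing explicit std::uint64_t values also avoids a literal 0 being ambiguous with the manipulator overload.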

Beyond that there's also the use of those operators for things which are not really I/O (and where you wouldn't even think of using the streambuf layer), like reading from or inserting into a container. In my opinion, that already constitutes misuse of the operators, because there the data isn't translated into or out of a different format. It is just stored in a container.
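For contrast, this is the kind of overload the paragraph classifies as misuse: operator<< used to append to a container. The template below is a hypothetical illustration, not a recommendation; no encoding or formatting happens, the value is simply stored.

```cpp
#include <vector>

// Illustration only (hypothetical): appends `value` to `v` and returns `v`,
// so calls can be chained like stream output. Nothing is formatted.
template <typename T>
std::vector<T>& operator<<(std::vector<T>& v, const T& value) {
    v.push_back(value);
    return v;
}
```

With it, std::vector<int> v; v << 1 << 2 << 3; leaves v holding 1, 2, 3, exactly what push_back or an initializer list would do without borrowing stream syntax.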
