Why does wide file-stream in C++ narrow written data by default?

Problem description
Honestly, I just don't get the following design decision in the C++ Standard Library. When writing wide characters to a file, wofstream converts wchar_t into char characters:
#include <fstream>
#include <string>

int main()
{
    using namespace std;
    wstring someString = L"Hello StackOverflow!";
    wofstream file(L"Test.txt");
    file << someString; // the output file will consist of ASCII characters!
}
I am aware that this has to do with the standard codecvt. There is a codecvt for UTF-8 in Boost, and there is a codecvt for UTF-16 by Martin York here on SO. The question is: why does the standard codecvt convert wide characters? Why not write the characters as they are?

Also, are we going to get real Unicode streams with C++0x, or am I missing something here?
The model used by C++ for charsets is inherited from C, and so dates back to at least 1989.

Two main points:

- IO is done in terms of char.
- It is the job of the locale to determine how wide chars are serialized:
  - the default locale (named "C") is very minimal (I don't remember the constraints from the standard; here it is able to handle only 7-bit ASCII as both the narrow and wide character set);
  - there is an environment-determined locale named "".

So to get anything, you have to set the locale.
If I use the simple program

#include <locale>
#include <fstream>
#include <ostream>
#include <iostream>

int main()
{
    wchar_t c = 0x00FF;
    std::locale::global(std::locale(""));
    std::wofstream os("test.dat");
    os << c << std::endl;
    if (!os) {
        std::cout << "Output failed\n";
    }
}
which uses the environment locale and outputs the wide character with code 0x00FF to a file. If I ask to use the "C" locale, I get
$ env LC_ALL=C ./a.out
Output failed
the locale has been unable to handle the wide character, and we get notified of the problem as the IO fails. If I ask for a UTF-8 locale, I get
$ env LC_ALL=en_US.utf8 ./a.out
$ od -t x1 test.dat
0000000 c3 bf 0a
0000003
(od -t x1 just dumps the file in hex), exactly what I expect for a UTF-8 encoded file.