Why does wide file-stream in C++ narrow written data by default?
Question
Honestly, I just don't get the following design decision in the C++ Standard library. When writing wide characters to a file, wofstream converts wchar_t into char characters:
#include <fstream>
#include <string>
int main()
{
    using namespace std;
    wstring someString = L"Hello StackOverflow!";
    wofstream file(L"Test.txt");
    file << someString; // the output file will consist of ASCII characters!
}
I am aware that this has to do with the standard codecvt. There is a codecvt for UTF-8 in Boost, and Martin York posted a codecvt for UTF-16 here on SO. The question is: why does the standard codecvt convert wide characters? Why not write the characters as they are?
Also, are we going to get real Unicode streams with C++0x, or am I missing something here?
Answer
The model used by C++ for charsets is inherited from C, and so dates back to at least 1989.
Two main points:
- IO is done in terms of char.
- It is the job of the locale to determine how wide chars are serialized.
- The default locale (named "C") is very minimal (I don't remember the constraints from the standard; here it is able to handle only 7-bit ASCII as both the narrow and the wide character set).
- There is an environment-determined locale named "".
So to get anything, you have to set the locale.
If I use the simple program
#include <locale>
#include <fstream>
#include <ostream>
#include <iostream>
int main()
{
    wchar_t c = 0x00FF;
    std::locale::global(std::locale(""));
    std::wofstream os("test.dat");
    os << c << std::endl;
    if (!os) {
        std::cout << "Output failed\n";
    }
}
which uses the environment locale and outputs the wide character with code 0x00FF to a file. If I ask for the "C" locale, I get
$ env LC_ALL=C ./a.out
Output failed
the locale has been unable to handle the wide character, and we are notified of the problem because the IO failed. If I instead run with a UTF-8 locale, I get
$ env LC_ALL=en_US.utf8 ./a.out
$ od -t x1 test.dat
0000000 c3 bf 0a
0000003
(od -t x1 just dumps the file in hex), exactly what I expect for a UTF-8 encoded file.