Why does wide file-stream in C++ narrow written data by default?


Question



Honestly, I just don't get the following design decision in C++ Standard library. When writing wide characters to a file, the wofstream converts wchar_t into char characters:

#include <fstream>
#include <string>

int main()
{
    using namespace std;

    wstring someString = L"Hello StackOverflow!";
    wofstream file(L"Test.txt");

    file << someString; // the output file will consist of ASCII characters!
}

I am aware that this has to do with the standard codecvt. There is a codecvt for utf8 in Boost. Also, there is a codecvt for utf16 by Martin York here on SO. The question is: why does the standard codecvt convert wide characters? Why not write the characters as they are?

Also, are we going to get real Unicode streams with C++0x, or am I missing something here?

Solution

The model used by C++ for charsets is inherited from C, and so dates back to at least 1989.

Two main points:

  • IO is done in terms of char.
  • It is the job of the locale to determine how wide chars are serialized:
    • The default locale (named "C") is very minimal (I don't remember the exact constraints from the standard; here it is able to handle only 7-bit ASCII as both the narrow and the wide character set).
    • There is an environment-determined locale named "".

So to get anything, you have to set the locale.

If I use the simple program

#include <locale>
#include <fstream>
#include <ostream>
#include <iostream>

int main()
{
    wchar_t c = 0x00FF;
    std::locale::global(std::locale(""));
    std::wofstream os("test.dat");
    os << c << std::endl;
    if (!os) {
        std::cout << "Output failed\n";
    }
}

which uses the environment locale and outputs the wide character with code 0x00FF to a file. If I ask it to use the "C" locale, I get

$ env LC_ALL=C ./a.out
Output failed

the locale has been unable to handle the wide character, and we are notified of the problem because the IO failed. If I ask for a UTF-8 locale instead, I get

$ env LC_ALL=en_US.utf8 ./a.out
$ od -t x1 test.dat
0000000 c3 bf 0a
0000003

(od -t x1 just dumps the file in hex), exactly what I expect for a UTF-8 encoded file.
