Why does wide file-stream in C++ narrow written data by default?


Question

Honestly, I just don't get the following design decision in the C++ Standard Library. When writing wide characters to a file, wofstream converts the wchar_t characters into narrow char characters:

#include <fstream>
#include <string>

int main()
{
    using namespace std;

    wstring someString = L"Hello StackOverflow!";
    wofstream file(L"Test.txt");

    file << someString; // the output file will consist of ASCII characters!
}

I am aware that this has to do with the standard codecvt. There is a codecvt for UTF-8 in Boost, and Martin York posted a codecvt for UTF-16 here on SO. The question is: why does the standard codecvt convert wide characters at all? Why not write the characters as they are?

Also, are we going to get real Unicode streams with C++0x, or am I missing something here?
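(For context: C++0x became C++11, which added standard conversion facets such as std::codecvt_utf8 in the <codecvt> header; they were deprecated again in C++17. A minimal sketch, assuming those facets are available, of imbuing one so that the wide stream writes UTF-8 regardless of the global locale:)

#include <codecvt>   // std::codecvt_utf8 (C++11, deprecated in C++17)
#include <fstream>
#include <locale>

int main()
{
    std::wofstream file("Test.txt");
    // Replace the codecvt facet of the stream's locale with a UTF-8 one,
    // so wchar_t data is serialized as UTF-8 instead of being narrowed.
    // (On platforms with 16-bit wchar_t, std::codecvt_utf8_utf16 would be
    // needed for characters outside the BMP.)
    file.imbue(std::locale(file.getloc(), new std::codecvt_utf8<wchar_t>));
    file << L"Hello StackOverflow!";
}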

Solution

The model used by C++ for charsets is inherited from C, and so dates back to at least 1989.

Two main points:

  • IO is done in terms of char.
  • It is the job of the locale to determine how wide chars are serialized.
      • The default locale (named "C") is very minimal (I don't remember the exact constraints from the standard; here it is able to handle only 7-bit ASCII as both the narrow and the wide character set).
      • There is an environment-determined locale named "".

So to get anything, you have to set the locale.

If I use the simple program

#include <locale>
#include <fstream>
#include <ostream>
#include <iostream>

int main()
{
    wchar_t c = 0x00FF;                    // U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS
    std::locale::global(std::locale(""));  // install the locale named by the environment
    std::wofstream os("test.dat");         // the stream picks up the new global locale
    os << c << std::endl;
    if (!os) {
        std::cout << "Output failed\n";
    }
}

which uses the environment's locale and outputs the wide character with code 0x00FF to a file. If I ask it to use the "C" locale, I get

$ env LC_ALL=C ./a.out
Output failed

the locale was unable to handle the wide character, and we are notified of the problem because the IO failed. If I instead run it with a UTF-8 locale requested, I get

$ env LC_ALL=en_US.utf8 ./a.out
$ od -t x1 test.dat
0000000 c3 bf 0a
0000003

(od -t x1 just dumps the file in hexadecimal), which is exactly what I expect for a UTF-8 encoded file.
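A variation on the same idea, not shown in the answer above: the program can request a suitable locale by name on the stream itself instead of relying on the LC_ALL environment variable. A minimal sketch, assuming a glibc-style locale name such as en_US.utf8 is installed:

#include <fstream>
#include <iostream>
#include <locale>
#include <stdexcept>

int main()
{
    wchar_t c = 0x00FF;
    std::wofstream os("test.dat");
    try {
        // Ask for a UTF-8 locale by name; the available names are
        // platform-specific, and std::locale throws std::runtime_error
        // if the name is unknown.
        os.imbue(std::locale("en_US.utf8"));
    } catch (const std::runtime_error& e) {
        std::cout << "Locale not available: " << e.what() << '\n';
        return 1;
    }
    os << c << L'\n';
    if (!os) {
        std::cout << "Output failed\n";
    }
}

Imbuing the stream keeps the encoding choice local to that stream, whereas std::locale::global changes the locale for every stream created afterwards.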
