Does (w)ifstream support different encodings?
Question
When I read a text file into a wide-character string (std::wstring) using a wifstream, does the stream implementation support different encodings? That is, can it be used to read, for example, ASCII, UTF-8, and UTF-16 files?
If not, what would I have to do?
(I need to read the entire file, if that makes a difference)
Recommended answer
C++ supports character encodings by means of std::locale and the facet std::codecvt. The general idea is that a locale object describes the aspects of the system that might vary from culture to culture and from (human) language to language. These aspects are broken down into facets: classes held by the locale that localization-dependent objects (including I/O streams) consult to do their work. When you read from an istream or write to an ostream, each character read or written is filtered through the locale's facets. The facets cover not only the encoding of Unicode types but also such varied features as how large numbers are written (e.g. with commas or periods), currency, time, capitalization, and a slew of other details.
However, just because the facilities for encodings exist doesn't mean the standard library actually handles all encodings, nor does it make such code simple to get right. Even such basic things as the size of the character type you should be reading into (let alone the encoding part) are difficult: wchar_t can be too small (mangling your data) or too large (wasting space), and the most common compilers differ on its size (typically 16 bits with Visual C++ and 32 bits with GNU C++ on Linux). So you generally need to find an external library to do the actual encoding work.
- iconv is generally acknowledged to be correct, but examples of how to bind it to the C++ mechanism are hard to find.
- jla3ep mentions libICU, which is very thorough, but its C++ API does not try to play nicely with the standard (as far as I can tell; you can scan the examples to see if you can do better).
The most straightforward example I can find that covers all the bases is Boost's UTF-8 codecvt facet, whose documentation includes an example that converts between UTF-8 in the file and UCS-4 in memory for use by I/O streams. It looks like this, though I don't suggest copying it verbatim. It takes a little more digging in the source to understand it (and I don't claim to):
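#include <fstream>
#include <locale>
// utf8_codecvt_facet is provided by Boost; include its header here.

typedef wchar_t ucs4_t;  // assumes wchar_t is wide enough to hold UCS-4
std::locale old_locale;
// The new locale takes ownership of the facet.
std::locale utf8_locale(old_locale, new utf8_codecvt_facet<ucs4_t>);
...
std::wifstream input_file("data.utf8");
input_file.imbue(utf8_locale);  // decode UTF-8 while reading
ucs4_t item = 0;
while (input_file >> item) { ... }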
To understand more about locales, and how they use facets (including codecvt), take a look at the following:
- Nathan Myers has a thorough explanation of locales and facets. Myers was one of the designers of the locale concept. He has more formal documentation if you want to wade through it.
- Apache's Standard Library implementation (formerly RogueWave's) has a full list of facets.
- Chapter 14 of Nicolai Josuttis' The C++ Standard Library is devoted to the subject.
- Angelika Langer and Klaus Kreft's Standard C++ IOStreams and Locales devotes a whole book to it.