Does C++ support converting between character encodings other than UTF-8, UTF-16, and UTF-32?

Question

I understand that std::codecvt<char16_t, char> in C++11 performs conversion between UTF-16 and UTF-8, and std::codecvt<char32_t, char> performs conversion between UTF-32 and UTF-8. Is it possible to convert between, say, UTF-8 and ISO 8859-1?

Consider:

const char* s = "\u00C0";

If I print this string and my terminal's encoding is set to UTF-8, I will see the character À. If I set my terminal's encoding to ISO 8859-1, however, printing that string will not print out the desired character. How would I convert s into a string that, when printed, will show the character À if my terminal's encoding is set to ISO 8859-1?
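The core of the problem is that À is stored as different bytes in the two encodings. Converting therefore means rewriting the bytes, not just relabeling them. The sketch below makes this concrete. The names kAGraveUtf8, kAGraveLatin1 and hex_bytes are illustrative and do not come from the question. The byte values themselves are fixed by the two encodings.

```cpp
#include <cstdio>
#include <string>

// "À" (U+00C0) as it must appear in memory for each terminal encoding.
const std::string kAGraveUtf8   = "\xC3\x80";  // UTF-8: two bytes
const std::string kAGraveLatin1 = "\xC0";      // ISO 8859-1: one byte

// Render each byte of `s` as two hex digits separated by spaces, e.g. "C3 80".
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];  // room for " XX" plus the terminating null
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, out.empty() ? "%02X" : " %02X",
                      static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

A Latin-1 terminal given the UTF-8 bytes C3 80 interprets them as two separate characters. The first is Ã (0xC3). The second is an unprintable C1 control byte (0x80). That is why the output looks wrong.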

I understand that this can be done with a library such as iconv, but I am curious whether it can be done using only the C++ standard library. I ask this question not because I don't want to use iconv, but because I don't really understand how locales work in C++.

Recommended answer

In addition to the standard-mandated encodings, C++ also supports an implementation-defined list of encodings via locales:

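#include <cwchar>
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// std::codecvt_byname's destructor is protected, so std::wstring_convert
// cannot delete it directly. Deriving from it and inheriting its constructors
// yields an equivalent facet with a public destructor.
template <typename Facet>
struct usable_facet : Facet {
  using Facet::Facet;
};

using codecvt = usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

int main() {
  try {
    // Locale names are platform specific: ".1252" selects Windows code page 1252.
    // wstring_convert takes ownership of the facet (and is deprecated since C++17).
    std::wstring_convert<codecvt> convert(new codecvt(".1252"));

    // "\xC0" is the code page 1252 byte for À; from_bytes decodes it to wchar_t.
    std::wstring w = convert.from_bytes("\xC0");
  } catch (const std::runtime_error& e) {  // unknown locale name, or bytes that fail to decode
    std::cerr << "conversion failed: " << e.what() << '\n';
  }
}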
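The example above decodes code page 1252 bytes into wchar_t. The question asks about the opposite direction: producing bytes in a single-byte locale encoding. With the same kind of converter, that direction uses to_bytes. Below is a minimal sketch that reuses the usable_facet trick. encode_for_locale is a hypothetical helper name. Which locale names exist is platform specific; only "C" is guaranteed to be available.

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Same wrapper as above: inherits the facet's constructors, public destructor.
template <typename Facet>
struct usable_facet : Facet {
    using Facet::Facet;
};

using byname_cvt = usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

// Encode a wide string into the narrow (multibyte) encoding of the named locale.
// Fails with an exception if the locale name is unknown, and with
// std::range_error if a character cannot be represented in that encoding.
std::string encode_for_locale(const std::wstring& w, const char* locale_name) {
    std::wstring_convert<byname_cvt> convert(new byname_cvt(locale_name));
    return convert.to_bytes(w);
}
```

On a glibc system with a Latin-1 locale installed, encode_for_locale(L"\u00C0", "en_US.ISO-8859-1") would be expected to produce the single byte 0xC0. Use whatever name `locale -a` actually lists. This works there only because glibc's wchar_t holds the same UCS-4 values in every locale. As the next paragraph explains, the standard does not guarantee that.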

Unfortunately, the standard mandates only that wchar_t use a fixed-width encoding in every locale; it does not require that encoding to be the same across locales. So you can't portably convert to wchar_t using one locale and then convert back to char using a different locale.
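For the specific pair in the question, you can sidestep the wchar_t problem by not using locales at all. ISO 8859-1 assigns byte values 0x00 to 0xFF to exactly the Unicode code points U+0000 to U+00FF. So once UTF-8 has been decoded to char32_t, each code point in that range maps directly to one byte. This shortcut does not generalize to other legacy encodings such as Windows-1252, which would need a lookup table. The sketch below uses the standard std::codecvt_utf8 facet, which is deprecated since C++17 but still shipped by major implementations. The helper name utf8_to_latin1 and the '?' fallback are assumptions.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Convert UTF-8 to ISO 8859-1 without going through wchar_t or named locales.
// Code points above U+00FF have no Latin-1 byte and are replaced with `fallback`.
std::string utf8_to_latin1(const std::string& utf8, char fallback = '?') {
    // codecvt_utf8<char32_t> decodes UTF-8 into UTF-32 code points.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    std::u32string cps = conv.from_bytes(utf8);  // throws std::range_error on invalid UTF-8

    std::string out;
    out.reserve(cps.size());
    for (char32_t cp : cps)
        out.push_back(cp <= 0xFF ? static_cast<char>(static_cast<unsigned char>(cp))
                                 : fallback);
    return out;
}
```

For example, utf8_to_latin1("\xC3\x80") returns "\xC0". That is the string that displays as À on a terminal set to ISO 8859-1.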

There is potentially some portable support for such conversions via std::mbrtoc32 and the related functions in <cuchar> (std::c32rtomb, std::mbrtoc16, std::c16rtomb), but at the time of writing these were not yet widely implemented.
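Below is a minimal sketch of the std::mbrtoc32 route, assuming your standard library provides <cuchar>. It decodes a narrow string, interpreted in the encoding of the current C locale, into char32_t values. The output is UTF-32 when the implementation defines __STDC_UTF_32__. decode_current_locale is a hypothetical helper name.

```cpp
#include <cuchar>
#include <cwchar>
#include <stdexcept>
#include <string>

// Decode `in`, assumed to be in the current C locale's multibyte encoding,
// into char32_t values using std::mbrtoc32.
std::u32string decode_current_locale(const std::string& in) {
    std::u32string out;
    std::mbstate_t state{};  // starts in the initial conversion state
    const char* p = in.data();
    const char* const end = p + in.size();
    while (p < end) {
        char32_t c32;
        std::size_t rc = std::mbrtoc32(&c32, p, static_cast<std::size_t>(end - p), &state);
        if (rc == static_cast<std::size_t>(-1))
            throw std::runtime_error("invalid multibyte sequence");
        if (rc == static_cast<std::size_t>(-2))
            throw std::runtime_error("incomplete multibyte sequence at end of input");
        if (rc == static_cast<std::size_t>(-3)) {
            // An extra char32_t from an earlier sequence; no input was consumed.
            out.push_back(c32);
            continue;
        }
        out.push_back(c32);
        p += (rc == 0) ? 1 : rc;  // rc == 0 means an embedded null byte was decoded
    }
    return out;
}
```

Calling std::setlocale(LC_ALL, "") first makes this decode in the environment's encoding. std::c32rtomb is the matching encoder for the reverse direction. Both functions depend on the single global C locale. Converting between two different encodings therefore means switching that global locale in between, which is awkward in multithreaded programs.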

> I understand that this can be done with a library such as iconv, but I am curious whether it can be done using only the C++ standard library. I ask this question not because I don't want to use iconv, but because I don't really understand how locales work in C++.

The locale library's design doesn't really lend itself to modern usage. C and C++ are themselves confused about encodings vs. character sets, and locales conflate lexical and orthographic issues with computational aspects such as encoding.
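One way to see this conflation: a std::locale is a bundle of independent facets. Number formatting (std::numpunct) lives in the same object as the encoding converter (std::codecvt). The sketch below builds a locale that differs from the classic "C" locale in a single formatting facet. It then checks that the encoding facet is still the very same object. The names comma_decimal, make_comma_locale, format_with_comma and same_codecvt are illustrative only.

```cpp
#include <cwchar>
#include <locale>
#include <sstream>
#include <string>

// A numpunct facet that changes only the decimal point (a formatting convention).
struct comma_decimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Copy of the classic locale with just the numpunct<char> facet replaced.
// The locale takes ownership of the new facet.
std::locale make_comma_locale() {
    return std::locale(std::locale::classic(), new comma_decimal);
}

// Format a double through a stream imbued with that locale.
std::string format_with_comma(double value) {
    std::ostringstream os;
    os.imbue(make_comma_locale());
    os << value;
    return os.str();
}

// True if two locales share the very same wchar_t/char codecvt facet object.
bool same_codecvt(const std::locale& a, const std::locale& b) {
    using cvt = std::codecvt<wchar_t, char, std::mbstate_t>;
    return &std::use_facet<cvt>(a) == &std::use_facet<cvt>(b);
}
```

format_with_comma(3.5) yields "3,5", yet same_codecvt(std::locale::classic(), make_comma_locale()) is true. The formatting rules changed while the encoding facet was carried over untouched.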

How locales work is a topic somewhat broader than suits a Stack Overflow answer, but there are books on the subject. You'd probably also need to read platform-specific material, because the standard doesn't really give any context for much of the functionality. For example, the locale library supports message catalogs (the std::messages facet), but doesn't tell you what they are or how you'd actually create one, because that functionality is not standardized by C++.