Transcoding characters on-the-fly using iostreams and ICU

Views: 247 Published: 2016/11/1 23:22:41 Tags: c++ unicode character-encoding iostream icu

Question

I'd like to transcode character encodings on-the-fly. I'd like to use iostreams and my own transcoding streambuf, e.g.:

xcoder_streambuf xbuf( "UTF-8", "ISO-8859-1", cout.rdbuf() );
cout.rdbuf( &xbuf );

char *utf8_s;    // pointer to buffer containing UTF-8 encoded characters
// ...
cout << utf8_s;  // characters are written in ISO-8859-1

The implementation of xcoder_streambuf would use ICU's converter API. It would take the incoming data (in this case, from utf8_s), transcode it, and write it out through the iostream's original streambuf.
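
To make the idea concrete, here is a minimal sketch of such a filtering streambuf. It is not the asker's xcoder_streambuf: the class name utf8_to_latin1_streambuf and the helper transcode are hypothetical, the conversion is fixed to UTF-8 → ISO-8859-1, and a small hand-written decoder stands in for the ICU converter so that the example stays self-contained. Overlong encodings are not rejected.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for xcoder_streambuf: decodes UTF-8 by hand (no ICU)
// and forwards ISO-8859-1 bytes to the wrapped streambuf.
class utf8_to_latin1_streambuf : public std::streambuf {
public:
    explicit utf8_to_latin1_streambuf(std::streambuf* dest) : dest_(dest) {}

protected:
    // No put area is set up, so every output byte arrives here one at a time.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        const unsigned char byte =
            static_cast<unsigned char>(traits_type::to_char_type(c));

        if (pending_ > 0) {
            if ((byte & 0xC0) == 0x80) {  // continuation byte
                code_point_ = (code_point_ << 6) | (byte & 0x3Fu);
                return --pending_ == 0 ? emit(code_point_) : c;
            }
            // Sequence was cut short: report it, then treat byte as a new start.
            pending_ = 0;
            if (traits_type::eq_int_type(emit('?'), traits_type::eof()))
                return traits_type::eof();
        }

        if (byte < 0x80) return emit(byte);  // plain ASCII
        if ((byte & 0xE0) == 0xC0)      { code_point_ = byte & 0x1Fu; pending_ = 1; }
        else if ((byte & 0xF0) == 0xE0) { code_point_ = byte & 0x0Fu; pending_ = 2; }
        else if ((byte & 0xF8) == 0xF0) { code_point_ = byte & 0x07u; pending_ = 3; }
        else return emit('?');  // stray continuation or invalid lead byte
        return c;
    }

    int sync() override { return dest_->pubsync(); }

private:
    // Code points above U+00FF have no ISO-8859-1 representation.
    int_type emit(unsigned long cp) {
        return dest_->sputc(cp <= 0xFF ? static_cast<char>(cp) : '?');
    }

    std::streambuf* dest_;
    unsigned long code_point_ = 0;
    int pending_ = 0;  // continuation bytes still expected
};

// Hypothetical helper: runs a UTF-8 string through the filter into a string sink.
std::string transcode(const std::string& utf8) {
    std::ostringstream sink;
    utf8_to_latin1_streambuf xbuf(sink.rdbuf());
    std::ostream out(&xbuf);
    out << utf8;
    out.flush();
    return sink.str();
}
```

When the filter is installed on cout with cout.rdbuf(&xbuf), the original buffer returned by rdbuf() should be restored before xbuf goes out of scope, otherwise cout is left pointing at a destroyed object.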

Is that a reasonable way to go?

Recommended answer

  Is that a reasonable way to go?

Yes, but it is not the way you are expected to do it in modern (as in 1997) iostreams.

The behaviour of output through basic_streambuf<> is defined by the overflow(int_type c) virtual function.

  The behaviour of basic_filebuf<>::overflow(int_type c = traits::eof()) includes the call a_codecvt.out(state, b, p, end, xbuf, xbuf + XSIZE, xbuf_end); where a_codecvt is defined as:
const codecvt<charT,char,typename traits::state_type>& a_codecvt 
     = use_facet<codecvt<charT,char,typename traits::state_type> >(getloc());

So you are expected to imbue a locale with the appropriate codecvt<charT,char,typename traits::state_type> converter.
 
  The class codecvt<internT,externT,stateT> is for use when converting from one character encoding to another, such as from wide characters to multibyte characters or between wide character encodings such as Unicode and EUC.
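
As a rough illustration of that route, the sketch below derives a facet from codecvt<wchar_t,char,mbstate_t> and installs it in a locale. The names latin1_out_codecvt, latin1_locale, encode and write_file are made up for this example. Only the output direction is overridden, and the facet assumes that the stream's wide characters already hold Unicode code points, so decoding the UTF-8 input is a separate step.

```cpp
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: wchar_t code points in, ISO-8859-1 bytes out.
// Input conversion (do_in) is left at the library default.
class latin1_out_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_next != to_end) {
            const unsigned long cp = static_cast<unsigned long>(*from_next++);
            // Anything outside U+0000..U+00FF cannot be represented.
            *to_next++ = cp <= 0xFF ? static_cast<char>(cp) : '?';
        }
        return from_next == from_end ? ok : partial;
    }
    int do_encoding() const noexcept override { return 1; }    // one byte per character
    int do_max_length() const noexcept override { return 1; }
    bool do_always_noconv() const noexcept override { return false; }
};

// The locale takes ownership of the facet and replaces the stock wchar_t codecvt.
std::locale latin1_locale() {
    return std::locale(std::locale::classic(), new latin1_out_codecvt);
}

// Calls the facet the same way the a_codecvt.out(...) excerpt does.
std::string encode(const std::locale& loc, const std::wstring& text) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;
    const codecvt_type& a_codecvt = std::use_facet<codecvt_type>(loc);
    std::string bytes(text.size() * a_codecvt.max_length(), '\0');
    if (text.empty()) return bytes;
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    a_codecvt.out(state, text.data(), text.data() + text.size(), from_next,
                  &bytes[0], &bytes[0] + bytes.size(), to_next);
    bytes.resize(to_next - &bytes[0]);
    return bytes;
}

// With a file stream the conversion happens inside basic_filebuf::overflow.
void write_file(const char* path, const std::wstring& text) {
    std::wofstream out;
    out.imbue(latin1_locale());  // imbue before the file is opened
    out.open(path, std::ios::binary);
    out << text;
}
```

The design difference from a hand-written streambuf is that the stream keeps its standard type (here std::wofstream) and only the locale changes.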
Standard library support for Unicode has made some progress since 1997:
 
  the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes.
This seems to be what you want (the 256 ISO-8859-1 codes are the first 256 UCS-4 codes, and UCS-4 = UTF-32).
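
A short sketch of that combination, assuming a C++11 library: the UTF-8 bytes are decoded with the standard codecvt<char32_t, char, mbstate_t> specialization, and each resulting code point up to U+00FF is written out as one ISO-8859-1 byte. The wrapper struct and the two function names are hypothetical. Note that this specialization is deprecated as of C++20, so a compiler may warn about it.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// codecvt has a protected destructor; a derived type with a public one
// lets the facet be used as an ordinary local object.
struct utf8_utf32_codecvt : std::codecvt<char32_t, char, std::mbstate_t> {
    ~utf8_utf32_codecvt() {}
};

// Decodes UTF-8 into UTF-32 code points with the standard facet.
std::u32string utf8_to_utf32(const std::string& utf8) {
    // A UTF-8 string never holds more code points than bytes.
    std::u32string out(utf8.size(), U'\0');
    if (utf8.empty()) return out;

    utf8_utf32_codecvt cvt;
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    char32_t* to_next = nullptr;
    const std::codecvt_base::result r =
        cvt.in(state, utf8.data(), utf8.data() + utf8.size(), from_next,
               &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("invalid or truncated UTF-8 input");
    out.resize(to_next - &out[0]);
    return out;
}

// Narrows the code points to ISO-8859-1; '?' marks unrepresentable ones.
std::string utf8_to_latin1(const std::string& utf8) {
    std::string latin1;
    for (char32_t cp : utf8_to_utf32(utf8))
        latin1 += cp <= 0xFF ? static_cast<char>(cp) : '?';
    return latin1;
}
```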
 
  If not, what would be better?
I would introduce a distinct type for UTF-8, like:

struct utf8 {
    unsigned char d; // d for data
};

struct latin1 {
    unsigned char c; // c for character 
};

This way you cannot accidentally pass UTF-8 where ISO-8859-* is expected. But then you would have to write some interface code, and the type of your streams won't be istream/ostream.
Disclaimer: I never actually did such a thing, so I don't know if it is workable in practice.
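
For what it is worth, the kind of interface code this implies might look like the sketch below. The aliases utf8_text and latin1_text and the helpers tag_as_latin1 and write_latin1 are invented for illustration and are not part of any library. The static_assert shows the property the two structs buy: text tagged as UTF-8 is not accepted where Latin-1 text is required.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

struct utf8 {
    unsigned char d;  // d for data
};

struct latin1 {
    unsigned char c;  // c for character
};

// Hypothetical text types built on the two structs.
using utf8_text = std::vector<utf8>;
using latin1_text = std::vector<latin1>;

// The encodings are distinct types, so mixing them up is a compile-time error.
static_assert(!std::is_convertible<utf8_text, latin1_text>::value,
              "UTF-8 text must not convert implicitly to Latin-1 text");

// Explicitly marks raw bytes as ISO-8859-1; the caller vouches for the encoding.
latin1_text tag_as_latin1(const std::string& bytes) {
    latin1_text text;
    text.reserve(bytes.size());
    for (char b : bytes)
        text.push_back(latin1{static_cast<unsigned char>(b)});
    return text;
}

// Interface code: the only way to put text on the stream takes latin1_text.
std::ostream& write_latin1(std::ostream& os, const latin1_text& text) {
    for (latin1 ch : text)
        os.put(static_cast<char>(ch.c));
    return os;
}
```

Calling write_latin1 with a utf8_text argument does not compile, which is the safety the separate types are meant to provide; the cost is that every boundary with ordinary char-based code needs a helper like these.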
