Transcoding characters on-the-fly using iostreams and ICU

Question

I'd like to transcode character encoding on-the-fly. I'd like to use iostreams and my own transcoding streambuf, e.g.:

xcoder_streambuf xbuf( "UTF-8", "ISO-8859-1", cout.rdbuf() );
cout.rdbuf( &xbuf );

char *utf8_s;    // pointer to buffer containing UTF-8 encoded characters
// ...
cout << utf8_s;  // characters are written in ISO-8859-1

The implementation of xcoder_streambuf would use ICU's converters API. It would take the data coming in (in this case, from utf8_s), transcode it, and write it out using the iostream's original streambuf.

Is that a reasonable way to go?

Recommended answer


Is that a reasonable way to go?

Yes, but it is not the way you are expected to do it in modern (as in 1997) iostreams.

The behaviour of output through basic_streambuf<> is defined by the overflow(int_type c) virtual function.
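To make the role of overflow() concrete, here is a minimal sketch of a filtering streambuf in the spirit of the question's xcoder_streambuf. The class name utf8_to_latin1_buf is hypothetical, and a small hand-written UTF-8 decoder stands in for ICU's converter so that the example stays self-contained. It assumes that code points above U+00FF and malformed sequences should be written as '?', and it does not reject overlong encodings.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical filter: accepts UTF-8 bytes and forwards ISO-8859-1 bytes
// to the wrapped streambuf. A hand-written decoder stands in for ICU.
class utf8_to_latin1_buf : public std::streambuf {
public:
    explicit utf8_to_latin1_buf(std::streambuf* dest) : dest_(dest) {
        assert(dest_ != nullptr);
    }

protected:
    // No put area is installed, so every character written arrives here.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        const unsigned char byte =
            static_cast<unsigned char>(traits_type::to_char_type(c));

        if (pending_ > 0) {
            if ((byte & 0xC0) == 0x80) {            // continuation byte
                code_point_ = (code_point_ << 6) | (byte & 0x3Fu);
                if (--pending_ > 0) return c;       // sequence not finished yet
                return emit(code_point_);
            }
            // Malformed: a lead byte was not followed by a continuation byte.
            pending_ = 0;
            if (traits_type::eq_int_type(emit(kInvalid), traits_type::eof()))
                return traits_type::eof();
            // Fall through and treat the current byte as a fresh start.
        }

        if (byte < 0x80) return emit(byte);         // plain ASCII
        if ((byte & 0xE0) == 0xC0) { code_point_ = byte & 0x1Fu; pending_ = 1; }
        else if ((byte & 0xF0) == 0xE0) { code_point_ = byte & 0x0Fu; pending_ = 2; }
        else if ((byte & 0xF8) == 0xF0) { code_point_ = byte & 0x07u; pending_ = 3; }
        else return emit(kInvalid);                 // stray continuation or bad byte
        return c;
    }

    int sync() override { return dest_->pubsync(); }

private:
    static constexpr unsigned long kInvalid = 0x110000;  // beyond Unicode

    // Writes one Latin-1 byte, or '?' when the code point does not fit.
    int_type emit(unsigned long cp) {
        const char out =
            cp <= 0xFF ? static_cast<char>(static_cast<unsigned char>(cp)) : '?';
        return dest_->sputc(out);
    }

    std::streambuf* dest_;
    unsigned long code_point_ = 0;
    int pending_ = 0;  // continuation bytes still expected
};
```

It would be installed as in the question, with cout.rdbuf(&buf) after constructing buf from the old cout.rdbuf(). The original streambuf should be restored before the filter is destroyed.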

The behaviour of basic_filebuf<>::overflow(int_type c = traits::eof()) includes a call to a_codecvt.out(state, b, p, end, xbuf, xbuf+XSIZE, xbuf_end); where a_codecvt is defined as:

const codecvt<charT,char,typename traits::state_type>& a_codecvt 
     = use_facet<codecvt<charT,char,typename traits::state_type> >(getloc());

So you are expected to imbue a locale with the appropriate codecvt<charT,char,typename traits::state_type> converter.
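As a rough illustration of that route, the sketch below derives a facet from codecvt<wchar_t, char, mbstate_t> and imbues it on a wide file stream. The names latin1_codecvt and write_latin1_file are hypothetical. It assumes that each wchar_t holds one Unicode code point and that anything above U+00FF should be written as '?'. A real implementation could call ICU inside do_out instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: wide characters inside the program, ISO-8859-1 bytes
// in the file. Stateless, exactly one external byte per internal character.
class latin1_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        for (; from != from_end && to != to_end; ++from, ++to) {
            const unsigned long cp = static_cast<unsigned long>(*from);
            *to = cp <= 0xFF
                      ? static_cast<char>(static_cast<unsigned char>(cp))
                      : '?';  // assumption: replace what Latin-1 cannot hold
        }
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        for (; from != from_end && to != to_end; ++from, ++to)
            *to = static_cast<wchar_t>(static_cast<unsigned char>(*from));
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    result do_unshift(std::mbstate_t&, char* to, char*,
                      char*& to_next) const override {
        to_next = to;   // stateless encoding: nothing to flush
        return noconv;
    }

    int do_encoding() const noexcept override { return 1; }
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 1; }

    int do_length(std::mbstate_t&, const char* from, const char* end,
                  std::size_t max) const override {
        return static_cast<int>(
            std::min(max, static_cast<std::size_t>(end - from)));
    }
};

// Writes `text` to `path` as ISO-8859-1 through the imbued facet.
void write_latin1_file(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening; the locale takes ownership of the facet.
    out.imbue(std::locale(std::locale::classic(), new latin1_codecvt));
    out.open(path, std::ios::binary);
    out << text;
}
```

The stream type stays a standard wofstream, and the conversion happens inside basic_filebuf when it flushes its buffer.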


The class codecvt<internT,externT,stateT> is for use when converting from one character encoding to another, such as from wide characters to multibyte characters or between wide character encodings such as Unicode and EUC.

The standard library's support for Unicode has made some progress since 1997:


the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes.

This seems to be what you want (every ISO-8859-1 code is numerically equal to the corresponding UCS-4 code, and UCS-4 = UTF-32).
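A minimal sketch of that idea follows: decode UTF-8 to UTF-32 with the standard facet, then narrow each code point to a Latin-1 byte. The function name utf8_to_latin1 is hypothetical, and the '?' replacement for code points above U+00FF is an assumption. Note that this codecvt specialization is deprecated as of C++20, so some compilers may warn about it.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 in, ISO-8859-1 out, via the standard
// codecvt<char32_t, char, mbstate_t> facet (UTF-32 <-> UTF-8).
std::string utf8_to_latin1(const std::string& utf8) {
    if (utf8.empty()) return std::string();

    using facet_t = std::codecvt<char32_t, char, std::mbstate_t>;
    // Every locale is required to provide this facet; use the classic one.
    const facet_t& cvt = std::use_facet<facet_t>(std::locale::classic());

    std::mbstate_t state{};
    // UTF-8 never yields more code points than it has bytes.
    std::u32string utf32(utf8.size(), U'\0');
    const char* from_next = nullptr;
    char32_t* to_next = nullptr;

    const std::codecvt_base::result res =
        cvt.in(state, utf8.data(), utf8.data() + utf8.size(), from_next,
               &utf32[0], &utf32[0] + utf32.size(), to_next);
    if (res != std::codecvt_base::ok)
        throw std::runtime_error("invalid or truncated UTF-8 input");
    utf32.resize(static_cast<std::size_t>(to_next - &utf32[0]));

    std::string latin1;
    latin1.reserve(utf32.size());
    for (const char32_t cp : utf32) {
        // Code points 0..255 map one-to-one onto ISO-8859-1 bytes.
        latin1.push_back(cp <= 0xFF
                             ? static_cast<char>(static_cast<unsigned char>(cp))
                             : '?');  // assumption: replacement character
    }
    return latin1;
}
```

This converts whole strings rather than a stream, so it sidesteps the question of carrying state across buffer boundaries.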


If not, what would be better?

I would introduce a different type for UTF8, like:


struct utf8 {
    unsigned char d; // d for data
};

struct latin1 {
    unsigned char c; // c for character 
};

This way you cannot accidentally pass UTF8 where ISO-8859-* is expected. But then you would have to write some interface code, and the type of your streams won't be istream/ostream.

Disclaimer: I never actually did such a thing, so I don't know if it is workable in practice.
