Using ICU to implement my own codecvt facet

Views: 255 Published: 2016/11/1 23:22:18 Tags: c++ iostream icu codecvt

Problem description

I want to implement a codecvt facet using ICU to convert from any character encoding (that ICU supports) to UTF-8 internally. I'm aware that codecvt_byname exists and that it can be used to do part of what I want as shown in this example. The problems with that example are that it (1) uses wide character streams (I want to use "regular", byte-oriented streams) and (2) requires 2 streams to perform the conversion. Instead, I want a single stream like:

locale loc( locale(), new icu_codecvt( "ISO-8859-1" ) );
ifstream ifs;
ifs.imbue( loc );
ifs.open( "/path/to/some/file.txt" );
// data read from ifs here will have been converted from ISO-8859-1 to UTF-8

Hence, I want to do an implementation like this but using ICU rather than iconv. Given that, my implementation of do_in() is:

icu_codecvt::result icu_codecvt::do_in( state_type &state,
                                        extern_type const *from, extern_type const *from_end,
                                        extern_type const *&from_next, intern_type *to,
                                        intern_type *to_end, intern_type *&to_next ) const {
  from_next = from;
  to_next = to;
  if ( always_noconv_ )
    return noconv;

  our_state *const s = state_store_.get( state );
  UErrorCode err = U_ZERO_ERROR;
  ucnv_convertEx(
    s->utf8_conv_, s->extern_conv_, &to_next, to_end, &from_next, from_end,
    nullptr, nullptr, nullptr, nullptr, false, false, &err
  );
  if ( err == U_TRUNCATED_CHAR_FOUND )
    return partial;
  return U_SUCCESS( err ) ? ok : error;
}

The our_state object maintains two UConverter* pointers, one for the "external" encoding (in this example, ISO-8859-1) and one for the UTF-8 encoding.
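
To make the single-stream idea concrete for readers without ICU installed, here is a minimal sketch of a byte-to-byte codecvt facet and of the way a caller handles the partial and ok results of in(). It is not the icu_codecvt above: the facet name latin1_to_utf8 and the helper convert are hypothetical, and a hand-written ISO-8859-1 to UTF-8 conversion stands in for the ucnv_convertEx call. A facet meant for real file streams would presumably also need to override do_out, do_unshift, do_encoding, do_length and do_max_length.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical facet for illustration only: converts ISO-8859-1 bytes to
// UTF-8 by hand, standing in for the ICU-based conversion in the question.
class latin1_to_utf8 : public std::codecvt<char, char, std::mbstate_t> {
public:
  explicit latin1_to_utf8(std::size_t refs = 0)
    : std::codecvt<char, char, std::mbstate_t>(refs) {}

protected:
  result do_in(state_type&, const extern_type* from,
               const extern_type* from_end, const extern_type*& from_next,
               intern_type* to, intern_type* to_end,
               intern_type*& to_next) const override {
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
      const unsigned char c = static_cast<unsigned char>(*from_next);
      const std::ptrdiff_t need = c < 0x80 ? 1 : 2;
      if (to_end - to_next < need)
        return partial;  // output buffer full; caller resumes at from_next
      if (c < 0x80) {
        *to_next++ = static_cast<char>(c);
      } else {
        // Code points 0x80..0xFF become a two-byte UTF-8 sequence.
        *to_next++ = static_cast<char>(0xC0 | (c >> 6));
        *to_next++ = static_cast<char>(0x80 | (c & 0x3F));
      }
      ++from_next;
    }
    return ok;
  }

  // Must report false, otherwise a stream buffer may skip in() entirely.
  bool do_always_noconv() const noexcept override { return false; }
};

// Hypothetical helper: drives in() roughly the way a stream buffer would.
// The deliberately tiny output buffer forces several `partial` results.
std::string convert(const std::locale& loc, const std::string& input) {
  typedef std::codecvt<char, char, std::mbstate_t> facet_t;
  const facet_t& cvt = std::use_facet<facet_t>(loc);
  std::mbstate_t state = std::mbstate_t();
  std::string out;
  const char* from = input.data();
  const char* const from_end = from + input.size();
  char buf[4];
  for (;;) {
    const char* from_next = from;
    char* to_next = buf;
    const facet_t::result r =
        cvt.in(state, from, from_end, from_next, buf, buf + sizeof buf, to_next);
    if (r == facet_t::noconv) {  // identity facet: input is already the output
      out.append(from, static_cast<std::size_t>(from_end - from));
      return out;
    }
    out.append(buf, static_cast<std::size_t>(to_next - buf));
    if (r == facet_t::ok)
      return out;
    if (r == facet_t::error || (from_next == from && to_next == buf))
      throw std::runtime_error("conversion failed or made no progress");
    from = from_next;  // partial: continue where the facet stopped
  }
}
```

Under these assumptions, a locale built as std::locale(std::locale(), new latin1_to_utf8) can be passed to convert, or imbued on an ifstream before open() as in the snippet at the top of the question. The loop also shows why the facet must leave from_next and to_next pointing at the exact resume position whenever it returns partial.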

My questions are:

Should I specify nullptr for the "pivot" buffer as above, or supply my own?
I'm not sure when, if ever, I should set the reset argument (currently the first false above) to true.
It's not clear how I would know when to set the flush argument (currently the second false above) to true, i.e., how I know when the end of the input has been reached.

A little help?
