My codecvt (a.k.a. facet) never gets called.
Problem description
The problem is that the utf16_codecvt methods never get called and, therefore, the result is wrong. I have searched the net, but all I can find are examples of what is supposed to work. Unfortunately, none of them has worked. I have also seen other posters on the net with the same problem, but no one gave them an answer.
I have tested to make sure that the stream's locale has the facet (utf16_codecvt), and it does. So I see no reason why its virtual methods are never called. Instead, the stream keeps calling the codecvt<wchar_t, char, mbstate_t> methods.
Any ideas?
class utf16_codecvt : public std::codecvt<char16_t, char16_t, std::mbstate_t>
{
    ...//
};
void MyTestFunc()
{
    ... //
    std::wifstream myFile;
    std::locale myLoc = std::locale(myFile.getloc(), new utf16_codecvt);
    myFile.imbue(myLoc);
    myFile.open(pFileName, std::ios::in | std::ios::binary);
    ... //
    myFile.read(bom_buffer, 1);
    ... //
}
The following link gives an example of the type of thing I am trying to do:
April 1, 1999 - Unicode Files - P.J. Plauger
Recommended answer
From what I can tell, the C++ stream system presumes that files are sequences of bytes, not characters, even when you use wide streams. The "wide" part of a wide stream (AFAICT) describes how the stream object interacts with your C++ code, not with the underlying file or device. Thus, the external type of your codecvt facet has to be char.
By changing the declaration of your codecvt facet to the one shown below, I was able to get breakpoints set in the replacement facet to be hit.
class utf16_codecvt : public std::codecvt<char16_t, char, std::mbstate_t>
{
    typedef std::codecvt<char16_t, char, std::mbstate_t> Base;
    typedef char16_t ElemT;
    typedef char ByteT;

    virtual result __CLR_OR_THIS_CALL do_in(std::mbstate_t& s,
        const ByteT *_First1, const ByteT *_Last1, const ByteT *& _Mid1,
        ElemT *_First2, ElemT *_Last2, ElemT *& _Mid2) const
    {   // convert bytes [_First1, _Last1) to [_First2, _Last2)
        return Base::do_in(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
    }

    virtual result __CLR_OR_THIS_CALL do_out(std::mbstate_t& s,
        const ElemT *_First1, const ElemT *_Last1, const ElemT *& _Mid1,
        ByteT *_First2, ByteT *_Last2, ByteT *& _Mid2) const
    {   // convert [_First1, _Last1) to bytes [_First2, _Last2)
        return Base::do_out(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
    }

    virtual result __CLR_OR_THIS_CALL do_unshift(std::mbstate_t& s,
        ByteT *_First2, ByteT *_Last2, ByteT *& _Mid2) const
    {   // generate bytes to return to default shift state
        return Base::do_unshift(s, _First2, _Last2, _Mid2);
    }

    virtual int __CLR_OR_THIS_CALL do_length(const std::mbstate_t& s, const ByteT *_First1,
        const ByteT *_Last1, size_t _Count) const
    {   // return min(_Count, converted length of bytes [_First1, _Last1))
        return Base::do_length(s, _First1, _Last1, _Count);
    }
};
So, your replacement facet will have to know that it needs to read two bytes for every character (and vice versa when writing, obviously). The best reference for that sort of information is probably Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft, but even then, locales and facets are heavy going in C++ :(
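To make the "two bytes per character" point concrete, here is a rough sketch of what such a facet might look like. It is not the code from the question or from the Plauger article. The names utf16le_codecvt and decode_utf16le are made up for illustration. The sketch assumes little-endian UTF-16 input, does no BOM handling, and does not combine surrogate pairs. It derives from std::codecvt<wchar_t, char, std::mbstate_t> because that is the facet a std::wifstream looks up in its locale.

```cpp
#include <cstddef>   // std::size_t
#include <cwchar>    // std::mbstate_t
#include <locale>    // std::codecvt
#include <string>

// Hypothetical facet: treats the file as UTF-16LE, two bytes per code unit.
// Sketch only: no BOM handling, no surrogate-pair combining.
class utf16le_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t>
{
protected:
    // File bytes -> wide characters: consume two bytes per output element.
    result do_in(std::mbstate_t&,
                 const char* from, const char* from_end, const char*& from_next,
                 wchar_t* to, wchar_t* to_end, wchar_t*& to_next) const override
    {
        from_next = from;
        to_next = to;
        while (from_end - from_next >= 2 && to_next != to_end) {
            const unsigned lo = static_cast<unsigned char>(from_next[0]);
            const unsigned hi = static_cast<unsigned char>(from_next[1]);
            *to_next++ = static_cast<wchar_t>(lo | (hi << 8));
            from_next += 2;
        }
        // partial: output buffer is full, or a lone trailing byte is left over
        return from_next == from_end ? ok : partial;
    }

    // Wide characters -> file bytes: emit two bytes per input element.
    result do_out(std::mbstate_t&,
                  const wchar_t* from, const wchar_t* from_end, const wchar_t*& from_next,
                  char* to, char* to_end, char*& to_next) const override
    {
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_end - to_next >= 2) {
            const unsigned u = static_cast<unsigned>(*from_next++) & 0xFFFFu;
            *to_next++ = static_cast<char>(u & 0xFFu);
            *to_next++ = static_cast<char>((u >> 8) & 0xFFu);
        }
        return from_next == from_end ? ok : partial;
    }

    // The encoding is stateless, so there is nothing to unshift.
    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override
    {
        to_next = to;
        return noconv;
    }

    // Number of bytes that would be consumed to produce at most max elements.
    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max) const override
    {
        const std::size_t units = static_cast<std::size_t>(from_end - from) / 2;
        return static_cast<int>((units < max ? units : max) * 2);
    }

    int do_encoding() const noexcept override { return 2; }        // fixed width
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 2; }
};

// Hypothetical helper: runs the facet directly over an in-memory byte buffer.
std::wstring decode_utf16le(const std::string& bytes)
{
    utf16le_codecvt cvt;
    std::mbstate_t state = std::mbstate_t();
    std::wstring out(bytes.size() / 2, L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    cvt.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
           &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

With a file stream, the facet would be installed the same way as in the question: imbue std::locale(myFile.getloc(), new utf16le_codecvt) before opening the file in binary mode. On platforms where wchar_t is wider than 16 bits, each UTF-16 code unit simply lands in its own wchar_t.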