Xerces-c and cross-platform string literals

Question

I'm porting a code-base that uses Xerces-c for XML processing from Windows/VC++ to Linux/G++.

On Windows, Xerces-c uses wchar_t as the character type XmlCh. This has allowed people to use std::wstring and string literals of L"" syntax.

On Linux/G++, wchar_t is 32-bit and Xerces-c uses unsigned short int (16-bit) as the character type XmlCh.

I started with this:

#ifdef _MSC_VER
using u16char_t = wchar_t;
using u16string_t = std::wstring;
#elif defined __linux
using u16char_t = char16_t;
using u16string_t = std::u16string;
#endif

Unfortunately, char16_t and unsigned short int are not equivalent and their pointers are not implicitly convertible. So passing u"Hello, world." to Xerces functions still results in invalid conversion errors.
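
The mismatch can be reproduced without Xerces at all. The sketch below is an illustration only: `XmlChLike` and `countUnits` are hypothetical stand-ins for the Xerces character type and a Xerces-style function, not real Xerces names. It assumes a platform where `char16_t` and `unsigned short` have the same size, which is the usual case on Linux/G++ but is not guaranteed by the standard.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the Xerces character type on Linux (assumption).
using XmlChLike = unsigned short;

// Hypothetical function with a Xerces-style signature:
// counts 16-bit code units up to the terminating zero.
inline std::size_t countUnits(const XmlChLike* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// The two types are distinct, and their pointers do not convert implicitly...
static_assert(!std::is_same<char16_t, XmlChLike>::value,
              "char16_t is its own type");
static_assert(!std::is_convertible<const char16_t*, const XmlChLike*>::value,
              "no implicit pointer conversion");
// ...even though they have the same width on this platform.
static_assert(sizeof(char16_t) == sizeof(XmlChLike),
              "this sketch assumes equal width");

// countUnits(u"Hello, world.");  // would not compile: invalid conversion

// An explicit cast compiles; it relies on the same-width assumption above.
inline std::size_t helloLength() {
    return countUnits(reinterpret_cast<const XmlChLike*>(u"Hello, world."));
}
```

Strictly speaking, reading `char16_t` data through an `unsigned short` pointer leans on the two types sharing a representation, so treat the cast as a practical workaround rather than something the language guarantees.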

It's starting to look like I'm going to have to explicitly cast every string I pass to Xerces functions. But before I do, I wanted to ask if anyone knows a saner way to programme cross-platform Xerces-c code.

Recommended answer

The answer is that no, no-one has a good idea on how to do this. For anyone else who finds this question, this is what I came up with:

#include <fstream>
#include <sstream>
#include <string>

#ifdef _MSC_VER
#define U16S(x) L##x
#define U16XS(x) L##x

#define XS(x) x
#define US(x) x

#elif defined __linux

#define U16S(x) u##x
#define U16XS(x) reinterpret_cast<const unsigned short *>(u##x)

inline unsigned short *XS(char16_t* x) {
    return reinterpret_cast<unsigned short *>(x);
}
inline const unsigned short *XS(const char16_t* x) {
    return reinterpret_cast<const unsigned short *>(x);
}
inline char16_t* US(unsigned short *x) {
    return reinterpret_cast<char16_t *>(x);
}
inline const char16_t* US(const unsigned short *x) {
    return reinterpret_cast<const char16_t*>(x);
}

#include "char16_t_facets.hpp"
#endif

namespace SafeStrings {
#if defined _MSC_VER

    using u16char_t = wchar_t;
    using u16string_t = std::wstring;
    using u16sstream_t = std::wstringstream;
    using u16ostream_t = std::wostream;
    using u16istream_t = std::wistream;
    using u16ofstream_t = std::wofstream;
    using u16ifstream_t = std::wifstream;
    using filename_t = std::wstring;

#elif defined __linux

    using u16char_t = char16_t;
    using u16string_t = std::basic_string<char16_t>;
    using u16sstream_t = std::basic_stringstream<char16_t>;
    using u16ostream_t = std::basic_ostream<char16_t>;
    using u16istream_t = std::basic_istream<char16_t>;
    using u16ofstream_t = std::basic_ofstream<char16_t>;
    using u16ifstream_t = std::basic_ifstream<char16_t>;
    using filename_t = std::string;

#endif

} // namespace SafeStrings

char16_t_facets.hpp has definitions of the template specialisations std::ctype<char16_t>, std::numpunct<char16_t>, std::codecvt<char16_t, char, std::mbstate_t>. It's necessary to add these to the global locale, along with std::num_get<char16_t> and std::num_put<char16_t> (but it's not necessary to provide specialisations for these). The code for codecvt is the only bit that's difficult, and a reasonable template can be found in the GCC 5.0 libraries (if you use GCC 5, you don't need to provide the codecvt specialisation as it's already in the library).

Once you've done all of that, the char16_t streams will work correctly.

Then, every time you define a wide string, instead of L"string", write U16S("string"). Every time you pass a string to Xerces, write XS(string.c_str()) or U16XS("string") for literals. Every time you get a string back from Xerces, convert it back as u16string_t(US(call_xerces_function())).
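
As a rough illustration of that calling convention, the sketch below repeats only the Linux branch of the macros and helpers so that it stands alone. `XmlChLike` and `echoFromXerces` are hypothetical stand-ins for the Xerces character type and for any Xerces call that takes and returns a string; they are not part of Xerces.

```cpp
#include <string>

// Hypothetical stand-in for the Xerces character type on Linux (assumption).
using XmlChLike = unsigned short;

// Linux branch of the macros and helpers described above.
#define U16S(x) u##x
#define U16XS(x) reinterpret_cast<const unsigned short *>(u##x)

inline const unsigned short *XS(const char16_t *x) {
    return reinterpret_cast<const unsigned short *>(x);
}
inline const char16_t *US(const unsigned short *x) {
    return reinterpret_cast<const char16_t *>(x);
}

using u16string_t = std::u16string;

// Hypothetical stand-in for a Xerces function: hands back the pointer it was given.
inline const XmlChLike *echoFromXerces(const XmlChLike *s) { return s; }

// Passing a string variable: wrap c_str() in XS(), wrap the result in US().
inline u16string_t roundTripVariable() {
    u16string_t name = U16S("element");        // instead of L"element"
    const XmlChLike *result = echoFromXerces(XS(name.c_str()));
    return u16string_t(US(result));            // copied while name is still alive
}

// Passing a literal: U16XS() applies the prefix and the cast in one step.
inline u16string_t roundTripLiteral() {
    return u16string_t(US(echoFromXerces(U16XS("attr"))));
}
```

Copying the result into a `u16string_t` straight away, as both functions do, avoids holding on to a pointer whose lifetime is controlled by the library.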

Note that it is also possible to recompile Xerces-C with the character type set to char16_t. This removes a lot of the effort required above. BUT you won't be able to use any other library on the system that in turn depends on Xerces-C. Linking to any such library will cause link errors (because changing the character type changes many of the Xerces function signatures).
