Using iconv with WCHAR_T on Linux

Question

I have the following code on Linux:

rc = iconv_open("WCHAR_T", SourceCode);

prior to using iconv to convert the data into a wide character string (wchar_t).

I am trying to understand what it achieves in order to port it to a platform where the option on parameter 1, "WCHAR_T", does not exist.

This leads to sub-questions such as:


  • Is there a single representation of wchar_t on Linux?
  • What codepage does this use? I imagine it may be UTF-32.
  • Does it rely on any locale settings to achieve this?

I'm hoping for an answer that says something like: "The code you show is shorthand for doing the following 2 things instead...." Then I might be able to do those two steps instead of the shorthand on the platform where the "WCHAR_T" option to iconv_open doesn't exist.

Answer

The reason the (non-standard) WCHAR_T encoding exists is to make it easy to cast a pointer to wchar_t into a pointer to char and use it with iconv. The format understood by that encoding is whatever the system's native wchar_t is.
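
As a minimal sketch of that cast in practice, assuming a glibc system where iconv_open accepts "WCHAR_T", the following converts a UTF-8 string straight into a wchar_t array by handing iconv a char * view of it. The helper name utf8_to_wchar is hypothetical and chosen for illustration only.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into `out`,
 * which has room for `out_count` wide characters including L'\0'.
 * Returns the number of wide characters written (without the terminator),
 * or (size_t)-1 on any error (including a too-small buffer). */
size_t utf8_to_wchar(const char *in, wchar_t *out, size_t out_count)
{
    if (out_count == 0)
        return (size_t)-1;

    /* "WCHAR_T" = "whatever this libc's wchar_t is" (assumes glibc). */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inbuf = (char *)in;      /* iconv takes char ** even for input */
    size_t inleft = strlen(in);
    char *outbuf = (char *)out;    /* the wchar_t * -> char * cast WCHAR_T allows */
    size_t outleft = (out_count - 1) * sizeof(wchar_t); /* keep room for L'\0' */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Output is written in whole wchar_t units, so this division is exact. */
    size_t written = (out_count - 1) - outleft / sizeof(wchar_t);
    out[written] = L'\0';
    return written;
}
```

Because the output side is declared as "WCHAR_T", the bytes iconv writes are already laid out as native wchar_t values, so no byte swapping or width handling is needed after the call.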

If you're asking about glibc and not other libc implementations, then on Linux wchar_t is a 32-bit type in the system's native endianness, and represents Unicode code points. This is not the same as UTF-32, since UTF-32 normally has a byte-order mark (BOM) and, when it has none, is big-endian. WCHAR_T is always native-endian.
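
One way to check this claim on a given machine, sketched under the assumption of a glibc (or otherwise "WCHAR_T"-aware) iconv, is to convert the same UTF-8 text once to "WCHAR_T" and once to the explicit-endian UTF-32 name for the CPU's byte order, then compare the bytes. The function names below are hypothetical. The plain "UTF-32" name is deliberately not used for the comparison, since its output may carry a BOM.

```c
#include <iconv.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert UTF-8 `in` to encoding `to`.
 * Returns the number of bytes written to `out`, or (size_t)-1 on error. */
size_t convert_bytes(const char *to, const char *in, char *out, size_t out_size)
{
    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inbuf = (char *)in;
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = out_size;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : out_size - outleft;
}

/* BOM-less UTF-32 name matching this CPU's byte order (probed at run time). */
const char *native_utf32_name(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1 ? "UTF-32LE" : "UTF-32BE";
}

/* Returns 1 if "WCHAR_T" output is byte-identical to native-endian UTF-32. */
int wchar_t_is_native_utf32(const char *utf8)
{
    char a[64], b[64];
    size_t na = convert_bytes("WCHAR_T", utf8, a, sizeof a);
    size_t nb = convert_bytes(native_utf32_name(), utf8, b, sizeof b);
    return na != (size_t)-1 && na == nb && memcmp(a, b, na) == 0;
}
```

On a glibc system this comparison should succeed, which is exactly the "32-bit, native-endian, Unicode code points, no BOM" description above.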

Note that some systems use different semantics for wchar_t. Windows always uses a 16-bit wchar_t holding little-endian UTF-16. If you used GNU libiconv on that platform, the WCHAR_T encoding would be different from what it is when running on Linux.

Locale settings do not affect wchar_t because the size of wchar_t must be known at compile time, and therefore cannot practically vary based on locale.
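
To illustrate that the width is fixed before the program runs, here is a small compile-time sketch. It assumes a C11 compiler (for _Static_assert). The __STDC_ISO_10646__ macro is standard C: when an implementation defines it (glibc does), wchar_t values are ISO 10646 (Unicode) code points. The enum names are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* sizeof(wchar_t) is an integer constant expression: it is settled at
 * compile time, long before any call to setlocale() could happen. */
_Static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
               "unexpected wchar_t width");

/* Hypothetical constant exposing the width for use elsewhere. */
enum { WCHAR_BYTES = sizeof(wchar_t) };

/* If __STDC_ISO_10646__ is defined, the C standard says characters stored
 * in wchar_t have their ISO 10646 code point values. */
#ifdef __STDC_ISO_10646__
enum { WCHAR_IS_UNICODE = 1 };
#else
enum { WCHAR_IS_UNICODE = 0 };
#endif
```

What the locale does change is how multibyte text (for example, via mbstowcs) is decoded into those values, not the size or meaning of wchar_t itself.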

If this piece of code is indeed casting a pointer to wchar_t and using that in its call to iconv, then you need to adjust the code to use one of the encodings UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE, depending on sizeof(wchar_t) and the platform's endianness. Those encodings do not require (nor allow) a BOM, and assuming you're not using a PDP-11, one of them will be correct for your platform.
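
A minimal sketch of that adjustment follows. It assumes that wchar_t on the target holds UTF-16 or UTF-32 code units (true for glibc and Windows, but not guaranteed everywhere) and that the target's iconv accepts the explicit names "UTF-16LE", "UTF-16BE", "UTF-32LE" and "UTF-32BE". The functions wchar_encoding_name and iconv_open_to_wchar are hypothetical names, not part of any iconv API.

```c
#include <assert.h>
#include <iconv.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical: pick the BOM-less encoding that matches this platform's
 * wchar_t. Width comes from sizeof(wchar_t); byte order is probed at run time. */
const char *wchar_encoding_name(void)
{
    const uint16_t probe = 1;
    int little = *(const unsigned char *)&probe == 1;

    /* Only 2- and 4-byte wchar_t are handled by this sketch. */
    assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4);
    if (sizeof(wchar_t) == 2)
        return little ? "UTF-16LE" : "UTF-16BE";
    return little ? "UTF-32LE" : "UTF-32BE";
}

/* Hypothetical stand-in for iconv_open("WCHAR_T", source_code) on an
 * iconv that does not know the "WCHAR_T" name. */
iconv_t iconv_open_to_wchar(const char *source_code)
{
    return iconv_open(wchar_encoding_name(), source_code);
}
```

With this, the line from the question would become `rc = iconv_open_to_wchar(SourceCode);` and the rest of the conversion code, including the wchar_t * to char * cast, can stay as it is.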

If you're getting the data from some other source, then you need to figure out what that is, and use the appropriate encoding from the list above for it. You should also probably send a patch upstream and ask the maintainer to use a different, more correct encoding for handling their data format.