How to convert AnsiChar to UnicodeChar with a specific CodePage?

Question

I'm generating texture atlases for rendering Unicode texts in my app. Source texts are stored in ANSI code pages (1250, 1251, 1254, 1257, etc.). I want to be able to generate all the symbols from each ANSI code page.

Here is the outline of the code I would expect to have:

for I := 0 to 255 do
begin
  anChar := AnsiChar(I); //obtain AnsiChar

  //Apply codepage without converting the chars
  //<<--- this part does not work, showing:
  //"E2033 Types of actual and formal var parameters must be identical"
  SetCodePage(anChar, aCodepages[K], False);

  //Assign AnsiChar to UnicodeChar (automatic conversion)
  uniChar := anChar;

  //Here we get Unicode character index
  uniCode := Ord(uniChar);
end;

The code above does not work (it fails with E2033), and I'm not sure it is a proper solution at all. Perhaps there is a much shorter way.

What is the proper way of converting AnsiChar into Unicode with specific codepage in mind?

Recommended answer

I would do it like this:

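// Requires Winapi.Windows (MultiByteToWideChar) and System.SysUtils (RaiseLastOSError).
// Converts one byte from the given ANSI code page to its UTF-16 character.
function AnsiCharToWideChar(ac: AnsiChar; CodePage: UINT): WideChar;
begin
  // Source: exactly 1 byte; destination: room for exactly 1 WideChar
  if MultiByteToWideChar(CodePage, 0, @ac, 1, @Result, 1) <> 1 then
    RaiseLastOSError;
end;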

I think you should avoid using strings for what is in essence a character operation. If you know up front which code pages you need to support then you can hard code the conversions into a lookup table expressed as an array constant.
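The answer suggests hard-coding the tables as array constants. As a hedged alternative (or as a way to generate those constants once), the sketch below fills a 256-entry table for one code page with a single MultiByteToWideChar call, which is essentially what the loop in the question was trying to do. The names TCodePageTable and BuildCodePageTable are hypothetical, and the sketch assumes a Windows target where Winapi.Windows is available.

```pascal
program CodePageTableDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

type
  // Hypothetical helper type: one UTF-16 code unit for each byte value 0..255
  TCodePageTable = array[AnsiChar] of WideChar;

// Builds the byte -> UTF-16 table for a single-byte ANSI code page with one
// MultiByteToWideChar call. Raises if the code page is not available or if
// the 256 input bytes do not yield exactly 256 code units (e.g. a DBCS page).
function BuildCodePageTable(CodePage: UINT): TCodePageTable;
var
  Bytes: array[AnsiChar] of AnsiChar;
  C: AnsiChar;
begin
  // Input buffer holds every byte value in order: $00, $01, ..., $FF
  for C := Low(AnsiChar) to High(AnsiChar) do
    Bytes[C] := C;
  if MultiByteToWideChar(CodePage, 0, @Bytes[Low(AnsiChar)], Length(Bytes),
    @Result[Low(AnsiChar)], Length(Result)) <> Length(Result) then
    RaiseLastOSError;
end;

var
  Table: TCodePageTable;
  C: AnsiChar;
begin
  Table := BuildCodePageTable(1251);
  // Print the Unicode code point of every byte in the upper half of cp1251
  for C := AnsiChar($80) to AnsiChar($FF) do
    Writeln(Format('%.2x -> U+%.4x', [Ord(C), Ord(Table[C])]));
end.
```

For the atlas, the same function can be called once per code page in the list, and the resulting code points merged into one set of glyphs to render.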

Note that all the characters defined in the ANSI code pages map to Unicode characters in the Basic Multilingual Plane, so each one is represented by a single UTF-16 code unit (one WideChar). Hence the size assumptions in the code above: one byte in, one WideChar out.

However, the assumption that you are making, and that this answer perpetuates, is that a single byte represents a character in an ANSI character set. That's a valid assumption for many character sets, for example single-byte Western character sets such as 1252. But some character sets, such as 932 (Japanese) and 949 (Korean), are double-byte character sets. Your entire approach breaks down for those code pages. My guess is that you only wish to support single-byte character sets.
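To detect the double-byte case up front, one option (an addition here, not part of the original answer) is to ask Windows for the code page's maximum character size with GetCPInfo: a MaxCharSize of 1 means every character in that code page is a single byte. IsSingleByteCodePage is a hypothetical helper name, and the sketch assumes a Windows target.

```pascal
program CheckSingleByteCodePage;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical helper: True if CodePage is available and every character
// in it is encoded in exactly one byte.
function IsSingleByteCodePage(CodePage: UINT): Boolean;
var
  Info: TCPInfo;
begin
  Result := GetCPInfo(CodePage, Info) and (Info.MaxCharSize = 1);
end;

const
  // Code pages from the question, plus two double-byte examples
  CodePages: array[0..5] of UINT = (1250, 1251, 1254, 1257, 932, 949);
var
  CP: UINT;
begin
  for CP in CodePages do
    Writeln(CP, ': ', IsSingleByteCodePage(CP));
end.
```

Code pages for which this returns False can then be skipped (or handled separately) before building any per-byte table.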

If you are writing cross-platform code then you can replace MultiByteToWideChar with UnicodeFromLocaleChars.
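A minimal sketch of that cross-platform variant, assuming the System unit's UnicodeFromLocaleChars takes the same six arguments as MultiByteToWideChar (code page, flags, source, source length, destination, destination length); check the declaration in your Delphi version. The function name AnsiCharToWideCharPortable is hypothetical, and raising EConvertError instead of calling RaiseLastOSError is a choice made here because the meaning of the OS "last error" differs between platforms.

```pascal
program PortableAnsiCharConversion;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical cross-platform variant of AnsiCharToWideChar.
// UnicodeFromLocaleChars is declared in the System unit, so no
// Windows-specific unit is needed.
function AnsiCharToWideCharPortable(ac: AnsiChar; CodePage: Cardinal): WideChar;
begin
  // Source: exactly 1 byte; destination: room for exactly 1 WideChar
  if UnicodeFromLocaleChars(CodePage, 0, @ac, 1, @Result, 1) <> 1 then
    raise EConvertError.CreateFmt('Cannot convert byte $%.2x from code page %d',
      [Ord(ac), CodePage]);
end;

begin
  // In cp1251, byte $C0 is CYRILLIC CAPITAL LETTER A (U+0410)
  Writeln(Format('U+%.4x', [Ord(AnsiCharToWideCharPortable(AnsiChar($C0), 1251))]));
end.
```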
