Convert Hi-Ansi chars to Ascii equivalent (é -> e)

Problem description

Is there a routine available in Delphi 2007 to convert the characters in the high range of the ANSI table (>127) to their equivalent ones in pure ASCII (<=127) according to a locale (codepage)?

I know some chars cannot translate well but most can, esp. in the 192-255 range:

  • À -> A
  • à -> a
  • Ë -> E
  • ë -> e
  • Ç -> C
  • ç -> c
  • – (en dash) -> - (hyphen; that one can be trickier)
  • — (em dash) -> - (hyphen)

Recommended answer

WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by using that and passing 20127 (US-ASCII) as the codepage.

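// Requires the Windows unit (for WideCharToMultiByte).
function BestFit(const AInput: AnsiString): AnsiString;
const
  CodePage = 20127; // 20127 = us-ascii
var
  WS: WideString;
begin
  // Widen first, using the system default ANSI code page.
  WS := WideString(AInput);
  // First call (nil output buffer) returns the required size in bytes.
  SetLength(Result, WideCharToMultiByte(CodePage, 0, PWideChar(WS),
    Length(WS), nil, 0, nil, nil));
  // Second call performs the best-fit conversion into Result.
  WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    PAnsiChar(Result), Length(Result), nil, nil);
end;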

procedure TForm1.Button1Click(Sender: TObject);
begin
   ShowMessage(BestFit('aÀàËëÇç–—€¢Š'));
end;

Calling that with your examples produces the results you're looking for, including the emdash-to-minus case, which I don't think is handled by Jeroen's suggestion to convert to Normalization Form D. If you did want to take that approach, Michael Kaplan has a blog post that explicitly discusses stripping diacritics (rather than normalization in general), but it uses C# and an API that was introduced in Vista. You can get something similar using the FoldString API (any WinNT release).
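
For what it's worth, here is a rough sketch of what the FoldString route might look like. It assumes the FoldStringW and MAP_COMPOSITE declarations from the Windows unit; the function name StripDiacriticsViaFold is made up for illustration. It decomposes each accented letter into a base letter plus combining marks, then drops the marks in the U+0300..U+036F block. It does nothing for the dashes, so those would still need separate handling.

```pascal
// Sketch only: requires the Windows unit in the uses clause.
// StripDiacriticsViaFold is a hypothetical name, not an RTL routine.
function StripDiacriticsViaFold(const AInput: WideString): WideString;
var
  Decomposed: WideString;
  Len, I, J: Integer;
begin
  Result := '';
  if AInput = '' then
    Exit;
  // First call (nil output buffer) returns the required size in characters.
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(AInput), Length(AInput), nil, 0);
  if Len = 0 then
    Exit;
  SetLength(Decomposed, Len);
  // Second call writes the decomposed text (base letter + combining marks).
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(AInput), Length(AInput),
    PWideChar(Decomposed), Len);
  // Copy everything except combining diacritical marks (U+0300..U+036F).
  SetLength(Result, Len);
  J := 0;
  for I := 1 to Len do
    if (Ord(Decomposed[I]) < $0300) or (Ord(Decomposed[I]) > $036F) then
    begin
      Inc(J);
      Result[J] := Decomposed[I];
    end;
  SetLength(Result, J);
end;
```

The result is still a WideString, so anything that has no decomposition (ø, ß, the dashes, the euro sign) comes through unchanged and has to be mapped some other way before narrowing to ASCII.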

Of course if you're only doing this for one character set, and you want to avoid the overhead from converting to and from a WideString, Padu is correct that a simple for loop and a lookup table would be just as effective.
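
A minimal sketch of that loop-and-table approach, assuming the input is Windows-1252. The name StripHighAnsi1252 and the replacement table are my own illustrative choices (for example Æ -> A and Þ -> ?), not the system's best-fit table, so adjust the mappings to taste.

```pascal
// Sketch only: hypothetical helper for Windows-1252 input.
function StripHighAnsi1252(const AInput: AnsiString): AnsiString;
const
  // ASCII stand-ins for Windows-1252 bytes 192..255, in code order.
  // These choices are assumptions for illustration.
  Map192: AnsiString =
    'AAAAAAACEEEEIIII' +  // 192..207
    'DNOOOOOxOUUUUY?s' +  // 208..223
    'aaaaaaaceeeeiiii' +  // 224..239
    'dnooooo/ouuuuy?y';   // 240..255
var
  I: Integer;
  B: Byte;
begin
  Result := AInput; // bytes below 128 are left untouched
  for I := 1 to Length(Result) do
  begin
    B := Ord(Result[I]);
    if B >= 192 then
      Result[I] := Map192[B - 191]  // byte 192 maps to table index 1
    else if (B = 150) or (B = 151) then
      Result[I] := '-'              // en dash and em dash in Windows-1252
    else if B >= 128 then
      Result[I] := '?';             // no sensible ASCII stand-in
  end;
end;
```

This avoids the round trip through WideString, at the cost of being tied to a single code page and to whatever mappings you put in the table.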
