Convert Hi-Ansi chars to Ascii equivalent (é -> e)


Problem description

Is there a routine available in Delphi 2007 to convert the characters in the high range of the ANSI table (>127) to their equivalent ones in pure ASCII (<=127) according to a locale (codepage)?

I know some chars cannot translate well but most can, esp. in the 192-255 range:

  • À → A
  • à → a
  • Ë → E
  • ë → e
  • Ç → C
  • ç → c
  • – (en dash) → - (hyphen; that one can be trickier)
  • — (em dash) → - (hyphen)

Solution

WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by using that and passing 20127 (US-ASCII) as the codepage.

function BestFit(const AInput: AnsiString): AnsiString;
const
  CodePage = 20127; // 20127 = US-ASCII
var
  WS: WideString;
begin
  // Widen the input using the system ANSI code page.
  WS := WideString(AInput);
  // First call: nil output buffer, so the return value is the required size in bytes.
  SetLength(Result, WideCharToMultiByte(CodePage, 0, PWideChar(WS),
    Length(WS), nil, 0, nil, nil));
  // Second call: perform the best-fit conversion into Result.
  WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    PAnsiChar(Result), Length(Result), nil, nil);
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  ShowMessage(BestFit('aÀàËëÇç–—€¢Š'));
end;

Calling that with your examples produces the results you're looking for, including the em-dash-to-minus case, which I don't think is handled by Jeroen's suggestion to convert to Normalization Form D. If you did want to take that approach, Michael Kaplan has a blog post that explicitly discusses stripping diacritics (rather than normalization in general), but it uses C# and an API that was introduced in Vista. You can get something similar using the FoldString API (any WinNT release).
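For the FoldString route, a rough sketch might look like the following. It assumes the Windows unit declares FoldStringW and MAP_COMPOSITE, decomposes accented letters into base letter plus combining mark, and then simply drops every character above 127. The function name StripViaFoldString is made up for illustration. Note that, unlike the best-fit conversion above, this drops the en and em dashes instead of mapping them to a hyphen.

```pascal
// Sketch only; requires Windows in the uses clause.
function StripViaFoldString(const AInput: WideString): AnsiString;
var
  Decomposed: WideString;
  Len, I: Integer;
begin
  Result := '';
  if AInput = '' then
    Exit;
  // With a nil buffer and size 0, FoldStringW returns the required length.
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(AInput), Length(AInput), nil, 0);
  if Len = 0 then
    Exit;
  SetLength(Decomposed, Len);
  // Decompose: e.g. U+00E9 becomes 'e' followed by U+0301 (combining acute).
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(AInput), Length(AInput),
    PWideChar(Decomposed), Len);
  // Keep only plain ASCII; combining marks and other non-ASCII are dropped.
  for I := 1 to Len do
    if Ord(Decomposed[I]) <= 127 then
      Result := Result + AnsiChar(Ord(Decomposed[I]));
end;
```

Whether dropping unmappable characters is acceptable depends on the data; the WideCharToMultiByte version is the simpler choice when a replacement character is preferred.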

Of course, if you're only doing this for one character set and you want to avoid the overhead of converting to and from a WideString, Padu is correct that a simple for loop and a lookup table would be just as effective.
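A minimal sketch of the loop-and-lookup-table idea, assuming the input is Windows-1252 (the table below is partial and the names FoldTable, MapRange, InitFoldTable and FoldToAscii are invented for this example). Characters are written as byte codes so the result does not depend on the encoding of the source file. Anything not listed passes through unchanged, so the output is only pure ASCII if the table covers every high character in your data.

```pascal
program FoldToAsciiDemo;

{$APPTYPE CONSOLE}

var
  // Index = input byte, value = replacement character.
  FoldTable: array[Byte] of AnsiChar;

procedure MapRange(First, Last: Byte; Replacement: AnsiChar);
var
  B: Integer;
begin
  for B := First to Last do
    FoldTable[B] := Replacement;
end;

procedure InitFoldTable;
var
  B: Integer;
begin
  // Identity mapping first, so unlisted bytes pass through unchanged.
  for B := 0 to 255 do
    FoldTable[B] := AnsiChar(B);
  // Partial Windows-1252 mappings (assumed code page); extend as needed.
  MapRange($C0, $C5, 'A');
  FoldTable[$C7] := 'C';
  MapRange($C8, $CB, 'E');
  MapRange($CC, $CF, 'I');
  FoldTable[$D1] := 'N';
  MapRange($D2, $D6, 'O');
  MapRange($D9, $DC, 'U');
  MapRange($E0, $E5, 'a');
  FoldTable[$E7] := 'c';
  MapRange($E8, $EB, 'e');
  MapRange($EC, $EF, 'i');
  FoldTable[$F1] := 'n';
  MapRange($F2, $F6, 'o');
  MapRange($F9, $FC, 'u');
  FoldTable[$96] := '-'; // en dash
  FoldTable[$97] := '-'; // em dash
end;

function FoldToAscii(const AInput: AnsiString): AnsiString;
var
  I: Integer;
begin
  SetLength(Result, Length(AInput));
  for I := 1 to Length(AInput) do
    Result[I] := FoldTable[Ord(AInput[I])];
end;

begin
  InitFoldTable;
  // #233 = e-acute, #150 = en dash, #239 = i-diaeresis in Windows-1252.
  Writeln(FoldToAscii('caf'#233' '#150' na'#239've')); // prints "cafe - naive"
end.
```

The table is built once and each character costs a single array lookup, which is why this avoids the WideString round trip; the price is that a separate table is needed for every code page you want to support.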
