Convert Hi-Ansi chars to Ascii equivalent (é -> e)

Problem description

    Is there a routine available in Delphi 2007 to convert the characters in the high range of the ANSI table (>127) to their equivalent ones in pure ASCII (<=127) according to a locale (codepage)?

    I know some chars cannot translate well but most can, esp. in the 192-255 range:

    • À → A
    • à → a
    • Ë → E
    • ë → e
    • Ç → C
    • ç → c
    • (en dash) → - (hyphen; that can be trickier)
    • (em dash) → - (hyphen)

    Solution

    WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by using that and passing 20127 (US-ASCII) as the codepage.

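    function BestFit(const AInput: AnsiString): AnsiString;
    const
      CodePage = 20127; // 20127 = us-ascii
    var
      WS: WideString;
    begin
      // Widen the input using the system's default ANSI code page.
      WS := WideString(AInput);
      // First call: a destination size of 0 returns the required byte count.
      SetLength(Result, WideCharToMultiByte(CodePage, 0, PWideChar(WS),
        Length(WS), nil, 0, nil, nil));
      // Second call: convert into the sized buffer. With dwFlags = 0 the API
      // applies best-fit mapping for characters outside the target code page.
      WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
        PAnsiChar(Result), Length(Result), nil, nil);
    end;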
    
    procedure TForm1.Button1Click(Sender: TObject);
    begin
       ShowMessage(BestFit('aÀàËëÇç–—€¢Š'));
    end;
    

    Calling that with your examples produces the results you're looking for, including the emdash-to-minus case, which I don't think is handled by Jeroen's suggestion to convert to Normalization Form D. If you did want to take that approach, Michael Kaplan has a blog post that explicitly discusses stripping diacritics (rather than normalization in general), but it uses C# and an API that was introduced in Vista. You can get something similar using the FoldString API (any WinNT release).
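
    A minimal sketch of the FoldString route, for comparison. It assumes the `Windows` unit in your Delphi version declares `FoldStringW` and `MAP_COMPOSITE` (worth checking), and the function name `StripDiacriticsW` is made up for illustration. The idea is to decompose each accented character into a base letter plus combining marks, then drop the marks in the U+0300..U+036F range. Unlike the best-fit approach, this does nothing for en or em dashes, and characters with no decomposition stay as they are.

```pascal
// Requires Windows in the unit's uses clause.
// StripDiacriticsW is a hypothetical helper name, not part of any library.
function StripDiacriticsW(const S: WideString): WideString;
var
  Decomposed: WideString;
  Len, I, J: Integer;
begin
  Result := '';
  if S = '' then
    Exit;

  // First call: a destination size of 0 returns the required length.
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(S), Length(S), nil, 0);
  if Len = 0 then
  begin
    Result := S; // folding failed; return the input unchanged
    Exit;
  end;

  // Second call: write the decomposed form (base letter + combining marks).
  SetLength(Decomposed, Len);
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(S), Length(S),
    PWideChar(Decomposed), Len);

  // Copy everything except combining diacritical marks (U+0300..U+036F).
  SetLength(Result, Len);
  J := 0;
  for I := 1 to Len do
    if (Ord(Decomposed[I]) < $0300) or (Ord(Decomposed[I]) > $036F) then
    begin
      Inc(J);
      Result[J] := Decomposed[I];
    end;
  SetLength(Result, J);
end;
```

    The result is still a WideString, so anything outside ASCII that survives (a euro sign, for example) would need a separate decision before converting back to AnsiString.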

    Of course, if you're only doing this for one character set and you want to avoid the overhead of converting to and from a WideString, Padu is correct that a simple for loop and a lookup table would be just as effective.
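
    A rough sketch of that lookup-table alternative, assuming the input is Windows-1252 (the table is wrong for any other code page). The function name `StripHighAnsi1252` and the replacement choices are illustrative only: Æ and æ are collapsed to a single letter, ß becomes s, and characters with no obvious equivalent become '?'. Codes 150 and 151 are the en dash and em dash in Windows-1252.

```pascal
// StripHighAnsi1252 is a hypothetical helper; the table assumes Windows-1252.
function StripHighAnsi1252(const AInput: AnsiString): AnsiString;
const
  // One replacement per code 192..255, in order (64 characters total).
  Map: AnsiString =
    'AAAAAAACEEEEIIIIDNOOOOOxOUUUUY?s' +
    'aaaaaaaceeeeiiiidnooooo/ouuuuy?y';
var
  I: Integer;
  B: Byte;
begin
  Result := AInput;
  for I := 1 to Length(Result) do
  begin
    B := Ord(Result[I]);
    if B >= 192 then
      Result[I] := Map[B - 192 + 1]   // accented letters and similar
    else if (B = 150) or (B = 151) then
      Result[I] := '-'                // en dash, em dash
    else if B > 127 then
      Result[I] := '?';               // anything else above ASCII
  end;
end;
```

    Using numeric codes rather than literal accented characters in the source keeps the routine independent of the encoding the .pas file is saved in.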
