Wrong Unicode conversion, how to store accent characters in Delphi 2010 source code and handle character sets?


Question

We are upgrading our project from Delphi 2006 to Delphi 2010. The old code was:

InputText: string;
InputText := SomeTEditComponent.Text;
...
for i := 1 to Length(InputText) do
  if InputText[i] in ['0'..'9', 'a'..'z', 'Ř' { and more special characters }] then ...

The trouble is with accented letters: the comparison fails.

I tried switching the source code encoding from ANSI to UTF-8 and to UCS-2 LE, but without luck. Only a cast to AnsiChar works:

if CharInSet(AnsiChar(InputText[i]), ['0'..'9', 'a'..'z', 'Ř']) then

It's funny how Delphi handles those letters. Try this in Evaluate during debugging:

Ord('Ř') = Ord('Ø')

(Yes, Delphi says True, on a Czech Windows 7.)


The question is: how can I store and compare simple strings without forcing them to AnsiStrings? If this doesn't work, why should we use Unicode at all?

Thanks to all for your replies.

Right now, in some parts of the code, we are using a simple CharInSet(AnsiChar(...

Solution

As mentioned by Uwe Raabe, the problem with Unicode chars is that they're pretty large. If Delphi allowed you to create a "set of Char", it would be 8 KB in size! A "set of AnsiChar" is only 32 bytes in size, which is pretty manageable.

I'd like to offer some alternatives. The first is a sort of drop-in replacement for the CharInSet function, one that uses an array of Char to do the tests. Its only merit is that it can be called immediately from almost anywhere, but its benefits stop there. I'd avoid this if I could:
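The sizes quoted above follow from one bit per possible set member. The following is a minimal sketch of that arithmetic, assuming Delphi 2009 or later; the type name TAnsiCharSet is made up for illustration, and the 8 KB figure is only computed, since Delphi cannot declare a set with 65536 members.

```pascal
program SetSizeDemo;
{$APPTYPE CONSOLE}

type
  // Illustrative name: 256 possible members, one bit each.
  TAnsiCharSet = set of AnsiChar;

begin
  // 256 bits / 8 bits per byte = 32 bytes.
  Writeln('SizeOf(set of AnsiChar): ', SizeOf(TAnsiCharSet));

  // A hypothetical set over every UTF-16 code unit would need
  // 65536 bits / 8 = 8192 bytes. Delphi limits sets to 256 members,
  // so this is arithmetic only, not a type that can be declared.
  Writeln('Bytes for 65536 members: ', (Ord(High(WideChar)) + 1) div 8);
end.
```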

function UnicodeCharInSet(UniChr:Char; CharArray:array of Char):Boolean;
var i:Integer;
begin
  for i:=0 to High(CharArray) do
    if CharArray[i] = UniChr then
    begin
      Result := True;
      Exit;
    end;
  Result := False;
end;

The trouble with this function is that it doesn't handle the x in ['a'..'z'] range syntax, and it's slow! The alternatives are faster, but aren't as close to a drop-in replacement as one might want. The first alternatives to investigate are the character classification functions from Microsoft. Among them are IsCharAlpha and IsCharAlphaNumeric, which might fix lots of issues. The problem with those is that all "alpha" chars are treated the same: you might end up accepting valid alpha chars from languages that are neither English nor Czech. Alternatively, you can use the TCharacter class from Embarcadero. Its implementation is all in the Character.pas unit and it looks efficient; I have no idea how efficient Microsoft's implementation is.

Another alternative is to write your own functions, using a "case" statement to get things to work. Here's an example:
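As a rough illustration of the TCharacter route, here is a minimal sketch assuming the Character unit as shipped with Delphi 2009/2010. The constant names are made up, and the characters are written as code points so the result does not depend on the source file encoding. It also shows the caveat mentioned above: a Greek letter is accepted just like a Czech one.

```pascal
program CharClassDemo;
{$APPTYPE CONSOLE}

uses
  Character; // provides TCharacter in Delphi 2009/2010

const
  // Illustrative constants, written as code points.
  RCaron     = #$0158; // 'Ř', LATIN CAPITAL LETTER R WITH CARON
  GreekAlpha = #$03B1; // 'α', GREEK SMALL LETTER ALPHA

begin
  // Both are Unicode letters, so both pass the test.
  Writeln('R with caron is a letter: ', TCharacter.IsLetter(RCaron));
  Writeln('Greek alpha is a letter:  ', TCharacter.IsLetter(GreekAlpha));
  // Digits are letters-or-digits, but not letters.
  Writeln('"5" is a letter:          ', TCharacter.IsLetter('5'));
  Writeln('"5" is letter or digit:   ', TCharacter.IsLetterOrDigit('5'));
end.
```

If only a specific alphabet should be accepted, a classification call like this is too broad on its own and needs an explicit list of allowed characters.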

function UnicodeCharIs(UniChr: Char): Boolean;
begin
  // The case selector is a 16-bit Char, so any Unicode character literal can be a label.
  case UniChr of
    'ă': Result := True;
    'ş': Result := False;
    'Ă': Result := True;
    'Ş': Result := False;
    else Result := False;
  end;
end;

I inspected the assembler generated for this function. While Delphi has to implement a series of "if" conditions for this, it does so very efficiently, way better than a hand-written series of IF statements. But it could still use a lot of improvement.

For tests that are used a lot, you might want to look for some bit-mask based implementation.
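A case statement also accepts ranges as labels, so it can cover the '0'..'9' and 'a'..'z' parts of the original test. The following is a minimal sketch assuming Delphi 2009 or later, where Char is a 16-bit WideChar. The function name IsAllowedChar is hypothetical, and 'Ř' is written as the code point #$0158 so the example does not depend on how the source file is encoded.

```pascal
program CaseRangeDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper; the accepted characters mirror the set in the question.
function IsAllowedChar(C: Char): Boolean;
begin
  case C of
    '0'..'9', 'a'..'z': Result := True; // ranges work as case labels
    #$0158: Result := True;             // 'Ř' given as a code point
  else
    Result := False;
  end;
end;

var
  InputText: string;
  i: Integer;
  AllAllowed: Boolean;
begin
  InputText := 'abc' + #$0158 + '42';
  AllAllowed := True;
  // Same loop shape as in the question, without any AnsiChar cast.
  for i := 1 to Length(InputText) do
    if not IsAllowedChar(InputText[i]) then
      AllAllowed := False;
  Writeln('All characters allowed: ', AllAllowed);
end.
```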
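One possible shape for such a bit-mask is sketched below, assuming Delphi 2009 or later. The record TWideCharSet and its methods are hypothetical names, not part of the RTL. It keeps one bit per UTF-16 code unit, which costs the 8 KB mentioned earlier but makes each membership test a single array lookup and a mask.

```pascal
program CharBitmaskDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical helper type: one bit per UTF-16 code unit.
  TWideCharSet = record
  private
    FBits: array[0..2047] of Cardinal; // 2048 * 32 bits = 65536 bits = 8 KB
  public
    procedure Clear;
    procedure Include(C: Char);
    procedure IncludeRange(First, Last: Char);
    function Contains(C: Char): Boolean;
  end;

procedure TWideCharSet.Clear;
begin
  FillChar(FBits, SizeOf(FBits), 0);
end;

procedure TWideCharSet.Include(C: Char);
begin
  // Upper 11 bits select the Cardinal, lower 5 bits select the bit in it.
  FBits[Ord(C) shr 5] := FBits[Ord(C) shr 5] or (Cardinal(1) shl (Ord(C) and 31));
end;

procedure TWideCharSet.IncludeRange(First, Last: Char);
var
  Code: Integer;
begin
  // Loop over Integer codes so a range ending at #$FFFF cannot overflow.
  for Code := Ord(First) to Ord(Last) do
    Include(Char(Code));
end;

function TWideCharSet.Contains(C: Char): Boolean;
begin
  Result := (FBits[Ord(C) shr 5] and (Cardinal(1) shl (Ord(C) and 31))) <> 0;
end;

var
  Allowed: TWideCharSet;
begin
  Allowed.Clear;
  Allowed.IncludeRange('0', '9');
  Allowed.IncludeRange('a', 'z');
  Allowed.Include(#$0158); // 'Ř'

  Writeln('q:      ', Allowed.Contains('q'));
  Writeln('U+0158: ', Allowed.Contains(#$0158)); // 'Ř' was included
  Writeln('U+00D8: ', Allowed.Contains(#$00D8)); // 'Ø' was never included
end.
```

The set is built once and reused, so the setup cost of IncludeRange only matters at start-up. Whether this beats the case statement in practice would need measuring.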
