Delphi XE - RawByteString vs AnsiString

Problem description

I asked a similar question here: "Delphi XE - should I use String or AnsiString?". After deciding that it is right to use ANSI strings in a (large) library of mine, I realized that I can actually use RawByteString instead of AnsiString. Because I mix Unicode strings with ANSI strings, my code now has quite a few places where it converts between them. However, it looks like I would get rid of those conversions if I used RawByteString.

Please let me know your opinion about it.
Thanks.


Update:
This seems to be disappointing. It looks like the compiler still makes a conversion from RawByteString to string.

procedure TForm1.FormCreate(Sender: TObject);
var x1, x2: RawByteString;
    s: string;
begin
  x1:= 'a';
  x2:= 'b';
  x1:= x1+ x2;
  s:= x1;              {      <------- Implicit string cast from 'RawByteString' to 'string'     }
end;

I assume it does some internal work (such as copying data), so my code will not be much faster, and I will still have to add lots of typecasts in order to silence the compiler.

Solution

RawByteString is an AnsiString with no code page set by default.

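When you assign another 8-bit string to a RawByteString variable, the variable takes over the code page of the source string, and no conversion takes place. But as soon as a UnicodeString is involved (string in Delphi 2009 and later, as in the s := x1 line of your update), a conversion is performed. Sorry.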

But there is another use of RawByteString, which is to store plain byte content (e.g. the content of a database BLOB field, just like an array of byte).

To summarize:

  • RawByteString should be used as a "code page agnostic" parameter to a method or function;
  • RawByteString can be used as a variable type to store some BLOB data.
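To illustrate the first point, here is a minimal sketch of a "code page agnostic" parameter, assuming a Unicode compiler (Delphi 2009 or later). CountHighBytes is a hypothetical helper made up for this example; StringCodePage and UTF8Encode are the standard RTL routines.

```pascal
program RawByteParamDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper (not part of any framework): counts the bytes >= $80.
// The RawByteString parameter accepts any 8-bit string type as it is,
// whatever code page the argument carries.
function CountHighBytes(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if Ord(S[I]) >= $80 then
      Inc(Result);
end;

var
  S: string;
  U: UTF8String;
  A: AnsiString;
begin
  S := 'caf'#$00E9;     // last character is U+00E9
  U := UTF8Encode(S);   // explicit UnicodeString -> UTF-8 conversion
  A := 'plain ASCII';
  // StringCodePage reports the code page stored with each string.
  WriteLn('U: code page ', StringCodePage(U),
    ', high bytes ', CountHighBytes(U));
  WriteLn('A: code page ', StringCodePage(A),
    ', high bytes ', CountHighBytes(A));
end.
```

The same function body serves both calls; the only conversion in the program is the explicit UTF8Encode.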

If you want to reduce conversions, and would rather use 8-bit strings in your application, my advice is:

  • Do not use the generic AnsiString type, which depends on the current system code page, and with which you will lose data;
  • Rely on UTF-8 encoding, i.e. an 8-bit encoding which won't lose any data when converted from or to a UnicodeString;
  • Leave no implicit conversion for the compiler to warn about: all conversions should be made explicit;
  • Use your own dedicated set of functions to handle your UTF-8 content.
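As a rough sketch of what "explicit conversions" can look like, assuming a Unicode compiler (Delphi 2009 or later): RoundTripsViaUtf8 is a hypothetical name used only for this example, while UTF8Encode and UTF8ToString are the standard RTL routines.

```pascal
program ExplicitUtf8Demo;
{$APPTYPE CONSOLE}

// Hypothetical check: converts to UTF-8 and back using explicit calls only,
// and reports whether the text survived unchanged.
function RoundTripsViaUtf8(const S: string): Boolean;
var
  U: UTF8String;
begin
  U := UTF8Encode(S);            // explicit UnicodeString -> UTF-8
  Result := UTF8ToString(U) = S; // explicit UTF-8 -> UnicodeString
end;

var
  S: string;
begin
  S := 'caf'#$00E9; // 4 characters, the last one is U+00E9
  WriteLn(Length(S), ' UTF-16 code units, ',
    Length(UTF8Encode(S)), ' UTF-8 bytes');
  WriteLn(RoundTripsViaUtf8(S));
end.
```

Because UTF-8 can encode every Unicode code point, the round trip does not depend on the system code page, which is the point of the second bullet.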

That is exactly what we did for our framework. We wanted to use UTF-8 in its kernel because:

  • We rely on UTF-8 encoded JSON for data transmission;
  • Memory consumption will be smaller;
  • The used SQLite3 engine will store text as UTF-8 in its database file;
  • We wanted a way of handling Unicode text with no loss of data with all versions of Delphi (from Delphi 6 up to XE), and WideString was not an option because it's dead slow and you've got the same problem of implicit conversions.

But, in order to achieve the best speed, we wrote some optimized functions to handle our custom string type:

  {{ RawUTF8 is an UTF-8 String stored in an AnsiString
    - use this type instead of System.UTF8String, which behavior changed
     between Delphi 2009 compiler and previous versions: our implementation
     is consistent and compatible with all versions of Delphi compiler
    - mimic Delphi 2009 UTF8String, without the charset conversion overhead
    - all conversion to/from AnsiString or RawUnicode must be explicit }
{$ifdef UNICODE} RawUTF8 = type AnsiString(CP_UTF8); // Codepage for an UTF8string
{$else}          RawUTF8 = type AnsiString; {$endif}

/// our fast RawUTF8 version of Trim(), for Unicode only compiler
// - this Trim() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Trim(const S: RawUTF8): RawUTF8;

/// our fast RawUTF8 version of Pos(), for Unicode only compiler
// - this Pos() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Pos(const substr, str: RawUTF8): Integer; overload; inline;
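To give an idea of why such overloads avoid conversions, here is a minimal byte-level sketch, assuming a Unicode compiler. TrimUtf8 is a hypothetical stand-in written for this example; it is not the framework's optimized Trim().

```pascal
program Utf8TrimDemo;
{$APPTYPE CONSOLE}

type
  // Assumption: same declaration as the Unicode-compiler branch quoted above.
  RawUTF8 = type AnsiString(CP_UTF8);

// Hypothetical byte-level trim: strips leading and trailing bytes <= ' '.
// Every byte of a multi-byte UTF-8 sequence is >= $80, so scanning bytes
// never cuts a character in half, and no UnicodeString is created.
function TrimUtf8(const S: RawUTF8): RawUTF8;
var
  First, Last: Integer;
begin
  First := 1;
  Last := Length(S);
  while (First <= Last) and (S[First] <= ' ') do
    Inc(First);
  while (Last >= First) and (S[Last] <= ' ') do
    Dec(Last);
  Result := Copy(S, First, Last - First + 1);
end;

var
  U: RawUTF8;
begin
  U := TrimUtf8('  hello  ');
  WriteLn('[', U, '] length ', Length(U));
end.
```

Calling SysUtils.Trim on the same value would first convert it to a UnicodeString and then convert the result back, which are the two conversions mentioned in the comments above.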

And we reserved the RawByteString type for handling BLOB data:

{$ifndef UNICODE}
  /// define RawByteString, as it does exist in Delphi 2009/2010/XE
  // - to be used for byte storage into an AnsiString
  // - use this type if you don't want the Delphi compiler to do any
  // code page conversions when you assign a typed AnsiString to a RawByteString,
  // i.e. a RawUTF8 or a WinAnsiString
  RawByteString = AnsiString;
  /// pointer to a RawByteString
  PRawByteString = ^RawByteString;
{$endif}

/// create a File from a string content
// - uses RawByteString for byte storage, whatever the codepage is
function FileFromString(const Content: RawByteString; const FileName: TFileName;
  FlushOnDisk: boolean=false): boolean;
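For illustration only, a function with a similar purpose could be sketched as follows. SaveBytesToFile is a hypothetical helper built on the standard TFileStream class; it is not the framework's FileFromString, it has no FlushOnDisk option, and 'blob.bin' is just an example path.

```pascal
program BlobToFileDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: writes the bytes of Content to FileName as they are.
// No code page is consulted, the string is treated as a byte buffer.
function SaveBytesToFile(const Content: RawByteString;
  const FileName: TFileName): Boolean;
var
  F: TFileStream;
begin
  Result := False;
  try
    F := TFileStream.Create(FileName, fmCreate);
    try
      if Length(Content) > 0 then
        F.WriteBuffer(Pointer(Content)^, Length(Content));
      Result := True;
    finally
      F.Free;
    end;
  except
    on EStreamError do
      Result := False; // the file could not be created or written
  end;
end;

var
  Blob: RawByteString;
begin
  // Fill the buffer byte by byte, so that no literal gets converted.
  SetLength(Blob, 3);
  Blob[1] := AnsiChar($00);
  Blob[2] := AnsiChar($FF);
  Blob[3] := AnsiChar($10);
  WriteLn(SaveBytesToFile(Blob, 'blob.bin'));
end.
```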

Source code is available in our repository. In this unit, UTF-8 related functions were deeply optimized, with both Pascal and asm versions for better speed. We sometimes overloaded default functions (like Pos) to avoid conversions. More information about how we handled text in the framework is available here.

Last word:

If you are sure that you will only have 7-bit content in your application (no accented characters), you may use the default AnsiString type in your program. But in this case, you had better add the AnsiStrings unit to your uses clause, to get overloaded string functions which will avoid most unwanted conversions.
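A small sketch of that last suggestion, assuming Delphi XE, where the unit is named AnsiStrings (later versions name it System.AnsiStrings). NormalizeKey is a hypothetical helper used only for this example.

```pascal
program AnsiStringsDemo;
{$APPTYPE CONSOLE}

uses
  AnsiStrings;

// Hypothetical helper: normalizes a 7-bit key without leaving AnsiString.
function NormalizeKey(const A: AnsiString): AnsiString;
begin
  // The unit-qualified calls select the AnsiString versions explicitly,
  // so the value is never converted to a UnicodeString on the way.
  Result := AnsiStrings.UpperCase(AnsiStrings.Trim(A));
end;

begin
  WriteLn(NormalizeKey('  hello  '));
end.
```

Qualifying the calls with the unit name is optional, but it makes it obvious which version is used once SysUtils is also listed in the uses clause.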
