Delphi XE - RawByteString vs AnsiString
I had a similar question to this here: "Delphi XE - should I use String or AnsiString?". After deciding that it is right to use ANSI strings in a (large) library of mine, I realized that I can actually use RawByteString instead of AnsiString. Because I mix Unicode strings with ANSI strings, my code now has quite a few places where it converts between them. However, it looks like I could get rid of those conversions if I used RawByteString.
Please let me know your opinion about it.
Thanks.
Update:
This seems to be disappointing. It looks like the compiler still makes a conversion from RawByteString to string.
procedure TForm1.FormCreate(Sender: TObject);
var
  x1, x2: RawByteString;
  s: string;
begin
  x1 := 'a';
  x2 := 'b';
  x1 := x1 + x2;
  s := x1; { <------- Implicit string cast from 'RawByteString' to 'string' }
end;
I think it does some internal work (such as copying data), so my code will not be much faster, and I will still have to add lots of typecasts to silence the compiler.
RawByteString is an AnsiString with no code page set by default.
When you assign another string to this RawByteString variable, you'll copy the code page of the source string. And this will include a conversion. Sorry.
But there is another use of RawByteString, which is to store plain byte content (e.g. the content of a database BLOB field, just like an array of byte).
To summarize:
- RawByteString should be used as a "code page agnostic" parameter to a method or function;
- RawByteString can be used as a variable type to store some BLOB data.
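The "code page agnostic" parameter idea can be sketched as follows. This is a minimal illustration, assuming Delphi 2009 or later; PayloadCodePage is a hypothetical helper written for this example, not something from the question or from our framework. It takes its argument as RawByteString and asks the RTL which code page the payload carries.

```pascal
program RawByteParamDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: accepts any 8-bit string type through a
// RawByteString parameter and reports the code page stored in the
// string header of whatever was passed in.
function PayloadCodePage(const S: RawByteString): Word;
begin
  Result := StringCodePage(S);
end;

var
  W: string;
  U: UTF8String;
begin
  W := 'h'#$00E9'llo';   // UnicodeString containing one accented character
  U := UTF8String(W);    // explicit UTF-16 to UTF-8 conversion
  Writeln('UTF-8 bytes: ', Length(U));
  Writeln('code page seen by the callee: ', PayloadCodePage(U));
end.
```

The point of the sketch is that the callee sees the caller's bytes and code page unchanged; had the parameter been declared as AnsiString, the compiler would be free to convert the UTF-8 content to the system code page first.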
If you want to reduce conversions, and would rather use 8-bit char strings in your application, you had better:
- Not use the generic AnsiString type, which depends on the current system code page, and with which you'll lose data;
- Rely on UTF-8 encoding, i.e. an 8-bit code page / charset which won't lose any data when converted from or to a UnicodeString;
- Not let the compiler show warnings about implicit conversions: all conversions should be made explicit;
- Use your own dedicated set of functions to handle your UTF-8 content.
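As a rough sketch of what "all conversions explicit" can look like, the program below routes every crossing between UTF-16 and UTF-8 through two small functions. ToUtf8 and FromUtf8 are hypothetical names chosen for this example; they only wrap the RTL's UTF8String cast and UTF8ToString, and the code assumes Delphi 2009 or later.

```pascal
program ExplicitUtf8Demo;
{$APPTYPE CONSOLE}

// Hypothetical wrappers: the only places where UTF-16 and UTF-8 meet.
function ToUtf8(const S: string): UTF8String;
begin
  Result := UTF8String(S);     // explicit cast, so no implicit-cast warning
end;

function FromUtf8(const U: UTF8String): string;
begin
  Result := UTF8ToString(U);   // explicit UTF-8 to UTF-16 conversion
end;

var
  Original, RoundTrip: string;
  Utf8: UTF8String;
begin
  Original := 'na'#$00EF've '#$20AC;   // accented letter plus the euro sign
  Utf8 := ToUtf8(Original);
  RoundTrip := FromUtf8(Utf8);
  Writeln('UTF-16 chars: ', Length(Original), ', UTF-8 bytes: ', Length(Utf8));
  Writeln('lossless round trip: ', RoundTrip = Original);
end.
```

Keeping the conversions in named functions also makes them easy to find when you later want to remove one from a hot path.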
That's exactly what we did for our framework. We wanted to use UTF-8 in its kernel because:
- We rely on UTF-8 encoded JSON for data transmission;
- Memory consumption will be smaller;
- The SQLite3 engine we use stores text as UTF-8 in its database file;
- We wanted a way of handling Unicode text with no loss of data in all versions of Delphi (from Delphi 6 up to XE), and WideString was not an option because it's dead slow and you've got the same problem of implicit conversions.
But, in order to achieve the best speed, we wrote some optimized functions to handle our custom string type:
{{ RawUTF8 is an UTF-8 String stored in an AnsiString
- use this type instead of System.UTF8String, which behavior changed
between Delphi 2009 compiler and previous versions: our implementation
is consistent and compatible with all versions of Delphi compiler
- mimic Delphi 2009 UTF8String, without the charset conversion overhead
- all conversion to/from AnsiString or RawUnicode must be explicit }
{$ifdef UNICODE} RawUTF8 = type AnsiString(CP_UTF8); // Codepage for an UTF8string
{$else} RawUTF8 = type AnsiString; {$endif}
/// our fast RawUTF8 version of Trim(), for Unicode only compiler
// - this Trim() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Trim(const S: RawUTF8): RawUTF8;
/// our fast RawUTF8 version of Pos(), for Unicode only compiler
// - this Pos() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Pos(const substr, str: RawUTF8): Integer; overload; inline;
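To show why such an overload avoids the round trip through UnicodeString, here is a deliberately naive byte-wise search. It is not the optimized implementation from our unit: Utf8Pos is a hypothetical name, it uses the RTL's UTF8String rather than RawUTF8 so that the example stands alone, and it assumes Delphi 2009 or later.

```pascal
program Utf8PosDemo;
{$APPTYPE CONSOLE}

// Naive sketch: compares raw bytes directly, so neither argument is
// ever converted to UnicodeString. Returns the 1-based BYTE index of
// the first match, or 0 when SubStr does not occur in Str.
function Utf8Pos(const SubStr, Str: UTF8String): Integer;
var
  I, J, LSub, LStr: Integer;
begin
  Result := 0;
  LSub := Length(SubStr);
  LStr := Length(Str);
  if (LSub = 0) or (LSub > LStr) then
    Exit;
  for I := 1 to LStr - LSub + 1 do
  begin
    J := 1;
    while (J <= LSub) and (Str[I + J - 1] = SubStr[J]) do
      Inc(J);
    if J > LSub then
    begin
      Result := I;
      Exit;
    end;
  end;
end;

begin
  Writeln(Utf8Pos(UTF8String('lo'), UTF8String('hello')));
end.
```

Note that the result is a byte offset, not a character count: with non-ASCII content the two differ, which is one more reason to keep such helpers in a dedicated set of functions.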
And we reserved the RawByteString type for handling BLOB data:
{$ifndef UNICODE}
/// define RawByteString, as it does exist in Delphi 2009/2010/XE
// - to be used for byte storage into an AnsiString
// - use this type if you don't want the Delphi compiler to do any
// code page conversions when you assign a typed AnsiString to a RawByteString,
// i.e. a RawUTF8 or a WinAnsiString
RawByteString = AnsiString;
/// pointer to a RawByteString
PRawByteString = ^RawByteString;
{$endif}
/// create a File from a string content
// - uses RawByteString for byte storage, whatever the codepage is
function FileFromString(const Content: RawByteString; const FileName: TFileName;
FlushOnDisk: boolean=false): boolean;
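Only the declaration of FileFromString is shown above. As an illustration of the general idea (and not the framework's actual body), a function of this shape could be written with TFileStream. SaveBytesToFile and the file name blob_demo.bin are hypothetical, the FlushOnDisk option is left out, and the unit names assume Delphi 2009 to XE (on later versions Classes and SysUtils may need the System. prefix).

```pascal
program BlobToFileDemo;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical sketch: writes the bytes of Content to FileName as-is.
// The code page of Content is never consulted; only its bytes matter.
function SaveBytesToFile(const Content: RawByteString;
  const FileName: TFileName): Boolean;
var
  F: TFileStream;
begin
  Result := False;
  try
    F := TFileStream.Create(FileName, fmCreate);
    try
      if Length(Content) > 0 then
        F.WriteBuffer(Pointer(Content)^, Length(Content));
      Result := True;
    finally
      F.Free;
    end;
  except
    on EStreamError do
      Result := False;   // could not create or write the file
  end;
end;

var
  Blob: RawByteString;
begin
  SetLength(Blob, 4);
  Blob[1] := AnsiChar($DE);
  Blob[2] := AnsiChar($00);   // embedded zero bytes are fine in a BLOB
  Blob[3] := AnsiChar($BE);
  Blob[4] := AnsiChar($EF);
  Writeln('saved: ', SaveBytesToFile(Blob, 'blob_demo.bin'));
end.
```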
Source code is available in our repository. In this unit, UTF-8 related functions were deeply optimized, with versions in both pascal and asm for better speed. We sometimes overloaded default functions (like Pos) to avoid conversions. More information about how we handled text in the framework is available here.
Last word:
If you are sure that you will only have 7-bit content in your application (no accented characters), you may use the default AnsiString type in your program. But in this case, you had better add the AnsiStrings unit to your uses clause, to get overloaded string functions which will avoid most unwanted conversions.
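A small sketch of that last suggestion, assuming Delphi 2009 to XE (on later versions the unit is named System.AnsiStrings). The calls are qualified with the unit name here only to make it obvious which overload is used; CleanKey is a hypothetical helper for this example.

```pascal
program AnsiStringsDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, AnsiStrings;

// Hypothetical helper: trims and upper-cases a 7-bit key while staying
// in AnsiString all the way, using the AnsiStrings overloads.
function CleanKey(const S: AnsiString): AnsiString;
begin
  Result := AnsiStrings.UpperCase(AnsiStrings.Trim(S));
end;

var
  Key: AnsiString;
begin
  Key := CleanKey('  user_id  ');
  Writeln(Key, ' (', Length(Key), ' bytes)');
end.
```

Without the AnsiStrings unit, the same calls would bind to the SysUtils versions, which take and return UnicodeString, so each call would convert the argument and then the result.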