unicode text file output differs between XE2 and Delphi 2009?
Question
When I try the code below, the output in XE2 seems to differ from the output in D2009.
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TextFile;
  myByte: Byte;
begin
  AssignFile(Outfile, 'test_chinese.txt');
  Rewrite(Outfile);
  // This is the UTF-8 BOM
  for myByte in TEncoding.UTF8.GetPreamble do
    Write(Outfile, AnsiChar(myByte));
  Writeln(Outfile, UTF8String('总结'));
  Writeln(Outfile, '°C');
  CloseFile(Outfile);
end;
Compiling with XE2 on a Windows 8 PC gives the following in WordPad:
??
C
txt hex code: EF BB BF 3F 3F 0D 0A B0 43 0D 0A
Compiling with D2009 on a Windows XP PC gives the following in WordPad:
总结
°C
txt hex code: EF BB BF E6 80 BB E7 BB 93 0D 0A B0 43 0D 0A
My question is why the output differs, and how can I save Chinese characters to a text file using the old text file I/O?
Thanks!
Solution
In XE2 onwards, AssignFile() has an optional CodePage parameter that sets the codepage of the output file:
function AssignFile(var F: File; FileName: String; [CodePage: Word]): Integer; overload;
Write() and Writeln() both have overloads that support UnicodeString and WideChar inputs.
So, you can create a file that has its codepage set to CP_UTF8, and then Write/ln() will automatically convert Unicode strings to UTF-8 when writing them to the file.
The downside is that you will not be able to write the UTF-8 BOM using AnsiChar values anymore, because the individual bytes will get converted to UTF-8 and thus not be written correctly. You can get around that by writing the BOM as a single Unicode character (which is what it really is: U+FEFF) instead of as individual bytes.
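To see why the two compilers produce different bytes, the conversions can be checked in isolation. The following is a minimal console sketch, not taken from the original answer. It assumes a Windows target and that the XE2 build converted the text to a Windows-1252 style ANSI code page, which the 3F 3F and B0 bytes in the question's hex dump suggest. TLatin1String and BytesToHex are hypothetical names introduced only for this illustration, and the Chinese text is spelled as code points so the encoding of the source file does not matter.

```pascal
program CodePageDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical helper type: an AnsiString pinned to code page 1252 (assumption).
  TLatin1String = type AnsiString(1252);

// Formats raw bytes as space-separated hex pairs.
function BytesToHex(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Length(B) - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(B[I], 2);
  end;
end;

var
  Sample: string;
  Narrow: TLatin1String;
begin
  Sample := #$603B#$7ED3; // the two characters from the question

  // Lossy: code page 1252 has no mapping for these characters.
  Narrow := TLatin1String(Sample);
  Writeln('cp1252: ', BytesToHex(BytesOf(Narrow)));

  // Lossless: UTF-8 can represent every Unicode character.
  Writeln('utf-8 : ', BytesToHex(TEncoding.UTF8.GetBytes(Sample)));

  // U+FEFF encoded as UTF-8 next to the preamble TEncoding reports.
  Sample := #$FEFF;
  Writeln('U+FEFF: ', BytesToHex(TEncoding.UTF8.GetBytes(Sample)));
  Writeln('BOM   : ', BytesToHex(TEncoding.UTF8.GetPreamble));
end.
```

If the assumption holds, the first line shows the substituted '?' bytes that appear in the XE2 dump, and the last two lines are identical, which is why writing #$FEFF to a CP_UTF8 file produces the usual EF BB BF preamble.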
This works in XE2:
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TextFile;
begin
  AssignFile(Outfile, 'test_chinese.txt', CP_UTF8);
  Rewrite(Outfile);
  // This is the UTF-8 BOM
  Write(Outfile, #$FEFF);
  Writeln(Outfile, '总结');
  Writeln(Outfile, '°C');
  CloseFile(Outfile);
end;
With that said, if you want something that is more compatible and reliable between D2009 and XE2, use TStreamWriter instead:
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TStreamWriter;
begin
  Outfile := TStreamWriter.Create('test_chinese.txt', False, TEncoding.UTF8);
  try
    Outfile.WriteLine('总结');
    Outfile.WriteLine('°C');
  finally
    Outfile.Free;
  end;
end;
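A quick way to confirm what TStreamWriter actually puts on disk is to read the file back as raw bytes. This is a rough sketch rather than part of the original answer: WriteSample, ReadFileBytes and BytesToHex are hypothetical helpers, and the string literals are again spelled as code points so the source file's own encoding does not affect the result.

```pascal
program StreamWriterCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Writes the two lines from the question as UTF-8 via TStreamWriter.
procedure WriteSample(const FileName: string);
var
  Writer: TStreamWriter;
begin
  Writer := TStreamWriter.Create(FileName, False, TEncoding.UTF8);
  try
    Writer.WriteLine(#$603B#$7ED3); // the Chinese text from the question
    Writer.WriteLine(#$00B0'C');    // degree sign followed by 'C'
  finally
    Writer.Free;
  end;
end;

// Reads a whole file into a byte array.
function ReadFileBytes(const FileName: string): TBytes;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, Integer(FS.Size));
    if Length(Result) > 0 then
      FS.ReadBuffer(Result[0], Length(Result));
  finally
    FS.Free;
  end;
end;

// Formats raw bytes as space-separated hex pairs.
function BytesToHex(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Length(B) - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(B[I], 2);
  end;
end;

begin
  WriteSample('test_chinese.txt');
  Writeln(BytesToHex(ReadFileBytes('test_chinese.txt')));
end.
```

On Windows the dump should begin with the EF BB BF preamble followed by E6 80 BB E7 BB 93 0D 0A. The degree sign is encoded as C2 B0 in UTF-8, whereas the D2009 output in the question contains a bare B0, which is not a valid UTF-8 sequence on its own.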
Or do the file I/O manually:
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TFileStream;

  // Writes a raw byte array, skipping empty input.
  procedure WriteBytes(const B: TBytes);
  begin
    if Length(B) > 0 then Outfile.WriteBuffer(B[0], Length(B));
  end;

  // Writes the UTF-8 bytes of S; Unicode arguments are converted on the call.
  procedure WriteStr(const S: UTF8String);
  begin
    if S <> '' then Outfile.WriteBuffer(S[1], Length(S));
  end;

  // Writes S followed by the platform line break.
  procedure WriteLine(const S: UTF8String);
  begin
    WriteStr(S);
    WriteStr(UTF8String(sLineBreak));
  end;

begin
  Outfile := TFileStream.Create('test_chinese.txt', fmCreate);
  try
    // UTF-8 BOM first, then the text
    WriteBytes(TEncoding.UTF8.GetPreamble);
    WriteLine('总结');
    WriteLine('°C');
  finally
    Outfile.Free;
  end;
end;