Detecting 'text' file type (ANSI vs UTF-8)
Problem description
I wrote an application (a psychological testing exam) in Delphi 7 which creates a standard text file, i.e. the file is ANSI-encoded.
Someone has ported the program to run on the Internet, probably using Java, and the resulting text file is of type UTF-8.
The program which reads these results files will have to read both the files created by Delphi and the files created via the Internet.
Whilst I can convert the UTF-8 text to ANSI (using the cunningly named function UTF8ToANSI), how can I tell in advance which kind of file I have?
Seeing as I 'own' the file format, I suppose the easiest way to deal with this would be to place a marker within the file at a known position which will tell me the source of the program (Delphi/Internet), but this seems to be cheating.
Thanks in advance.
If the UTF-8 file begins with the UTF-8 Byte-Order Mark (BOM), this is easy:
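function UTF8FileBOM(const FileName: string): boolean;
var
  txt: file;
  bytes: array[0..2] of byte;
  amt: integer;
begin
  // Open read-only so that write-protected files can still be inspected.
  // Note: FileMode is a global variable and stays changed after this call.
  FileMode := fmOpenRead;
  AssignFile(txt, FileName);
  Reset(txt, 1);  // untyped file with a record size of 1 byte
  try
    // Read up to three bytes; amt receives the number actually read.
    BlockRead(txt, bytes, 3, amt);
    // The UTF-8 BOM is the byte sequence EF BB BF.
    result := (amt=3) and (bytes[0] = $EF) and (bytes[1] = $BB) and (bytes[2] = $BF);
  finally
    CloseFile(txt);
  end;
end;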
Otherwise, it is much more difficult.
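When there is no BOM, one common heuristic is to check whether the file's bytes form structurally valid UTF-8. The following is a minimal sketch of that idea, not part of the original answer: the function name LooksLikeUTF8 is hypothetical, and it assumes the whole file has already been read into an AnsiString. It only checks lead and continuation byte patterns, so it does not reject every overlong or surrogate encoding, and ANSI text can in rare cases pass the check by coincidence.

```pascal
// Hypothetical helper (not from the original answer): returns True when Data
// has valid UTF-8 lead/continuation byte structure and contains at least one
// multi-byte sequence. Pure ASCII returns False, because ASCII bytes are the
// same in ANSI and UTF-8 and need no conversion.
function LooksLikeUTF8(const Data: AnsiString): Boolean;
var
  i, len, trail: Integer;
  b: Byte;
  sawMultiByte: Boolean;
begin
  Result := False;
  sawMultiByte := False;
  len := Length(Data);
  i := 1;
  while i <= len do
  begin
    b := Ord(Data[i]);
    if b < $80 then
      trail := 0                          // plain ASCII byte
    else if (b >= $C2) and (b <= $DF) then
      trail := 1                          // lead byte of a 2-byte sequence
    else if (b >= $E0) and (b <= $EF) then
      trail := 2                          // lead byte of a 3-byte sequence
    else if (b >= $F0) and (b <= $F4) then
      trail := 3                          // lead byte of a 4-byte sequence
    else
      Exit;                               // stray continuation or invalid lead byte
    if i + trail > len then
      Exit;                               // sequence is cut off at end of data
    if trail > 0 then
      sawMultiByte := True;
    Inc(i);
    while trail > 0 do
    begin
      if (Ord(Data[i]) and $C0) <> $80 then
        Exit;                             // continuation bytes must be 10xxxxxx
      Inc(i);
      Dec(trail);
    end;
  end;
  Result := sawMultiByte;
end;
```

A reader program could call this on the raw file contents and apply UTF8ToANSI only when it returns True. In ANSI text, accented characters are usually single bytes of $80 or above followed by ordinary letters, which fails the continuation-byte check, so the test tends to separate the two kinds of file reasonably well in practice.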