writeln的明显副作用(“:width”说明符会导致输出中的问号) [英] apparent side effects of writeln (“:width” specifier causes question marks in output)
问题描述
我有以下代码(RAD Studio XE2,Windows 7 x64):
I have the following code (RAD Studio XE2, Windows 7 x64):
program letters;
{$APPTYPE CONSOLE}
{$DEFINE BOO}
const
ENGLISH_ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
begin
{$IFDEF BOO}
writeln;
{$ENDIF}
write(ENGLISH_ALPHABET[1]:3);
readln;
end.
当 {$ DEFINE BOO}
指令是关闭,我有以下(预期)输出(空格被替换为可读性):
..a
当指令转到 ,我有以下(意外)输出:
When the directive is turned on, I have the following (unexpected) output:
// empty line here
?..a
而不是预期的
// empty line here
..a
当我更改 const <_ code> to
const ENGLISH_ALPHABET:AnsiString
,预期输出无问题的字符打印。当:3
格式化被删除或更改为:1
时,没有问号。当输出重定向到文件(通过 AssignFile(Output,'boo.log')
或从命令行)时,再次没有问号。
When I change const ENGLISH_ALPHABET
to const ENGLISH_ALPHABET: AnsiString
, the expected output is printed without question character. When :3
formatting is removed or changed to :1
, there is no question mark. When the output is redirected to file (either by AssignFile(Output, 'boo.log')
or from command line), there is no question mark again.
这个行为的正确解释是什么?
What is the correct explanation for this behavior?
推荐答案
这是一个很奇怪RTL中的错误。对写入
的调用解决了对 _WriteWChar
的调用。这个功能是这样实现的:
This is a rather odd bug in the RTL. The call to write
resolves to a call to _WriteWChar
. This function is implemented like this:
function _WriteWChar(var t: TTextRec; c: WideChar; width: Integer): Pointer;
begin
if width <= 1 then
result := _Write0WChar(t, c)
else
begin
if t.UTF16Buffer[0] <> #0 then
begin
_Write0WChar(t, '?');
t.UTF16Buffer[0] := #0;
end;
_WriteSpaces(t, width - 1);
Result := _Write0WChar(t, c);
end;
end;
您看到的?
上面的代码。
所以,为什么会发生这种情况。我可以构建的最简单的SSCCE是这样的:
So, why does this happen. The simplest SSCCE that I can construct is this:
{$APPTYPE CONSOLE}
const
ENGLISH_ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
begin
writeln;
write(ENGLISH_ALPHABET[1]:3);
end.
所以,你第一次调用 writeln
解决这个问题:
So, your first call writeln
and that resolves to this:
function _WriteLn(var t: TTextRec): Pointer;
begin
if (t.Flags and tfCRLF) <> 0 then
_Write0Char(t, _AnsiChr(cCR));
Result := _Write0Char(t, _AnsiChr(cLF));
_Flush(t);
end;
在此你推单个角色, cLF
,ASCII字符10,换行,输出文本记录。这导致 t.MBCSBuffer
正在馈送 cLF
字符。这个字符留在缓冲区中,这很好,因为 System._Write0Char.WriteUnicodeFromMBCSBuffer
这样做:
Here you push a single character, cLF
, ASCII character 10, linefeed, onto the output text record. This results in t.MBCSBuffer
being fed the cLF
character. That character is left in the buffer which is fine because System._Write0Char.WriteUnicodeFromMBCSBuffer
does this:
t.MBCSLength := 0;
t.MBCSBufPos := 0;
但是当 _WriteWChar
执行时,它会不加区别地看起来在 t.UTF16Buffer
中。这是在 TTextRec
中声明的,如下所示:
But when _WriteWChar
executes, it indiscriminately looks in t.UTF16Buffer
. Which is declared in TTextRec
like this:
type
TTextRec = packed record
....
MBCSLength: ShortInt;
MBCSBufPos: Byte;
case Integer of
0: (MBCSBuffer: array[0..5] of _AnsiChr);
1: (UTF16Buffer: array[0..2] of WideChar);
end;
所以, MBCSBuffer
和 UTF16Buffer
共享相同的存储空间。
So, MBCSBuffer
and UTF16Buffer
share the same storage.
错误是 _WriteWChar
不应该不要先检查缓冲区的长度,看看 t.UTF16Buffer
的内容。某些事情并不明显如何实现,因为 TTextRec
没有 UTF16Length
。相反,如果 t.UTF16Buffer
包含有意义的内容,约定是它的长度由 -t.MBCSLength
给出!
The bug is that _WriteWChar
should not look at the content of t.UTF16Buffer
without first checking the length of the buffer. Something that is not immediately obvious how to achieve because TTextRec
has not UTF16Length
. Instead, if t.UTF16Buffer
contains meaningful content, the convention is that its length is given by -t.MBCSLength
!
所以 _WriteWChar
应该是:
function _WriteWChar(var t: TTextRec; c: WideChar; width: Integer): Pointer;
begin
if width <= 1 then
result := _Write0WChar(t, c)
else
begin
if (t.MBCSLength < 0) and (t.UTF16Buffer[0] <> #0) then
begin
_Write0WChar(t, '?');
t.UTF16Buffer[0] := #0;
end;
_WriteSpaces(t, width - 1);
Result := _Write0WChar(t, c);
end;
end;
这是一个相当恶劣的黑客修复 _WriteWChar
。请注意,我无法获取 System._WriteSpaces
的地址,以便能够调用它。如果你绝望地解决这个问题,那可以做到这一点。
Here is a rather vile hack that fixes _WriteWChar
. Note that I have not been able to get the address of System._WriteSpaces
to be able to call it. That's something that could be done if you were desperate to fix this.
{$APPTYPE CONSOLE}
uses
Windows;
procedure PatchCode(Address: Pointer; const NewCode; Size: Integer);
var
OldProtect: DWORD;
begin
if VirtualProtect(Address, Size, PAGE_EXECUTE_READWRITE, OldProtect) then
begin
Move(NewCode, Address^, Size);
FlushInstructionCache(GetCurrentProcess, Address, Size);
VirtualProtect(Address, Size, OldProtect, @OldProtect);
end;
end;
type
PInstruction = ^TInstruction;
TInstruction = packed record
Opcode: Byte;
Offset: Integer;
end;
procedure RedirectProcedure(OldAddress, NewAddress: Pointer);
var
NewCode: TInstruction;
begin
NewCode.Opcode := $E9;//jump relative
NewCode.Offset := NativeInt(NewAddress)-NativeInt(OldAddress)-SizeOf(NewCode);
PatchCode(OldAddress, NewCode, SizeOf(NewCode));
end;
var
_Write0WChar: function(var t: TTextRec; c: WideChar): Pointer;
function _Write0WCharAddress: Pointer;
asm
MOV EAX, offset System.@Write0WChar
end;
function _WriteWCharAddress: Pointer;
asm
MOV EAX, offset System.@WriteWChar
end;
function _WriteWChar(var t: TTextRec; c: WideChar; width: Integer): Pointer;
var
i: Integer;
begin
if width <= 1 then
result := _Write0WChar(t, c)
else
begin
if (t.MBCSLength < 0) and (t.UTF16Buffer[0] <> #0) then
begin
_Write0WChar(t, '?');
t.UTF16Buffer[0] := #0;
end;
for i := 1 to width - 1 do
_Write0WChar(t, ' ');
Result := _Write0WChar(t, c);
end;
end;
const
ENGLISH_ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
begin
@_Write0WChar := _Write0WCharAddress;
RedirectProcedure(_WriteWCharAddress, @_WriteWChar);
writeln;
write(ENGLISH_ALPHABET[1]:3);
end.
我提交了 QC#123157 。
这篇关于writeln的明显副作用(“:width”说明符会导致输出中的问号)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!