writeln的明显副作用(“:width”说明符会导致输出中的问号) [英] apparent side effects of writeln (“:width” specifier causes question marks in output)

查看:234
本文介绍了writeln的明显副作用(“:width”说明符会导致输出中的问号)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下代码(RAD Studio XE2,Windows 7 x64):

I have the following code (RAD Studio XE2, Windows 7 x64):

program letters;

{$APPTYPE CONSOLE}

{$DEFINE BOO}

const
  ENGLISH_ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

begin
{$IFDEF BOO}
  writeln;
{$ENDIF}
  write(ENGLISH_ALPHABET[1]:3);

  readln;
end.

{$ DEFINE BOO} 指令是关闭,我有以下(预期)输出(空格被替换为可读性)

..a

当指令转到 ,我有以下(意外)输出:

When the directive is turned on, I have the following (unexpected) output:

// empty line here
?..a

而不是预期的

// empty line here
..a

当我更改 const <_ code> to const ENGLISH_ALPHABET:AnsiString ,预期输出无问题的字符打印。当:3 格式化被删除或更改为:1 时,没有问号。当输出重定向到文件(通过 AssignFile(Output,'boo.log')或从命令行)时,再次没有问号。

When I change const ENGLISH_ALPHABET to const ENGLISH_ALPHABET: AnsiString, the expected output is printed without question character. When :3 formatting is removed or changed to :1, there is no question mark. When the output is redirected to file (either by AssignFile(Output, 'boo.log') or from command line), there is no question mark again.

这个行为的正确解释是什么?

What is the correct explanation for this behavior?

推荐答案

这是一个很奇怪RTL中的错误。对写入的调用解决了对 _WriteWChar 的调用。这个功能是这样实现的:

This is a rather odd bug in the RTL. The call to write resolves to a call to _WriteWChar. This function is implemented like this:

function _WriteWChar(var t: TTextRec; c: WideChar; width: Integer): Pointer;
begin
  if width <= 1 then
    result := _Write0WChar(t, c)
  else
  begin
    if t.UTF16Buffer[0] <> #0 then
    begin
      _Write0WChar(t, '?');
      t.UTF16Buffer[0] := #0;
    end;

    _WriteSpaces(t, width - 1);
    Result := _Write0WChar(t, c);
  end;
end;

您看到的上面的代码。

所以,为什么会发生这种情况。我可以构建的最简单的SSCCE是这样的:

So, why does this happen. The simplest SSCCE that I can construct is this:

{$APPTYPE CONSOLE}
const
  ENGLISH_ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
begin
  writeln;
  write(ENGLISH_ALPHABET[1]:3);
end.

所以,你第一次调用 writeln 解决这个问题:

So, your first call writeln and that resolves to this:

function _WriteLn(var t: TTextRec): Pointer;
begin
  if (t.Flags and tfCRLF) <> 0 then
    _Write0Char(t, _AnsiChr(cCR));
  Result := _Write0Char(t, _AnsiChr(cLF));
  _Flush(t);
end;

在此你推单个角色, cLF ,ASCII字符10,换行,输出文本记录。这导致 t.MBCSBuffer 正在馈送 cLF 字符。这个字符留在缓冲区中,这很好,因为 System._Write0Char.WriteUnicodeFromMBCSBuffer 这样做:

Here you push a single character, cLF, ASCII character 10, linefeed, onto the output text record. This results in t.MBCSBuffer being fed the cLF character. That character is left in the buffer which is fine because System._Write0Char.WriteUnicodeFromMBCSBuffer does this:

t.MBCSLength := 0;
t.MBCSBufPos := 0;

但是当 _WriteWChar 执行时,它会不加区别地看起来在 t.UTF16Buffer 中。这是在 TTextRec 中声明的,如下所示:

But when _WriteWChar executes, it indiscriminately looks in t.UTF16Buffer. Which is declared in TTextRec like this:

type
  TTextRec = packed record 
    ....
    MBCSLength: ShortInt;
    MBCSBufPos: Byte;
    case Integer of
      0: (MBCSBuffer: array[0..5] of _AnsiChr);
      1: (UTF16Buffer: array[0..2] of WideChar);
  end;

所以, MBCSBuffer UTF16Buffer 共享相同的存储空间。

So, MBCSBuffer and UTF16Buffer share the same storage.

错误是 _WriteWChar 不应该不要先检查缓冲区的长度,看看 t.UTF16Buffer 的内容。某些事情并不明显如何实现,因为 TTextRec 没有 UTF16Length 。相反,如果 t.UTF16Buffer 包含有意义的内容,约定是它的长度由 -t.MBCSLength 给出!

The bug is that _WriteWChar should not look at the content of t.UTF16Buffer without first checking the length of the buffer. Something that is not immediately obvious how to achieve because TTextRec has not UTF16Length. Instead, if t.UTF16Buffer contains meaningful content, the convention is that its length is given by -t.MBCSLength!

所以 _WriteWChar 应该是:

function _WriteWChar(var t: TTextRec; c: WideChar; width: Integer): Pointer;
begin
  if width <= 1 then
    result := _Write0WChar(t, c)
  else
  begin
    if (t.MBCSLength < 0) and (t.UTF16Buffer[0] <> #0) then
    begin
      _Write0WChar(t, '?');
      t.UTF16Buffer[0] := #0;
    end;

    _WriteSpaces(t, width - 1);
    Result := _Write0WChar(t, c);
  end;
end;

这是一个相当恶劣的黑客修复 _WriteWChar 。请注意,我无法获取 System._WriteSpaces 的地址,以便能够调用它。如果你绝望地解决这个问题,那可以做到这一点。

Here is a rather vile hack that fixes _WriteWChar. Note that I have not been able to get the address of System._WriteSpaces to be able to call it. That's something that could be done if you were desperate to fix this.

{$APPTYPE CONSOLE}

uses
  Windows;

procedure PatchCode(Address: Pointer; const NewCode; Size: Integer);
var
  OldProtect: DWORD;
begin
  if VirtualProtect(Address, Size, PAGE_EXECUTE_READWRITE, OldProtect) then
  begin
    Move(NewCode, Address^, Size);
    FlushInstructionCache(GetCurrentProcess, Address, Size);
    VirtualProtect(Address, Size, OldProtect, @OldProtect);
  end;
end;

type
  PInstruction = ^TInstruction;
  TInstruction = packed record
    Opcode: Byte;
    Offset: Integer;
  end;

procedure RedirectProcedure(OldAddress, NewAddress: Pointer);
var
  NewCode: TInstruction;
begin
  NewCode.Opcode := $E9;//jump relative
  NewCode.Offset := NativeInt(NewAddress)-NativeInt(OldAddress)-SizeOf(NewCode);
  PatchCode(OldAddress, NewCode, SizeOf(NewCode));
end;

var
  _Write0WChar: function(var t: TTextRec; c: WideChar): Pointer;

function _Write0WCharAddress: Pointer;
asm
  MOV     EAX, offset System.@Write0WChar
end;

function _WriteWCharAddress: Pointer;
asm
  MOV     EAX, offset System.@WriteWChar
end;

function _WriteWChar(var t: TTextRec; c: WideChar; width: Integer): Pointer;
var
  i: Integer;
begin
  if width <= 1 then
    result := _Write0WChar(t, c)
  else
  begin
    if (t.MBCSLength < 0) and (t.UTF16Buffer[0] <> #0) then
    begin
      _Write0WChar(t, '?');
      t.UTF16Buffer[0] := #0;
    end;

    for i := 1 to width - 1 do
      _Write0WChar(t, ' ');
    Result := _Write0WChar(t, c);
  end;
end;

const
  ENGLISH_ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

begin
  @_Write0WChar := _Write0WCharAddress;
  RedirectProcedure(_WriteWCharAddress, @_WriteWChar);

  writeln;
  write(ENGLISH_ALPHABET[1]:3);
end.

我提交了 QC#123157

这篇关于writeln的明显副作用(“:width”说明符会导致输出中的问号)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆