Windows UTF-8 printed with chcp 65001 - characters are mysteriously duplicated


Question

There is one thing I can't understand:

I am using Windows 7 and Strawberry Perl 5.20, and I want to write UTF-8 to the console (cmd.exe) with chcp 65001.

The UTF-8 characters themselves come out fine, even those above code point 255, but some characters are mysteriously duplicated (this only happens when I don't redirect into a file).

I have since seen another post describing essentially the same problem, last-octet-repeated-when-my-perl-program-outputs-a-utf-8. The solution is to add binmode(STDOUT, 'unix:encoding(utf8):crlf') to the Perl program. I have tested it and it works fine now.

Thanks to everybody who looked into this weird problem.

In a nutshell, I am writing a UTF-8 string (chr(300) x 3).chr(301)."UVW\x{0D}\x{0A}". When I redirect the output into a flat file and then print that file, everything is fine.

However, when I print directly to the console, some characters are mysteriously duplicated (I am talking about the characters "VW" on the separate line), and I don't know why.

Here is my test output:

Page de codes active : 65001

Redirected into a file:
-----------------------
ĬĬĬĭUVW

Printed directly:
-----------------
ĬĬĬĭUVW
VW

IO-Layers = (unix crlf)

C4ACC4ACC4ACC4AD5556570D0A

Here is my test program:

@echo off
chcp 65001
echo.

set H1=BEGIN{binmode(*STDIN); undef $/;
set HEXDUMP="%H1% print uc(unpack('H*',<STDIN>)), qq{\n}}"

set L1=my @l = PerlIO::get_layers(*STDOUT, output, 1);
set LAYERS="%L1% print {*STDERR} qq{IO-Layers = (@l)\n};"

set PROG="print chr(300) x 3, chr(301), qq{UVW\n};";

set TFILE=%TEMP%\tfile.txt

echo Redirected into a file:
echo -----------------------
perl -C6 -e%PROG% >%TFILE% && type %TFILE%
echo.

echo Printed directly:
echo -----------------
perl -C6 -e%PROG%

echo.
perl -e%LAYERS%
echo.

perl -e%HEXDUMP% <%TFILE%

echo.
pause

As I said, the characters themselves are printed correctly, but why is there this mysterious duplication? And why *only* when the output is not redirected into a file?

Recommended answer

As I suspected, this has been reported as a failure in Windows software:

This is caused by a bug in Windows. When writing to a console set to code page 65001, WriteFile() returns the number of characters written instead of the number of bytes.
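
To see why a wrong return value could produce exactly the stray "VW" line, here is a small simulation. It is only a sketch under assumptions: buggy_write and write_all are hypothetical names, $console is an in-memory stand-in for the real console, and the retry loop is a simplified model of a buffered I/O layer, not Perl's actual PerlIO code. The test string is 13 bytes but only 9 characters, so a write that reports 9 looks like it left the last 4 bytes ("VW" plus CR LF) unwritten, and they get sent again.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the console: records every byte it is handed.
my $console = '';

# Simulated buggy write: it consumes the whole buffer, but reports the
# number of characters the buffer represents instead of the number of bytes.
sub buggy_write {
    my ($bytes) = @_;
    $console .= $bytes;
    return length(decode('UTF-8', $bytes));
}

# Simplified retry loop: if a write reports fewer bytes than were requested,
# resend the tail that was apparently not written.
sub write_all {
    my ($bytes) = @_;
    my $offset = 0;
    while ($offset < length($bytes)) {
        $offset += buggy_write(substr($bytes, $offset));
    }
    return;
}

# Same string as in the question: 4 two-byte characters, then UVW and CR LF.
my $text = (chr(300) x 3) . chr(301) . "UVW\x{0D}\x{0A}";
write_all(encode('UTF-8', $text));

# Hex dump of everything that reached the simulated console.
print uc(unpack('H*', $console)), "\n";
```

The dump shows the original 13 bytes followed by 56570D0A, which matches the extra "VW" line in the question's console output. When the output is redirected into a file, the write reports a true byte count, so no tail is resent.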

I wasn't aware of a work-around, but if the :unix:encoding(utf8):crlf PerlIO stack works for you then it seems you have found one.
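
For reference, a minimal sketch of a script using that layer stack is shown below. It assumes the layers are applied with binmode on STDOUT before anything is printed, as the question describes. The die message is illustrative, and whether the work-around holds on other Windows or Perl versions has not been verified here.

```perl
use strict;
use warnings;

# Apply the layer stack reported to work in the question.
binmode(STDOUT, ':unix:encoding(utf8):crlf')
    or die "Cannot set layers on STDOUT: $!";

# Same test string as in the question; the :crlf layer turns \n into CR LF.
print chr(300) x 3, chr(301), "UVW\n";
```

With the :encoding(utf8) layer on the handle, the -C6 switch from the test program should not also be needed for STDOUT.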
