Can wprintf output be properly redirected to UTF-16 on Windows?


Question

In a C program I use wprintf to print Unicode (UTF-16) text to a Windows console. This works fine, but when the program's output is redirected to a log file, the file's UTF-16 encoding is corrupted. When the redirection is done in a Windows Command Prompt, every line break is written as the narrow ASCII sequence 0d0a. When it is done in PowerShell, null characters are inserted.

Is it possible to redirect the output to a proper UTF-16 log file?

Example program:

#include <stdio.h>
#include <windows.h>
#include <fcntl.h>
#include <io.h>

int main () {

  int prevmode;

  prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
  fwprintf(stdout,L"one\n");
  fwprintf(stdout,L"two\n");
  fwprintf(stdout,L"three\n");
  _setmode(_fileno(stdout), prevmode);


  return 0;
}

Redirecting the output in Command Prompt. Note the 0d0a, which should be 0d00 0a00:

c:\test>.\testu16.exe > o.txt

c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d0a 0074 0077 006f 000d  o.n.e....t.w.o..
0000010: 0a00 7400 6800 7200 6500 6500 0d0a 00    ..t.h.r.e.e....

Redirecting the output in PowerShell. Note all the inserted 0000 pairs:

PS C:\test> .\testu16.exe > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00  ..o...n...e.....
0000010: 0a00 0000 7400 0000 7700 0000 6f00 0000  ....t...w...o...
0000020: 0d00 0a00 0000 7400 0000 6800 0000 7200  ......t...h...r.
0000030: 0000 6500 0000 6500 0000 0d00 0a00 0000  ..e...e.........
0000040: 0d00 0a00                                ....


Recommended answer

I got this answer from Hans Passant. Thanks, Hans.

The wrong line breaks are a side effect of stdout buffering. The stream must be flushed before the mode is set back to the original mode; otherwise the buffered wide text is written out later, after the stream has already returned to narrow text mode.

prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout);               /* flush stream */
_setmode(_fileno(stdout), prevmode);

Redirecting the output in Command Prompt (cmd.exe) now creates a correct UTF-16 file, without a BOM.

c:\test>.\testu16 > o.txt

c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d00 0a00 7400 7700 6f00  o.n.e.....t.w.o.
0000010: 0d00 0a00 7400 6800 7200 6500 6500 0d00  ....t.h.r.e.e...
0000020: 0a00                                     ..

In PowerShell the output is still wrong.

PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00  ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700  ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00  ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500  ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00  ..e.............
0000050: 0000 0d00 0a00                           ......

This is because PowerShell does not pass the stream through untouched. It tries to interpret it and convert it to UTF-16, and it guessed that the input stream was ANSI-encoded. PowerShell added a UTF-16 BOM, and the rest is double-encoded UTF-16. This explains the extra zeros.

Even piping to Out-File and specifying the encoding doesn't help.

PS C:\test> .\testu16.exe | out-file p.txt -encoding unicode
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00  ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700  ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00  ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500  ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00  ..e.............
0000050: 0000 0d00 0a00                           ......

PowerShell needs to be told the encoding, which is done by printing a UTF-16 BOM first:

prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout, L"\xfeff");  /* UTF-16LE BOM */
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout);               /* flush stream */
_setmode(_fileno(stdout), prevmode);

Now we get a correct UTF-16 file.

PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 6e00 6500 0d00 0a00 7400 7700  ..o.n.e.....t.w.
0000010: 6f00 0d00 0a00 7400 6800 7200 6500 6500  o.....t.h.r.e.e.
0000020: 0d00 0a00                                ....
