Input encoding: accepting UTF-8

Question

I need to capture the output of a native application in PowerShell. The problem is that the output is encoded as UTF-8 (no BOM), which PowerShell does not recognize; it just converts each byte of those funky UTF-8 sequences directly into a separate Unicode character.

I've found that PowerShell has an $OutputEncoding variable, but it does not seem to affect input data.

Good ol' iconv is of no help either, since this unnecessary UTF8-as-if-ASCII => Unicode conversion takes place before the next pipeline member receives the data.

Recommended answer

I can reproduce the issue with the program below (stdout.cpp, compiled with cl stdout.cpp):

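#include <stdio.h>

int main()
{
    // "ASCII output" followed by one UTF-8 encoded character (E1 BE B9), no BOM.
    // unsigned char avoids narrowing errors for values above 0x7F.
    unsigned char bytes[] = { 0x41, 0x53, 0x43, 0x49,
                              0x49, 0x20, 0x6F, 0x75,
                              0x74, 0x70, 0x75, 0x74,
                              0xE1, 0xBE, 0xB9 };

    // Write the raw bytes to stdout one at a time.
    for (int i = 0; i < (int)sizeof(bytes); i++)
    {
        printf("%c", bytes[i]);
    }
    return 0;
}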

Piping its output through | Out-File -enc UTF8 foo.txt produces gibberish:

PS> fhex foo.txt

Address:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F ASCII
-------- ----------------------------------------------- ----------------
00000000 EF BB BF 41 53 43 49 49 20 6F 75 74 70 75 74 C3 ...ASCII output.
00000010 9F E2 95 9B E2 95 A3 0D 0A                      .........

Note that fhex is a PSCX utility.

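UPDATE: Figured out how to get this to work. PowerShell decodes the output of native programs using [Console]::OutputEncoding, so set it to UTF-8 for the duration of the call and restore it afterwards: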

$enc = [Console]::OutputEncoding
[Console]::OutputEncoding = [text.encoding]::utf8
.\stdout.exe | out-file fubar3.txt -enc utf8
fhex .\fubar3.txt

Address:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F ASCII
-------- ----------------------------------------------- ----------------
00000000 EF BB BF 41 53 43 49 49 20 6F 75 74 70 75 74 E1 ...ASCII output.
00000010 BE B9 0D 0A                                     ....

[Console]::OutputEncoding = $enc
