ReadText from file in ANSI encoding


Question

I use the Q42.Winrt library to download an HTML file and cache it. But when I call ReadTextAsync, I get an exception:

No mapping for the Unicode character exists in the target multi-byte code page. (Exception from HRESULT: 0x80070459)

My code is very simple:

var parsedPage = await WebDataCache.GetAsync(new Uri(String.Format("http://someUrl.here")));
var parsedStream = await FileIO.ReadTextAsync(parsedPage);

I opened the downloaded file and it is in ANSI encoding. I think I need to convert it to UTF-8, but I don't know how.

Recommended answer

The problem is that the original page is not encoded in Unicode. It is Windows-1251, and ReadTextAsync only handles Unicode encodings (UTF-8 or UTF-16). The way around this is to read the file as binary and then use Encoding.GetEncoding to interpret the bytes with the 1251 code page and produce the string (which is always Unicode).

For example:

        String parsedStream;
        var parsedPage = await WebDataCache.GetAsync(new Uri("http://bash.im"));

        // Read the cached file as raw bytes instead of text.
        var buffer = await FileIO.ReadBufferAsync(parsedPage);
        using (var dr = DataReader.FromBuffer(buffer))
        {
            var bytes1251 = new Byte[buffer.Length];
            dr.ReadBytes(bytes1251);

            // Interpret the bytes as Windows-1251; the resulting string is Unicode.
            parsedStream = Encoding.GetEncoding("Windows-1251").GetString(bytes1251, 0, bytes1251.Length);
        }

The challenge is that the stored bytes alone do not tell you which code page was used, so this works here but may not work for other sites. Generally, UTF-8 is what you'll get from the web, but not always. The Content-Type response header of this page names the code page, but that information isn't stored in the file.
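If you can capture the Content-Type header at download time, one way to avoid hard-coding the code page is to pick the encoding from its charset parameter. The following is only a sketch under that assumption: `CharsetSketch`, `EncodingFromContentType` and `Decode` are hypothetical names, the header value has to be saved by your own code (the cached file does not carry it), and UTF-8 is used as the fallback when no usable charset is found. On newer .NET runtimes, code pages such as Windows-1251 may only be available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

static class CharsetSketch
{
    // Hypothetical helper: choose an Encoding from a Content-Type header value
    // such as "text/html; charset=windows-1251". Falls back to UTF-8 when the
    // header is missing, has no charset, or names an unsupported encoding.
    public static Encoding EncodingFromContentType(string contentType)
    {
        MediaTypeHeaderValue parsed;
        if (!string.IsNullOrEmpty(contentType) &&
            MediaTypeHeaderValue.TryParse(contentType, out parsed) &&
            !string.IsNullOrEmpty(parsed.CharSet))
        {
            try
            {
                // Some servers quote the value, e.g. charset="windows-1251".
                return Encoding.GetEncoding(parsed.CharSet.Trim('"'));
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: use the default below.
            }
        }
        return Encoding.UTF8;
    }

    // Decode raw bytes (for example, the bytes read via ReadBufferAsync)
    // using the encoding announced in the saved Content-Type header.
    public static string Decode(byte[] bytes, string contentType)
    {
        Encoding encoding = EncodingFromContentType(contentType);
        return encoding.GetString(bytes, 0, bytes.Length);
    }
}
```

With this in place, the hard-coded `Encoding.GetEncoding("Windows-1251")` call could be replaced by something like `CharsetSketch.Decode(bytes1251, savedContentType)`, where `savedContentType` is whatever header value you stored alongside the cached file.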
