Convert a string's character encoding from windows-1252 to utf-8


Question

I converted a Word document (docx) to HTML, and the converted HTML has windows-1252 as its character encoding. In .NET, with this 1252 encoding, all the special characters are displayed as '�'. The HTML is shown in a Rad Editor, which displays it correctly if the HTML is in UTF-8 format.

I tried the following code, but in vain:

Encoding wind1252 = Encoding.GetEncoding(1252);  
Encoding utf8 = Encoding.UTF8;  
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);  
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);  
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];   
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);  
string utf8String = new string(utf8Chars);

Any suggestions on how to convert the HTML into UTF-8?

Recommended answer

Actually, the problem lies here:

byte[] wind1252Bytes = wind1252.GetBytes(strHtml); 

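We should not get the bytes from the HTML string. A .NET string is already decoded text, so any character that became '�' when the file was read cannot be recovered by re-encoding the string. Read the raw bytes of the file instead. I tried the code below and it worked.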

Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);


public static byte[] ReadFile(string filePath)
{
    // 'using' disposes the stream even if an exception is thrown
    using (FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        int length = (int)fileStream.Length;  // get file length
        byte[] buffer = new byte[length];     // create buffer
        int count;                            // actual number of bytes read
        int sum = 0;                          // total number of bytes read

        // read until Read method returns 0 (end of the stream has been reached)
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count;  // sum is a buffer offset for next reading

        return buffer;
    }
}
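
For anyone who wants to see the difference in isolation, here is a minimal sketch, not the original poster's code. It assumes the '�' characters came from the windows-1252 bytes being decoded as UTF-8 somewhere along the way (the question does not say exactly where the wrong decoding happened). The class name `Windows1252Demo`, the helper `DecodeWindows1252`, and the sample bytes are made up for illustration. The `RegisterProvider` call is only needed on .NET Core / .NET 5+, where code page 1252 is not available by default; on .NET Framework it can probably be dropped.

```csharp
using System;
using System.Text;

public static class Windows1252Demo
{
    // Hypothetical helper: decodes raw windows-1252 bytes into a .NET string.
    public static string DecodeWindows1252(byte[] rawBytes)
    {
        Encoding windows1252 = Encoding.GetEncoding(1252);
        return windows1252.GetString(rawBytes);
    }

    public static void Main()
    {
        // Needed on .NET Core / .NET 5+ so that GetEncoding(1252) does not throw.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Illustrative sample: 0x93 and 0x94 are curly quotes, 0xE9 is 'é' in windows-1252.
        byte[] raw = { 0x93, 0x63, 0x61, 0x66, 0xE9, 0x94 };

        // Wrong: decoding the bytes as UTF-8 replaces the invalid bytes with U+FFFD ('�').
        // After this, the original bytes are gone and no re-encoding brings them back.
        string damaged = Encoding.UTF8.GetString(raw);
        Console.WriteLine(damaged.IndexOf('\uFFFD') >= 0);   // prints True

        // Right: decode the raw bytes with the encoding they were written in.
        string html = DecodeWindows1252(raw);
        Console.WriteLine(html == "\u201Ccaf\u00E9\u201D");  // prints True

        // The string can now be encoded as UTF-8 wherever bytes are needed.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(html);
        Console.WriteLine(utf8Bytes.Length);                 // prints 11
    }
}
```

Decoding directly with `GetString` should give the same string as the `Encoding.Convert` followed by `Encoding.UTF8.GetString` pair in the answer. The part that matters in both versions is that the input is the file's raw bytes, not a string that was already decoded.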
