Convert a string's character encoding from windows-1252 to UTF-8
Problem description
I converted a Word document (.docx) to HTML, and the converted HTML uses windows-1252 as its character encoding. In .NET, all the special characters in this windows-1252 HTML are displayed as '�'. The HTML is shown in a Rad Editor, which displays it correctly only if the HTML is in UTF-8.
I tried the following code, but to no avail:
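// strHtml: the converted HTML, already loaded into a .NET string.
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
// Encode the string as windows-1252 bytes,
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
// convert those bytes to UTF-8,
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
// and decode the UTF-8 bytes back into a string.
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string utf8String = new string(utf8Chars);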
Any suggestions on how to convert the html into UTF-8?
Recommended answer
Actually, the problem is right here:
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
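Why that line cannot help is easiest to see in a small sketch. It assumes modern .NET (Core 3.0+ / 5+), where code page 1252 must be registered through CodePagesEncodingProvider, and it assumes the HTML was first read as UTF-8; the WrongDecodeDemo class and the sample bytes are hypothetical. Once the file's bytes have been decoded with the wrong encoding, the '�' characters are already in the string, and re-encoding that string only carries them along.

```csharp
using System.Text;

// Sketch assuming .NET Core 3.0+ / .NET 5+, where code page 1252 must be
// registered before Encoding.GetEncoding(1252) can be used.
public static class WrongDecodeDemo
{
    public static (string wrong, string roundTripped, string right) Run()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1252 = Encoding.GetEncoding(1252);

        // Hypothetical file content: windows-1252 bytes for “Hi”
        // (0x93 and 0x94 are the curly double quotes in that code page).
        byte[] fileBytes = { 0x93, 0x48, 0x69, 0x94 };

        // Assumed first mistake: reading the file as UTF-8. 0x93 and 0x94 are not
        // valid UTF-8 on their own, so each becomes U+FFFD, the '�' character.
        string wrong = Encoding.UTF8.GetString(fileBytes);

        // The approach from the question, applied to that string: U+FFFD has no
        // windows-1252 mapping, so GetBytes substitutes a fallback (typically '?')
        // and the original quotes cannot be recovered.
        byte[] reEncoded = win1252.GetBytes(wrong);
        byte[] utf8Bytes = Encoding.Convert(win1252, Encoding.UTF8, reEncoded);
        string roundTripped = Encoding.UTF8.GetString(utf8Bytes);

        // Decoding the original bytes with the right encoding keeps the quotes.
        string right = win1252.GetString(fileBytes);

        return (wrong, roundTripped, right);
    }
}
```

The fix therefore has to happen where the bytes are decoded, not afterwards.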
We should not take the bytes from the HTML string, because by then the text has already been decoded; the raw bytes have to come from the file itself. I tried the code below and it worked.
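// Needs: using System.IO; using System.Text;
// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
// first; otherwise Encoding.GetEncoding(1252) throws NotSupportedException.
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
// Read the raw windows-1252 bytes straight from the file.
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
// Convert them to UTF-8 and decode into a .NET string.
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = utf8.GetString(utf8Bytes);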
// Reads the whole file into a byte array (File.ReadAllBytes(filePath) does the same).
public static byte[] ReadFile(string filePath)
{
    // The using block closes the stream even if Read throws.
    using (FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        int length = (int)fileStream.Length; // get file length
        byte[] buffer = new byte[length];    // create buffer
        int count;                           // actual number of bytes read
        int sum = 0;                         // total number of bytes read
        // read until Read returns 0 (end of the stream has been reached)
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count;                    // sum is the buffer offset for the next read
        return buffer;
    }
}
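The same fix can be written more compactly. This is a minimal sketch assuming modern .NET (Core 3.0+ / 5+); HtmlEncodingConverter, ReadWindows1252 and ConvertFileToUtf8 are hypothetical names. File.ReadAllText with an explicit windows-1252 encoding reads and decodes in one call, and File.WriteAllText with a UTF8Encoding saves the result as UTF-8.

```csharp
using System.IO;
using System.Text;

// Sketch assuming .NET Core 3.0+ / .NET 5+; on .NET Framework, code page 1252 is
// available without the RegisterProvider call.
public static class HtmlEncodingConverter
{
    static HtmlEncodingConverter()
    {
        // Makes legacy code pages such as 1252 available to Encoding.GetEncoding.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: decodes a windows-1252 file into a .NET string.
    // Does the same job as ReadFile + Encoding.Convert + Encoding.UTF8.GetString.
    public static string ReadWindows1252(string path)
    {
        Encoding win1252 = Encoding.GetEncoding(1252);
        return File.ReadAllText(path, win1252);
    }

    // Hypothetical helper: rewrites a windows-1252 file as UTF-8.
    public static void ConvertFileToUtf8(string sourcePath, string targetPath)
    {
        string html = ReadWindows1252(sourcePath);
        // new UTF8Encoding(false) writes UTF-8 without a byte order mark.
        File.WriteAllText(targetPath, html, new UTF8Encoding(false));
    }
}
```

Decoding with the correct encoding is the step that removes the '�' characters. Converting to UTF-8 bytes and decoding again yields the same string, because a .NET string is UTF-16 internally whatever the source encoding was.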