Using .NET, how to convert ISO 8859-1 encoded text files that contain Latin-1 accented characters to UTF-8

Views: 133 Published: 2016/8/28 13:26:20 Tags: c# utf-8 iso-8859-1 latin1

Question

I am being sent text files saved in ISO 8859-1 format that contain accented characters from the Latin-1 range (as well as normal ASCII a-z, etc.). How do I convert these files to UTF-8 using C# so that the single-byte accented characters in ISO 8859-1 become valid UTF-8 characters?

I have tried using a StreamReader with ASCIIEncoding, then converting the ASCII string to UTF-8 by instantiating an ascii encoding and a utf8 encoding and calling Encoding.Convert(ascii, utf8, ascii.GetBytes(asciiString)), but the accented characters are rendered as question marks.

What step am I missing?

Recommended answer

You need to get the proper Encoding object. ASCII is just as it's named: ASCII, meaning that it only supports 7-bit ASCII characters. If what you want to do is convert files, then this is likely easier than dealing with the byte arrays directly.

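// Assumes "using System.Text;" for Encoding.
using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName,
                                       Encoding.GetEncoding("iso-8859-1")))
{
    // StreamWriter has no (string, Encoding) constructor; pass append = false
    // to overwrite the output file.
    using (System.IO.StreamWriter writer = new System.IO.StreamWriter(
                                           outFileName, false, Encoding.UTF8))
    {
        writer.Write(reader.ReadToEnd());
    }
}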
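
ReadToEnd holds the whole file in memory as one string. For large files, a chunked copy is one possible variation. The sketch below is an illustration only: the class name StreamingConvertSketch, the method name ConvertInChunks, and the 4096-character chunk size are assumptions, not part of the original answer.

```csharp
using System.IO;
using System.Text;

public static class StreamingConvertSketch
{
    // Decodes the input as ISO 8859-1 and writes it back out as UTF-8,
    // a chunk of characters at a time instead of one big string.
    public static void ConvertInChunks(string fileName, string outFileName)
    {
        using (StreamReader reader = new StreamReader(fileName,
                                     Encoding.GetEncoding("iso-8859-1")))
        using (StreamWriter writer = new StreamWriter(outFileName, false,
                                     Encoding.UTF8))
        {
            char[] chunk = new char[4096]; // chunk size is arbitrary
            int count;

            // Read returns 0 once the end of the file is reached.
            while ((count = reader.Read(chunk, 0, chunk.Length)) > 0)
                writer.Write(chunk, 0, count);
        }
    }
}
```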

However, if you want to have the byte arrays yourself, it's easy enough to do with Encoding.Convert.

// data holds the raw ISO 8859-1 bytes of the input.
byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
    Encoding.UTF8, data);
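
As a small worked example of that call, the sketch below converts a hand-built byte array. The sample bytes, the class name ConvertDemo, and the helper Latin1ToUtf8 are illustrative assumptions, not from the original answer.

```csharp
using System;
using System.Text;

public static class ConvertDemo
{
    // Re-encodes ISO 8859-1 bytes as UTF-8 bytes.
    public static byte[] Latin1ToUtf8(byte[] data)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return Encoding.Convert(latin1, Encoding.UTF8, data);
    }

    public static void Main()
    {
        // "café" in ISO 8859-1: the accented e is the single byte 0xE9.
        byte[] data = { 0x63, 0x61, 0x66, 0xE9 };
        byte[] converted = Latin1ToUtf8(data);

        // In UTF-8 the same character takes two bytes, 0xC3 0xA9.
        Console.WriteLine(BitConverter.ToString(converted)); // 63-61-66-C3-A9
    }
}
```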

It's important to note here, however, that if you want to go down this road then you should not use an encoding-based string reader like StreamReader for your file IO. FileStream would be better suited, as it will read the actual bytes of the files.

In the interest of fully exploring the issue, something like this would work:

using (System.IO.FileStream input = new System.IO.FileStream(fileName,
                                    System.IO.FileMode.Open, 
                                    System.IO.FileAccess.Read))
{
    byte[] buffer = new byte[input.Length];

    int readLength = 0;

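    // Read may return fewer bytes than requested, so loop until the buffer is full.
    while (readLength < buffer.Length)
    {
        int count = input.Read(buffer, readLength, buffer.Length - readLength);
        if (count == 0)
            throw new System.IO.EndOfStreamException(); // file ended early; avoid looping forever
        readLength += count;
    }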

    byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"), 
                       Encoding.UTF8, buffer);

    using (System.IO.FileStream output = new System.IO.FileStream(outFileName,
                                         System.IO.FileMode.Create, 
                                         System.IO.FileAccess.Write))
    {
        output.Write(converted, 0, converted.Length);
    }
}

In this example, the buffer variable gets filled with the actual data in the file as a byte[], so no conversion is done. Encoding.Convert specifies a source and destination encoding, then stores the converted bytes in the variable named...converted. This is then written to the output file directly.
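
The same byte-level conversion can probably be written more compactly with File.ReadAllBytes and File.WriteAllBytes, which handle the read loop internally. This is a sketch of that alternative, not the original answer's code; the class name FileConvertSketch and the method name ConvertLatin1FileToUtf8 are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class FileConvertSketch
{
    // Reads the whole input file as raw bytes, re-encodes them, and writes the result.
    public static void ConvertLatin1FileToUtf8(string fileName, string outFileName)
    {
        byte[] buffer = File.ReadAllBytes(fileName);

        byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
                           Encoding.UTF8, buffer);

        File.WriteAllBytes(outFileName, converted);
    }
}
```

One difference from the StreamWriter approach is worth keeping in mind: Encoding.Convert does not emit a byte order mark, whereas a StreamWriter constructed with Encoding.UTF8 writes one at the start of a new file.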

Like I said, the first option using StreamReader and StreamWriter will be much simpler if this is all you're doing, but the latter example should give you more of a hint as to what's actually going on.
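
To make the original failure concrete, the sketch below decodes the same bytes with ASCII and with ISO 8859-1. The accented character is already lost when the bytes are decoded as ASCII, so a later Encoding.Convert cannot bring it back. The class name AsciiLossDemo, its helper methods, and the sample bytes are illustrative assumptions.

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Decoding with ASCII replaces every byte above 0x7F with '?'.
    public static string DecodeAsAscii(byte[] data)
    {
        return Encoding.ASCII.GetString(data);
    }

    // Decoding with ISO 8859-1 maps each byte to the matching character.
    public static string DecodeAsLatin1(byte[] data)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(data);
    }

    public static void Main()
    {
        // "café" in ISO 8859-1: the accented e is the single byte 0xE9.
        byte[] data = { 0x63, 0x61, 0x66, 0xE9 };

        string asAscii = DecodeAsAscii(data);   // string value: caf?
        string asLatin1 = DecodeAsLatin1(data); // string value: caf\u00E9

        Console.WriteLine(asAscii == "caf?");       // True
        Console.WriteLine(asLatin1 == "caf\u00E9"); // True
    }
}
```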
