C# partial UTF-8 byte stream conversion


Problem description

I wrote the following simple test:

[Test]
public void TestUTF8()
{
    var c = "abc☰def";
    var b = Encoding.UTF8.GetBytes(c);

    Assert.That(b.Length, Is.EqualTo(9));
    // Assume we are reading a byte stream and have so far received only the first 5 bytes
    var p = Encoding.UTF8.GetChars(b, 0, 5);
    Trace.WriteLine(new string(p));
    Assert.That(p.Length, Is.EqualTo(3));
}

Trace outputs abc� (the last character is the U+FFFD replacement character), and the last assertion fails because p.Length is 4.
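The extra character follows from the byte layout: ☰ (U+2630) takes three bytes in UTF-8, so a 5-byte prefix ends in the middle of it. The sketch below is only an illustration; the class name Utf8Layout and the helper ToHex are hypothetical and not part of the original question.

```csharp
using System;
using System.Text;

public static class Utf8Layout
{
    // Hypothetical helper: returns the UTF-8 bytes of the text as space-separated hex pairs.
    public static string ToHex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text)).Replace("-", " ");
    }

    public static void Main()
    {
        // "\u2630" is the character ☰, written as an escape to avoid source-encoding issues.
        string text = "abc\u2630def";
        Console.WriteLine(ToHex(text)); // 61 62 63 E2 98 B0 64 65 66

        // The first 5 bytes are 61 62 63 E2 98: "abc" plus an incomplete ☰.
        // The stateless Encoding.GetChars cannot wait for the missing byte,
        // so it substitutes a replacement character (the question reports a length of 4).
        char[] p = Encoding.UTF8.GetChars(Encoding.UTF8.GetBytes(text), 0, 5);
        Console.WriteLine(p.Length);
    }
}
```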

However, I want Trace to output abc and the last assertion to pass. In practice I know the stream contains valid characters; when the last few bytes do not yet form a complete character, they should simply be left in place until more data arrives.

So how can I achieve this in C#?

Solution

Encoding.GetChars is not designed for bytes arriving from a stream, where state has to be kept between calls because a single character can span multiple buffer segments. For that, use a Decoder obtained from Encoding.GetDecoder. Decoder.Convert is quite low-level: it gives you control over both the input and output buffers, but it is somewhat difficult to use. Decoder.GetChars is easier to use and still does the important work of storing state between calls. We can easily expand on Peter Duniho's answer to support an arbitrary buffer size:

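using System;
using System.IO;
using System.Text;

public static class Program
{
    public static void Main(string[] args)
    {
        var c = "abc☰def";
        var b = Encoding.UTF8.GetBytes(c);
        // Read the stream in 3-byte chunks; any positive bufferSize should give the same result.
        var result = DecodeFromStream(new MemoryStream(b), Encoding.UTF8, 3);
        Console.WriteLine(result);
        Console.WriteLine(c == result);
    }

    private static string DecodeFromStream(Stream dataStream, Encoding encoding, int bufferSize)
    {
        // The Decoder remembers the bytes of an incomplete character between calls.
        Decoder decoder = encoding.GetDecoder();
        StringBuilder sb = new StringBuilder();
        int inputByteCount;
        byte[] inputBuffer = new byte[bufferSize];
        // GetMaxCharCount gives the worst-case number of chars for one full input buffer.
        char[] charBuffer = new char[encoding.GetMaxCharCount(inputBuffer.Length)];

        while ((inputByteCount = dataStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
        {
            // Only complete characters are written to charBuffer; a trailing partial
            // sequence stays inside the decoder until the next call supplies the rest.
            int readChars = decoder.GetChars(inputBuffer, 0, inputByteCount, charBuffer, 0);
            if (readChars > 0)
                sb.Append(charBuffer, 0, readChars);
        }
        return sb.ToString();
    }
}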
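
Applied directly to the test from the question, the same idea can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the class name PartialDecodeSketch and the helper DecodeInTwoParts are hypothetical. A single Decoder is fed the first 5 bytes and then the remaining bytes.

```csharp
using System;
using System.Text;

public static class PartialDecodeSketch
{
    // Hypothetical helper: decodes bytes[0..split) and then bytes[split..end)
    // with one shared Decoder, and returns the two decoded pieces.
    public static string[] DecodeInTwoParts(byte[] bytes, int split)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(bytes.Length)];

        // First call: an incomplete trailing sequence is kept in the decoder, not emitted.
        int firstCount = decoder.GetChars(bytes, 0, split, chars, 0);
        string first = new string(chars, 0, firstCount);

        // Second call: the buffered bytes are combined with the newly supplied ones.
        int secondCount = decoder.GetChars(bytes, split, bytes.Length - split, chars, 0);
        string second = new string(chars, 0, secondCount);

        return new[] { first, second };
    }

    public static void Main()
    {
        // "\u2630" is the character ☰ from the question.
        byte[] b = Encoding.UTF8.GetBytes("abc\u2630def");
        string[] parts = DecodeInTwoParts(b, 5);

        Console.WriteLine(parts[0]);        // abc
        Console.WriteLine(parts[0].Length); // 3
        Console.WriteLine(parts[1] == "\u2630def"); // True
    }
}
```

Note that bytes still buffered in the decoder when the data ends are never emitted unless a flush is requested, for example through the GetChars overload that takes a bool flush argument. That matches the behaviour asked for in the question, where incomplete bytes wait for more data.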
