How do you read UTF-8 characters from an infinite byte stream - C#

Question

Normally, to read characters from a byte stream you use a StreamReader. In this example I'm reading records delimited by '\r' from an infinite stream.

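// Requires: using System.IO; using System.Text;
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    var messageBuilder = new StringBuilder();
    // Peek() returns -1 when no character is available to read.
    while (reader.Peek() >= 0)
    {
        var nextChar = (char)reader.Read();
        messageBuilder.Append(nextChar);

        // A '\r' marks the end of a record: hand it off and start a new one.
        if (nextChar == '\r')
        {
            ProcessBuffer(messageBuilder.ToString());
            messageBuilder.Clear();
        }
    }
}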

The problem is that the StreamReader has a small internal buffer, so if the code is waiting for an 'end of record' delimiter ('\r' in this case), it has to wait until the StreamReader's internal buffer is flushed (usually because more bytes have arrived).

This alternative implementation works for single-byte UTF-8 characters, but fails on multi-byte characters.

int byteAsInt = 0;
var messageBuilder = new StringBuilder();
while ((byteAsInt = stream.ReadByte()) != -1)
{
    var nextChar = Encoding.UTF8.GetChars(new[]{(byte) byteAsInt});
    Console.Write(nextChar[0]);
    messageBuilder.Append(nextChar);

    if (nextChar[0] == '\r')
    {
        ProcessBuffer(messageBuilder.ToString());
        messageBuilder.Clear();
    }
}
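
To make the failure concrete, here is a minimal sketch that decodes each byte in isolation, the same way the loop above does. The class and method names (PerByteDecodingDemo, DecodeByteByByte) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class PerByteDecodingDemo
{
    // Hypothetical helper: decodes every byte on its own, mirroring the
    // Encoding.UTF8.GetChars(new[] { b }) call in the loop above.
    public static string DecodeByteByByte(byte[] bytes)
    {
        var builder = new StringBuilder();
        foreach (byte b in bytes)
        {
            // Each call sees a single byte with no memory of earlier bytes,
            // so a lead byte and its continuation bytes are never combined.
            builder.Append(Encoding.UTF8.GetChars(new[] { b }));
        }
        return builder.ToString();
    }
}
```

ASCII input such as "a\r" survives this unchanged. The two bytes that encode "é" are each decoded separately, so the result is two characters that are not "é" (with the default Encoding.UTF8 settings these are normally the replacement character U+FFFD).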

How can I modify this code so that it works with multi-byte characters?

Recommended answer

Rather than Encoding.UTF8.GetChars, which is designed to convert complete buffers, get an instance of Decoder (for example via Encoding.UTF8.GetDecoder()) and repeatedly call its GetChars method. The Decoder keeps an internal buffer, so a partial multi-byte sequence left at the end of one call is carried over and completed by the next.
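
One possible way to apply this to the byte-at-a-time loop from the question is sketched below. It is not the answerer's code: the RecordReader class, the ReadRecords method, and the onRecord callback (standing in for ProcessBuffer) are names assumed for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class RecordReader
{
    // Hypothetical helper: reads the stream one byte at a time, decodes with
    // a stateful Decoder, and calls onRecord for each '\r'-terminated record.
    public static void ReadRecords(Stream stream, Action<string> onRecord)
    {
        // The Decoder remembers incomplete multi-byte sequences between calls.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var messageBuilder = new StringBuilder();
        var oneByte = new byte[1];
        // Sized for the worst case that a single input byte can produce.
        var chars = new char[Encoding.UTF8.GetMaxCharCount(1)];

        int byteAsInt;
        while ((byteAsInt = stream.ReadByte()) != -1)
        {
            oneByte[0] = (byte)byteAsInt;

            // flush: false keeps a partial sequence buffered for the next
            // call instead of emitting a replacement character.
            int charCount = decoder.GetChars(oneByte, 0, 1, chars, 0, false);

            // charCount is 0 while a multi-byte character is still incomplete.
            for (int i = 0; i < charCount; i++)
            {
                messageBuilder.Append(chars[i]);
                if (chars[i] == '\r')
                {
                    onRecord(messageBuilder.ToString());
                    messageBuilder.Clear();
                }
            }
        }
    }
}
```

Calling RecordReader.ReadRecords(stream, ProcessBuffer) would then deliver each record as soon as its '\r' byte arrives, because no StreamReader sits between the stream and the decoder. Reading single bytes can be slow on some streams; reading whatever chunk is available into a larger byte array and passing it to the same Decoder should work equally well.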
