Is there something like Buffer.LastPositionOf? Find last occurrence of character in buffer?


Question

I have a buffer of type ReadOnlySequence<byte>. I want to extract a subsequence (which will contain 0 to n messages) from it, knowing that each message ends with 0x1c, 0x0d (as described here).

I know the buffer has an extension method PositionOf, but it "returns the position of the first occurrence of item in the ReadOnlySequence<T>", and I'm looking for a method that returns the position of the last occurrence. I tried to implement it on my own; this is what I have so far:

private SequencePosition? GetLastPosition(ReadOnlySequence<byte> buffer)
{
    // Do not modify the real buffer
    ReadOnlySequence<byte> temporaryBuffer = buffer;
    SequencePosition? lastPosition = null;

    do
    {
        /*
            Find the first occurrence of the delimiters in the buffer
            This only takes a byte, what to do with the delimiters? { 0x1c, 0x0d }

        */
        SequencePosition? foundPosition = temporaryBuffer.PositionOf(???);

        // Is there still an occurrence?
        if (foundPosition != null)
        {
            lastPosition = foundPosition;

            // cut off the sequence for the next run
            temporaryBuffer = temporaryBuffer.Slice(0, lastPosition.Value);
        }
        else
        {
            // this is required because otherwise this loop is infinite if lastPosition was set once
            break;
        }
    } while (lastPosition != null);

    return lastPosition;
}

I'm struggling with it. First of all, the PositionOf method only takes a single byte, but the delimiter is two bytes long, so I would have to pass in a byte[]. Next, I think the loop can be optimized "somehow".

Do you have any ideas on how to find the last occurrence of those delimiters?

Recommended answer

I went down a giant rabbit hole digging into this, but I managed to come up with an extension method which I think answers your question:

using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlySequenceExtensions
{
    public static SequencePosition? LastPositionOf(
        this ReadOnlySequence<byte> source,
        byte[] delimiter)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }

        var reader = new SequenceReader<byte>(source);
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);

        var delimiterFound = false;
        // Keep reading until we've consumed all delimiters
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }

        if (!delimiterFound)
        {
            return null;
        }

        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the position of 
        // the starting byte of the delimiter
        return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
    }
}

Here are some test cases too:

var cases = new List<byte[]>
{
    // Case 1: Check an empty array
    new byte[0],
    // Case 2: Check an array with no delimiter
    new byte[] { 0xf },
    // Case 3: Check an array with part of the delimiter
    new byte[] { 0x1c },
    // Case 4: Check an array with the other part of the delimiter
    new byte[] { 0x0d },
    // Case 5: Check an array with the delimiter in the wrong order
    new byte[] { 0x0d, 0x1c },
    // Case 6: Check an array with a correct delimiter
    new byte[] { 0x1c, 0x0d },
    // Case 7: Check an array with a byte followed by a correct delimiter
    new byte[] { 0x1, 0x1c, 0x0d },
    // Case 8: Check an array with multiple correct delimiters
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d },
    // Case 9: Check an array with multiple correct delimiters
    // where the delimiter isn't the last byte
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d, 0x3 },
    // Case 10: Check an array with multiple sequential bytes of a delimiter
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x1c, 0x0d, 0x3 },
};

var delimiter = new byte[] { 0x1c, 0x0d };
foreach (var item in cases)
{
    var source = new ReadOnlySequence<byte>(item);
    var result = source.LastPositionOf(delimiter);
} // Put a breakpoint here and examine result

Cases 1 to 5 all correctly return null. Cases 6 to 10 all correctly return the SequencePosition of the first byte in the delimiter (i.e. in this case, 0x1c).
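Instead of inspecting result in a debugger, the returned SequencePosition can be converted into a plain byte offset and compared against an expected value. The following is only a sketch: LastPositionOfChecks and LastOffsetOf are hypothetical names, and the extension method is a condensed copy of the one above (argument checks omitted) so that the snippet compiles on its own.

```csharp
using System;
using System.Buffers;

public static class LastPositionOfChecks
{
    // Condensed copy of the LastPositionOf extension shown earlier,
    // repeated here only to keep the sketch self-contained.
    public static SequencePosition? LastPositionOf(
        this ReadOnlySequence<byte> source,
        byte[] delimiter)
    {
        var reader = new SequenceReader<byte>(source);
        var found = false;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), true))
        {
            found = true;
        }

        return found
            ? source.GetPosition(reader.Consumed - delimiter.Length)
            : (SequencePosition?)null;
    }

    // Hypothetical helper: returns the zero-based offset of the last
    // delimiter in data, or -1 when the delimiter does not occur.
    public static long LastOffsetOf(byte[] data, byte[] delimiter)
    {
        var source = new ReadOnlySequence<byte>(data);
        var position = source.LastPositionOf(delimiter);
        if (position is null)
        {
            return -1;
        }

        // The length of the slice from the start up to the position
        // is the number of bytes that precede the delimiter.
        return source.Slice(source.Start, position.Value).Length;
    }
}
```

With this helper, case 5 (delimiter bytes in the wrong order) should give -1, case 7 should give 1, and case 10 should give 5, the offset of the second of the two consecutive 0x1c bytes.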

I also tried to create an iterative version that would yield a position after finding each delimiter, like so:

while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
{
    yield return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
}

But SequenceReader<T> and ReadOnlySpan<T> (both ref structs) can't be used in iterator blocks like this, so I came up with AllPositionsOf instead:

public static IEnumerable<SequencePosition> AllPositionsOf(
    this ReadOnlySequence<byte> source,
    byte[] delimiter)
{
    if (delimiter == null)
    {
        throw new ArgumentNullException(nameof(delimiter));
    }
    if (!delimiter.Any())
    {
        throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
    }

    var reader = new SequenceReader<byte>(source);
    var delimiterToFind = new ReadOnlySpan<byte>(delimiter);

    var results = new List<SequencePosition>();
    while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
    {
        results.Add(reader.Sequence.GetPosition(reader.Consumed - delimiter.Length));
    }

    return results;
}

The test cases work properly for that, too.
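All of the test cases above wrap a single array, so they never exercise a delimiter that is split across two segments, which can happen when the sequence comes from a pipe or socket. The sketch below shows one way such a case might be built. Segment, MultiSegmentExample, Build and LastOffsetOf are hypothetical names introduced only for illustration, and the search loop mirrors the approach used in the extension methods rather than calling them.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: a minimal linked segment for building
// multi-segment sequences in tests.
public sealed class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(byte[] data) => Memory = data;

    // Links a new segment after this one and returns it.
    public Segment Append(byte[] data)
    {
        var next = new Segment(data) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

public static class MultiSegmentExample
{
    // Builds a sequence with one segment per chunk.
    public static ReadOnlySequence<byte> Build(byte[] first, params byte[][] rest)
    {
        var start = new Segment(first);
        var end = start;
        foreach (var chunk in rest)
        {
            end = end.Append(chunk);
        }

        return new ReadOnlySequence<byte>(start, 0, end, end.Memory.Length);
    }

    // Same search loop as in the answer, reporting a byte offset
    // (or -1 when the delimiter is absent).
    public static long LastOffsetOf(ReadOnlySequence<byte> source, byte[] delimiter)
    {
        var reader = new SequenceReader<byte>(source);
        var found = false;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), true))
        {
            found = true;
        }

        return found ? reader.Consumed - delimiter.Length : -1;
    }
}
```

For example, MultiSegmentExample.Build(new byte[] { 0x1, 0x1c }, new byte[] { 0x0d, 0x2 }) places 0x1c at the end of the first segment and 0x0d at the start of the second, which is the situation a single-array test cannot reproduce.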

Now that I've had some sleep and a chance to think things over, I think the above can be improved, for a few reasons:

  1. SequenceReader<T> has a Rewind() method, which makes me think SequenceReader<T> is designed to be reused
  2. SequenceReader<T> seems to be designed to make it easier to work with ReadOnlySequence<T>s in general
  3. Creating an extension method on ReadOnlySequence<T> in order to use a SequenceReader<T> to read from a ReadOnlySequence<T> seems backwards

Given the above, I think it probably makes more sense to avoid working directly with ReadOnlySequence<T>s where possible, and to prefer (and reuse) a SequenceReader<T> instead. With that in mind, here's a different version of LastPositionOf, which is now an extension method on SequenceReader<T>:

public static class SequenceReaderExtensions
{
    /// <summary>
    /// Finds the last occurrence of a delimiter in a given sequence.
    /// </summary>
    /// <param name="reader">The reader to read from.</param>
    /// <param name="delimiter">The delimiter to look for.</param>
    /// <param name="rewind">If true, rewinds the reader to its position prior to this method being called.</param>
    /// <returns>A SequencePosition if a delimiter is found, otherwise null.</returns>
    public static SequencePosition? LastPositionOf(
        this ref SequenceReader<byte> reader,
        byte[] delimiter,
        bool rewind)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }

        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        var consumed = reader.Consumed;

        var delimiterFound = false;
        // Keep reading until we've consumed all delimiters
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }

        if (!delimiterFound)
        {
            if (rewind)
            {
                reader.Rewind(reader.Consumed - consumed);
            }

            return null;
        }

        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the starting byte
        // of the delimiter
        var result = reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
        if (rewind)
        {
            reader.Rewind(reader.Consumed - consumed);
        }

        return result;
    }
}

The above test cases continue to pass for this version, but we can now reuse the same reader. In addition, it lets you specify whether the reader should be rewound to the position it had before the method was called.
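To tie this back to the original goal of extracting the subsequence that holds 0 to n complete messages, the reader-based method might be used roughly as follows. This is a sketch under assumptions: CompleteMessages and TakeCompleteMessages are hypothetical names, and the extension method is a condensed stand-in for the one above (argument checks omitted) so that the snippet compiles on its own.

```csharp
using System;
using System.Buffers;

public static class CompleteMessages
{
    private static readonly byte[] Delimiter = { 0x1c, 0x0d };

    // Condensed stand-in for the SequenceReader-based LastPositionOf above.
    public static SequencePosition? LastPositionOf(
        this ref SequenceReader<byte> reader,
        byte[] delimiter,
        bool rewind)
    {
        var consumedBefore = reader.Consumed;
        var found = false;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), true))
        {
            found = true;
        }

        SequencePosition? result = found
            ? reader.Sequence.GetPosition(reader.Consumed - delimiter.Length)
            : (SequencePosition?)null;

        if (rewind)
        {
            reader.Rewind(reader.Consumed - consumedBefore);
        }

        return result;
    }

    // Hypothetical helper: returns everything up to and including the
    // last delimiter, or an empty sequence when no message is complete.
    public static ReadOnlySequence<byte> TakeCompleteMessages(ReadOnlySequence<byte> buffer)
    {
        var reader = new SequenceReader<byte>(buffer);
        var last = reader.LastPositionOf(Delimiter, rewind: true);
        if (last is null)
        {
            return ReadOnlySequence<byte>.Empty;
        }

        // Move past the delimiter so the slice includes it.
        var end = buffer.GetPosition(Delimiter.Length, last.Value);
        return buffer.Slice(buffer.Start, end);
    }
}
```

Anything after the returned slice is a partial message and would normally stay in the buffer until more data arrives.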
