Is there something like Buffer.LastPositionOf? Find last occurrence of character in buffer?
Question
I have a buffer of type ReadOnlySequence<byte>. I want to extract a subsequence (which will contain 0 to n messages) from it, knowing that each message ends with 0x1c, 0x0d (as described here).
I know the buffer has an extension method PositionOf, but it returns the position of the first occurrence of item in the ReadOnlySequence<T>, and I'm looking for a method that returns the position of the last occurrence. I tried to implement it on my own; this is what I have so far:
private SequencePosition? GetLastPosition(ReadOnlySequence<byte> buffer)
{
    // Do not modify the real buffer
    ReadOnlySequence<byte> temporaryBuffer = buffer;
    SequencePosition? lastPosition = null;
    do
    {
        /*
        Find the first occurrence of the delimiters in the buffer
        This only takes a byte, what to do with the delimiters? { 0x1c, 0x0d }
        */
        SequencePosition? foundPosition = temporaryBuffer.PositionOf(???);
        // Is there still an occurrence?
        if (foundPosition != null)
        {
            lastPosition = foundPosition;
            // cut off the sequence for the next run
            temporaryBuffer = temporaryBuffer.Slice(0, lastPosition.Value);
        }
        else
        {
            // this is required because otherwise this loop is infinite if lastPosition was set once
            break;
        }
    } while (lastPosition != null);
    return lastPosition;
}
I'm struggling with it. First of all, the PositionOf method only takes a single byte, but there are two delimiters, so I would have to pass in a byte[]. Next, I think I can optimize the loop "somehow".
Do you have any ideas how to find the last occurrence of those delimiters?
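For comparison, the question's own loop can be made to work with the single-byte PositionOf: search forward for 0x1c, check whether the next byte is 0x0d, remember that match, and keep searching after it. This is a sketch that is not from the original thread; ForwardScan and LastPositionOfPair are hypothetical names, it assumes .NET Core 3.0 or later (for SequenceReader<T>), and it only handles a two-byte delimiter.

```csharp
using System;
using System.Buffers;

public static class ForwardScan
{
    // Hypothetical helper: returns the position of the last `first, second` byte pair,
    // or null if the pair never occurs.
    public static SequencePosition? LastPositionOfPair(
        ReadOnlySequence<byte> buffer, byte first, byte second)
    {
        SequencePosition? last = null;
        ReadOnlySequence<byte> remaining = buffer;

        while (true)
        {
            // Single-byte search, the only form BuffersExtensions.PositionOf offers.
            SequencePosition? hit = remaining.PositionOf(first);
            if (hit == null)
            {
                return last;
            }

            // Everything after the byte that was just found.
            ReadOnlySequence<byte> after = remaining.Slice(remaining.GetPosition(1, hit.Value));

            // Peek at the following byte; SequenceReader skips empty segments.
            var peek = new SequenceReader<byte>(after);
            if (peek.TryPeek(out byte next) && next == second)
            {
                // Positions taken from a slice are also valid in the original buffer.
                last = hit;
            }

            // Always continue after the hit, so the loop is guaranteed to finish.
            remaining = after;
        }
    }
}
```

The key difference from the question's loop is the slicing direction: Slice(0, lastPosition.Value) keeps only the bytes before the first hit, so the next search finds nothing and the method ends up returning the first occurrence. Slicing past the hit moves the search forward instead.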
Recommended answer
I went down a giant rabbit hole digging into this, but I managed to come up with an extension method that I think answers your question:
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlySequenceExtensions
{
    public static SequencePosition? LastPositionOf(
        this ReadOnlySequence<byte> source,
        byte[] delimiter)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }

        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }

        var reader = new SequenceReader<byte>(source);
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        var delimiterFound = false;

        // Keep reading until we've consumed all delimiters.
        // The typed discard keeps the call unambiguous on runtimes that also
        // provide a TryReadTo(out ReadOnlySpan<T>, ReadOnlySpan<T>, bool) overload.
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }

        if (!delimiterFound)
        {
            return null;
        }

        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the position of
        // the starting byte of the delimiter
        return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
    }
}
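Since the goal in the question is to cut the buffer into a part that holds only complete messages and an unfinished tail, here is a sketch of that step. It is not part of the original answer: MessageFraming and TrySplitComplete are hypothetical names, and the single-segment fast path (MemoryExtensions.LastIndexOf, which searches backwards from the end of a span) is an addition. Multi-segment input falls back to the same forward SequenceReader scan used by LastPositionOf. Assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Buffers;

public static class MessageFraming
{
    // Message terminator from the question: 0x1c followed by 0x0d.
    private static ReadOnlySpan<byte> Delimiter => new byte[] { 0x1c, 0x0d };

    // Hypothetical helper: `complete` receives every byte up to and including the
    // last delimiter (0..n whole messages); `remainder` receives the unfinished tail.
    public static bool TrySplitComplete(
        ReadOnlySequence<byte> buffer,
        out ReadOnlySequence<byte> complete,
        out ReadOnlySequence<byte> remainder)
    {
        long lastIndex = -1;

        if (buffer.IsSingleSegment)
        {
            // One contiguous span: search backwards from the end directly.
            lastIndex = buffer.FirstSpan.LastIndexOf(Delimiter);
        }
        else
        {
            // Several segments: scan forward, remembering where each match started.
            var reader = new SequenceReader<byte>(buffer);
            while (reader.TryReadTo(out ReadOnlySequence<byte> _, Delimiter, advancePastDelimiter: true))
            {
                lastIndex = reader.Consumed - Delimiter.Length;
            }
        }

        if (lastIndex < 0)
        {
            complete = default;
            remainder = buffer;
            return false;
        }

        long end = lastIndex + Delimiter.Length;
        complete = buffer.Slice(0, end);
        remainder = buffer.Slice(end);
        return true;
    }
}
```

The complete part can then be split into individual messages, while remainder is kept until more data arrives.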
Here are some test cases too:
var cases = new List<byte[]>
{
    // Case 1: Check an empty array
    new byte[0],
    // Case 2: Check an array with no delimiter
    new byte[] { 0xf },
    // Case 3: Check an array with part of the delimiter
    new byte[] { 0x1c },
    // Case 4: Check an array with the other part of the delimiter
    new byte[] { 0x0d },
    // Case 5: Check an array with the delimiter in the wrong order
    new byte[] { 0x0d, 0x1c },
    // Case 6: Check an array with a correct delimiter
    new byte[] { 0x1c, 0x0d },
    // Case 7: Check an array with a byte followed by a correct delimiter
    new byte[] { 0x1, 0x1c, 0x0d },
    // Case 8: Check an array with multiple correct delimiters
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d },
    // Case 9: Check an array with multiple correct delimiters
    // where the delimiter isn't the last byte
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d, 0x3 },
    // Case 10: Check an array with multiple sequential bytes of a delimiter
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x1c, 0x0d, 0x3 },
};

var delimiter = new byte[] { 0x1c, 0x0d };

foreach (var item in cases)
{
    var source = new ReadOnlySequence<byte>(item);
    var result = source.LastPositionOf(delimiter);
    // Put a breakpoint here and examine result
}
Cases 1 to 5 all correctly return null. Cases 6 to 10 all correctly return the SequencePosition of the first byte of the delimiter (i.e. in this case, 0x1c).
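All of the cases above wrap a single array, so every sequence is single-segment and no delimiter ever straddles a segment boundary. To cover that situation too, a small test helper can build multi-segment input. This is a sketch, not part of the original answer; TestSegment and Segments.Build are hypothetical names built on the real ReadOnlySequenceSegment<T> base class.

```csharp
using System;
using System.Buffers;

// Hypothetical test helper: a minimal linked segment for building
// multi-segment ReadOnlySequence<byte> instances from byte arrays.
public sealed class TestSegment : ReadOnlySequenceSegment<byte>
{
    public TestSegment(byte[] data) => Memory = data;

    // Links a new segment after this one and keeps RunningIndex consistent.
    public TestSegment Append(byte[] data)
    {
        var next = new TestSegment(data) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

public static class Segments
{
    // Builds one sequence whose segment boundaries fall between the given chunks.
    public static ReadOnlySequence<byte> Build(params byte[][] chunks)
    {
        if (chunks.Length == 0)
        {
            return ReadOnlySequence<byte>.Empty;
        }

        var first = new TestSegment(chunks[0]);
        var last = first;
        for (int i = 1; i < chunks.Length; i++)
        {
            last = last.Append(chunks[i]);
        }

        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }
}
```

For example, Segments.Build(new byte[] { 0x1, 0x1c }, new byte[] { 0x0d, 0x3 }) splits the delimiter across two segments; passing such a sequence to LastPositionOf exercises the cross-segment matching of SequenceReader<T>.TryReadTo.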
I also tried to create an iterative version that would yield a position each time it found a delimiter, like so:
while (reader.TryReadTo(out _, delimiterToFind, true))
{
    yield return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
}
But SequenceReader<T> and ReadOnlySpan<T> are ref structs and can't be used in iterator blocks, so I came up with AllPositionsOf instead:
public static IEnumerable<SequencePosition> AllPositionsOf(
    this ReadOnlySequence<byte> source,
    byte[] delimiter)
{
    if (delimiter == null)
    {
        throw new ArgumentNullException(nameof(delimiter));
    }

    if (!delimiter.Any())
    {
        throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
    }

    var reader = new SequenceReader<byte>(source);
    var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
    var results = new List<SequencePosition>();

    // Record the start of every delimiter; the typed discard keeps TryReadTo unambiguous
    while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
    {
        results.Add(reader.Sequence.GetPosition(reader.Consumed - delimiter.Length));
    }

    return results;
}
The test cases pass for this method, too.
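If lazy enumeration matters, one workaround is to keep only a ReadOnlySequence<byte> (an ordinary struct) and a long offset inside the iterator, and do the ref-struct work in a separate, non-iterator helper that creates a fresh SequenceReader<byte> for each search. This is a sketch, not the original answer's approach; LazyPositions, PositionsOf, Iterate and FindNext are hypothetical names.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

public static class LazyPositions
{
    // Validates eagerly, then hands off to the lazy iterator.
    public static IEnumerable<SequencePosition> PositionsOf(
        ReadOnlySequence<byte> source, byte[] delimiter)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }

        if (delimiter.Length == 0)
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }

        return Iterate(source, delimiter);
    }

    // The iterator holds no ref structs across yield: only a sequence and an offset.
    private static IEnumerable<SequencePosition> Iterate(
        ReadOnlySequence<byte> source, byte[] delimiter)
    {
        long start = 0;
        long index;
        while ((index = FindNext(source, delimiter, start)) >= 0)
        {
            yield return source.GetPosition(index);
            start = index + delimiter.Length;
        }
    }

    // Returns the offset (from the start of `source`) of the next delimiter
    // at or after `start`, or -1 if there is none.
    private static long FindNext(ReadOnlySequence<byte> source, byte[] delimiter, long start)
    {
        var reader = new SequenceReader<byte>(source.Slice(start));
        if (reader.TryReadTo(out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), advancePastDelimiter: true))
        {
            return start + reader.Consumed - delimiter.Length;
        }

        return -1;
    }
}
```

Each FindNext call re-slices from the stored offset, which trades a little extra work per match for not building a List up front.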
Now that I've had some sleep and a chance to think things over, I think the above can be improved, for a few reasons:
- SequenceReader<T> has a Rewind() method, which makes me think SequenceReader<T> is designed to be reused.
- SequenceReader<T> seems to be designed to make it easier to work with ReadOnlySequence<T>s in general.
- Creating an extension method on ReadOnlySequence<T> just to use a SequenceReader<T> to read from that ReadOnlySequence<T> seems backwards.
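To illustrate the Rewind() point, here is a small sketch (not from the original answer; RewindDemo and CountMessages are hypothetical names) that reads ahead through a caller's SequenceReader<byte> and then rewinds it, so the same reader can be used again afterwards.

```csharp
using System;
using System.Buffers;

public static class RewindDemo
{
    // Counts complete messages ahead of the reader's current position,
    // then rewinds so the caller's reader ends up exactly where it started.
    public static int CountMessages(ref SequenceReader<byte> reader, ReadOnlySpan<byte> delimiter)
    {
        long start = reader.Consumed;
        int count = 0;

        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiter, advancePastDelimiter: true))
        {
            count++;
        }

        // Rewind takes the number of items to move back.
        reader.Rewind(reader.Consumed - start);
        return count;
    }
}
```

Because the reader is passed by ref, the caller sees the rewound position; passing it by value would only rewind a copy.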
Given the above, I think it makes more sense to avoid working directly with ReadOnlySequence<T>s where possible, and to prefer (and reuse) a SequenceReader<T> instead. With that in mind, here's a different version of LastPositionOf, which is now an extension method on SequenceReader<byte>:
public static class SequenceReaderExtensions
{
    /// <summary>
    /// Finds the last occurrence of a delimiter in a given sequence.
    /// </summary>
    /// <param name="reader">The reader to read from.</param>
    /// <param name="delimiter">The delimiter to look for.</param>
    /// <param name="rewind">If true, rewinds the reader to its position prior to this method being called.</param>
    /// <returns>A SequencePosition if a delimiter is found, otherwise null.</returns>
    public static SequencePosition? LastPositionOf(
        this ref SequenceReader<byte> reader,
        byte[] delimiter,
        bool rewind)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }

        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }

        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        var consumed = reader.Consumed;
        var delimiterFound = false;

        // Keep reading until we've consumed all delimiters
        // (the typed discard keeps the TryReadTo overload unambiguous)
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }

        if (!delimiterFound)
        {
            if (rewind)
            {
                reader.Rewind(reader.Consumed - consumed);
            }

            return null;
        }

        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the starting byte
        // of the delimiter
        var result = reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);

        if (rewind)
        {
            reader.Rewind(reader.Consumed - consumed);
        }

        return result;
    }
}
The test cases above continue to pass for this version, but we can now reuse the same reader. In addition, it lets you specify whether reader should be rewound to the position it had before the method was called.