Partial regular expression match

Question

I have a regular expression that I'm testing against an input stream of characters. I wonder if there is a way to match the regular expression against the input and determine whether it is a partial match that consumes the entire input buffer, i.e. the end of the input buffer is reached before the regexp completes. I would like the implementation to decide whether to wait for more input characters or abort the operation.

In other words, I need to determine which one is true:

  1. The end of the input buffer was reached before the regexp was matched,
     e.g. "foo" =~ /^foobar/
  2. The regexp matched completely,
     e.g. "foobar" =~ /^foobar/
  3. The regexp failed to match,
     e.g. "fuubar" =~ /^foobar/

The input is not packetized.
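For the specific examples above, an anchored literal such as /^foobar/, the three outcomes can be told apart without a regex engine. The following is a minimal Python sketch under that assumption only; the Outcome enum and classify_prefix are illustrative names, not from the question, and the approach does not extend to general regular expressions.

```python
from enum import Enum


class Outcome(Enum):
    NEED_MORE_INPUT = 1  # buffer ended before the pattern completed
    MATCH = 2            # pattern matched completely
    NO_MATCH = 3         # pattern can no longer match


def classify_prefix(buffer: str, literal: str) -> Outcome:
    """Classify `buffer` against an anchored literal pattern such as /^foobar/."""
    if buffer.startswith(literal):
        return Outcome.MATCH
    if literal.startswith(buffer):
        # Every character read so far agrees with the pattern, but the
        # buffer ran out first: wait for more characters.
        return Outcome.NEED_MORE_INPUT
    return Outcome.NO_MATCH


print(classify_prefix("foo", "foobar"))     # Outcome.NEED_MORE_INPUT
print(classify_prefix("foobar", "foobar"))  # Outcome.MATCH
print(classify_prefix("fuubar", "foobar"))  # Outcome.NO_MATCH
```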

Recommended answer

I'm not sure if this is your question, but:
Regular expressions either match or they don't, and an expression can match a variable amount of input. So it can't be determined directly.

However, if you believe a match may overlap the end of the buffer, you can use a smart buffering scheme to accomplish the same thing.

There are many ways to do this.

One way is to match, via assertions, everything that does not match, up until the start of a match (but not the full match you seek). Simply throw these parts away and clear them from your buffer. When you get the match you seek, clear that data and everything before it from the buffer.

Example: in /(<function.*?>)|([^<]*)/, the part you throw away (clear from the buffer) is captured in group 2.
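Below is a minimal Python sketch of this first approach, assuming the standard re module and the example pattern above. On its own the pattern stalls at a "<" that is not the start of a match (group 2 then matches the empty string), so the sketch adds a hypothetical helper, could_still_match, that hard-codes for this one pattern whether the text from that "<" could still grow into a full match. That check is what separates "discard this" from "wait for more input".

```python
import re
from typing import List, Tuple

# Example pattern from the answer: group 1 is the match we want,
# group 2 is a run of text that cannot start a match.
TOKEN = re.compile(r"(<function.*?>)|([^<]*)")
PREFIX = "<function"


def could_still_match(rest: str) -> bool:
    """Hypothetical helper: can `rest` become a group-1 match once more input arrives?"""
    if len(rest) < len(PREFIX):
        return PREFIX.startswith(rest)
    # '.' does not match a newline, so a newline before any '>' rules out a match.
    return rest.startswith(PREFIX) and "\n" not in rest


def scan(buffer: str) -> Tuple[List[str], str]:
    """Return (complete matches, tail of the buffer to keep for the next read)."""
    matches = []
    pos = 0
    while pos < len(buffer):
        m = TOKEN.match(buffer, pos)  # always succeeds: group 2 may match ""
        if m.group(1) is not None:
            matches.append(m.group(1))  # full match: report it and clear it
            pos = m.end()
        elif m.end() > pos:
            pos = m.end()  # group 2: throw this text away
        elif could_still_match(buffer[pos:]):
            break  # partial match at the end: wait for more input
        else:
            pos += 1  # a '<' that can never start a match: drop it
    return matches, buffer[pos:]


found, tail = scan("abc <function foo> xyz <div> <func")
print(found, repr(tail))  # ['<function foo>'] '<func'
found, tail = scan(tail + "tion bar> tail")
print(found, repr(tail))  # ['<function bar>'] ''
```

Call scan on the kept tail plus each newly read chunk. A non-empty tail corresponds to the question's first case: the end of the buffer was reached before the match completed.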

Another way applies if you are matching finite-length strings: if nothing in the buffer matches, you can safely throw away everything from the beginning of the buffer up to the end of the buffer minus the length of the string you are searching for.

Example: your buffer is 64K in size. You are searching for a string of length 10, and it was not found in the buffer. You can safely clear the first (64K - 10) bytes, retaining the last 10 bytes, and then append the next (64K - 10) bytes of input to the end of the buffer. Of course, you only need a 10-byte buffer that constantly drops and adds one character, but a larger buffer is more efficient, and you can use thresholds to decide when to reload more data.
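A minimal Python sketch of this sliding-window idea for a fixed literal string. The function name find_in_stream and the use of an iterable of string chunks to stand in for successive reads are assumptions for illustration. After each search it keeps only the last len(needle) - 1 characters, which is already enough for an occurrence that straddles two reads; keeping len(needle) characters, as in the example above, is also safe.

```python
from typing import Iterable, List


def find_in_stream(chunks: Iterable[str], needle: str) -> List[int]:
    """Return the stream offsets of non-overlapping occurrences of `needle`.

    `chunks` stands in for successive reads from the input stream.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    n = len(needle)
    buffer = ""
    base = 0  # stream offset of buffer[0]
    hits = []
    for chunk in chunks:
        buffer += chunk  # append the newly read data to the end of the buffer
        start = 0
        while (i := buffer.find(needle, start)) != -1:
            hits.append(base + i)
            start = i + n  # clear the match and everything before it
        # No complete occurrence starts at or after `start`, so only the last
        # n - 1 characters can still begin one that finishes in a later read.
        keep_from = max(start, len(buffer) - (n - 1))
        base += keep_from
        buffer = buffer[keep_from:]
    return hits


print(find_in_stream(["abcfo", "obarxx", "foo", "bar"], "foobar"))  # [3, 11]
```

Because the retained tail is bounded by the needle length, the buffer never grows beyond one read plus n - 1 characters, regardless of how long the stream is.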

If you can create a buffer that easily contracts/expands, more buffering options are available.