Regex that matches a URN as defined by RFC 8141

Question

I am struggling to find a regex that matches a URN as described in RFC 8141. I have tried this one:

\A(?i:urn:(?!urn:)(?<nid>[a-z0-9][a-z0-9-]{1,31}):(?<nss>(?:[a-z0-9()+,-.:=@;$_!*']|%[0-9a-f]{2})+))\z

But this one only matches the first part of the URN, without the components.

For example, given the URN urn:example:a123,0%7C00~&z456/789?+abc?=xyz#12/3 we should match the following groups:

  • NID - example
  • NSS - a123,0%7C00~&z456/789 (from the ':' that follows the NID until '?+', '?=' or '#')
  • r-component - abc (from '?+' until '?=' or '#')
  • f-component - 12/3 (from '#' to the end)
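To make the problem concrete, here is a minimal Python sketch that runs the question's pattern (stray spaces removed) against the example. The transcription is an assumption: Python's re module spells named groups (?P<name>...), and its \Z matches only at the very end of the string, like PCRE's \z.

```python
import re

# The question's pattern with the stray spaces removed, transcribed for
# Python's re module (an assumption): (?<name>...) becomes (?P<name>...)
# and PCRE's \z becomes \Z, which in Python matches only at the end.
QUESTION_PATTERN = re.compile(
    r"\A(?i:urn:(?!urn:)(?P<nid>[a-z0-9][a-z0-9-]{1,31}):"
    r"(?P<nss>(?:[a-z0-9()+,-.:=@;$_!*']|%[0-9a-f]{2})+))\Z"
)

example = "urn:example:a123,0%7C00~&z456/789?+abc?=xyz#12/3"

# A URN without components matches.
print(QUESTION_PATTERN.match("urn:example:a123") is not None)  # True

# The example does not: ~, & and / are missing from the NSS character
# class, and there are no groups for the ?+, ?= and # components.
print(QUESTION_PATTERN.match(example) is not None)  # False
```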

Answer

I haven't read the whole specification, so there may be other rules to implement, but this should get you started on the optional components:

\A(?i:urn:(?!urn:)(?<nid>[a-z0-9][a-z0-9-]{1,31}):(?<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~\/]|%[0-9a-f]{2})+)(?:\?\+(?<rcomponent>.*?))?(?:\?=(?<qcomponent>.*?))?(?:#(?<fcomponent>.*?))?)\z
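As a minimal sketch, here is the same pattern ported to Python and applied to the question's example. The syntax substitutions are assumptions ((?P<name>...) for named groups, \Z for \z), and inside a Python raw string the / needs no backslash. parse_urn is a hypothetical helper name.

```python
import re
from typing import Optional

# The answer's pattern ported to Python's re module (assumed translation):
# (?P<name>...) for named groups, \Z in place of PCRE's \z, and no
# backslash before / since Python patterns have no / delimiter.
URN_PATTERN = re.compile(
    r"\A(?i:urn:(?!urn:)"
    r"(?P<nid>[a-z0-9][a-z0-9-]{1,31}):"
    r"(?P<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~/]|%[0-9a-f]{2})+)"
    r"(?:\?\+(?P<rcomponent>.*?))?"  # optional r-component after ?+
    r"(?:\?=(?P<qcomponent>.*?))?"   # optional q-component after ?=
    r"(?:#(?P<fcomponent>.*?))?"     # optional f-component after #
    r")\Z"
)


def parse_urn(urn: str) -> Optional[dict]:
    """Hypothetical helper: return the named groups, or None on no match."""
    m = URN_PATTERN.match(urn)
    return m.groupdict() if m else None


print(parse_urn("urn:example:a123,0%7C00~&z456/789?+abc?=xyz#12/3"))
# {'nid': 'example', 'nss': 'a123,0%7C00~&z456/789', 'rcomponent': 'abc', 'qcomponent': 'xyz', 'fcomponent': '12/3'}
print(parse_urn("urn:urn:abc"))  # None (rejected by the (?!urn:) lookahead)
```

The greedy + on the NSS stops at the first ?, because neither ? nor # is in its character class; that is what lets the optional component groups take over from there.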

Explanation:

  • (?<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~\/]|%[0-9a-f]{2})+) : The - has been moved to the beginning of the list so that it is treated as an allowed character; otherwise it means "range from , to .". The characters &, ~ and / have also been added to the list, or else it won't match your example. The / has to be escaped with "\" when / is the pattern delimiter, as on Regex101.
  • Optional components: (?:\?\+(?<rcomponent>.*?))? : each component sits inside an optional non-capturing group (?:...)? so that its delimiter (the ?+, ?= or # part) is not captured. The characters ? and + have to be escaped with "\" because they are quantifiers. The group captures any character (.), but in lazy mode (*?); otherwise the first component found would capture everything up to the end of the string.
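The lazy-versus-greedy point can be checked directly. This sketch (Python re; the short group names r, q and f are illustrative) matches only the component tail of the example URN with both variants.

```python
import re

# Only the component tail of the example URN; r, q and f are
# illustrative group names for the ?+, ?= and # components.
tail = "?+abc?=xyz#12/3"

greedy = re.compile(r"(?:\?\+(?P<r>.*))?(?:\?=(?P<q>.*))?(?:#(?P<f>.*))?\Z")
lazy = re.compile(r"(?:\?\+(?P<r>.*?))?(?:\?=(?P<q>.*?))?(?:#(?P<f>.*?))?\Z")

# Greedy: the first component swallows the rest of the string.
print(greedy.match(tail).groupdict())  # {'r': 'abc?=xyz#12/3', 'q': None, 'f': None}

# Lazy: a lazy group is only extended when the rest of the pattern
# (ending in \Z) fails, so each component stops at the next delimiter.
print(lazy.match(tail).groupdict())    # {'r': 'abc', 'q': 'xyz', 'f': '12/3'}
```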

See Regex101.

Hope this helps.
