Regex that matches a URN per RFC 8141
Problem description
I am struggling to find a regex that matches a URN as described in RFC 8141. I have tried this one:
\A(?i:urn:(?!urn:)(?<nid>[a-z0-9][a-z0-9-]{1,31}):(?<nss>(?:[a-z0-9()+,-.:=@;$_!*']|%[0-9a-f]{2})+))\z
However, this one only matches the first part of the URN, without the optional components.
For example, let's say we have the following URN: urn:example:a123,0%7C00~&z456/789?+abc?=xyz#12/3
We should match the following groups:
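- NID - example
- NSS - a123,0%7C00~&z456/789 (from the last ':' until we match '?+' or '?=' or '#')
- r-component - abc (from '?+' until '?=' or '#')
- q-component - xyz (from '?=' until '#')
- f-component - 12/3 (from '#' until the end)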
Recommended answer
I haven't read all the specifications, so there may be other rules to implement, but it should put you on the way for the optional components:
\A(?i:urn:(?!urn:)(?<nid>[a-z0-9][a-z0-9-]{1,31}):(?<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~\/]|%[0-9a-f]{2})+)(?:\?\+(?<rcomponent>.*?))?(?:\?=(?<qcomponent>.*?))?(?:#(?<fcomponent>.*?))?)\z
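As a quick way to try the pattern above outside of a regex tester, here is a minimal Python sketch. It is an adaptation, not part of the original answer: Python's `re` module writes named groups as `(?P<name>...)`, and the `\A ... \z` anchors are replaced by `re.fullmatch()`. The helper name `parse_urn` is hypothetical.

```python
import re

# The answer's pattern, adapted for Python's `re` module (assumption: the
# original targets a PCRE/Ruby-style engine). Named groups use (?P<name>...),
# and the \A ... \z anchors are replaced by re.fullmatch() below.
URN_PATTERN = re.compile(
    r"(?i:urn:(?!urn:)"
    r"(?P<nid>[a-z0-9][a-z0-9-]{1,31}):"
    r"(?P<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~/]|%[0-9a-f]{2})+)"
    r"(?:\?\+(?P<rcomponent>.*?))?"
    r"(?:\?=(?P<qcomponent>.*?))?"
    r"(?:#(?P<fcomponent>.*?))?)"
)


def parse_urn(urn):
    """Return a dict of the named groups, or None if the URN does not match."""
    match = URN_PATTERN.fullmatch(urn)
    return match.groupdict() if match else None


parts = parse_urn("urn:example:a123,0%7C00~&z456/789?+abc?=xyz#12/3")
for name, value in parts.items():
    print(f"{name}: {value}")
```

Components that are absent from the input come back as `None`, so `parse_urn("urn:example:foo")` only fills in `nid` and `nss`.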
Explanation:
- (?<nss>(?:[-a-z0-9()+,.:=@;$_!*'&~\/]|%[0-9a-f]{2})+): The - has been moved to the beginning of the list so that it is considered one of the allowed chars, or else it means "range from , to .". The characters &, ~ and / (which has to be escaped with "\") have also been added to the list, or else it won't match your example.
- Optional components: (?:\?\+(?<rcomponent>.*?))?: each one is inside an optional non-capturing group (?:)? to prevent capturing the identifier (the ?+, ?= and # part). The chars ? and + have to be escaped with "\". It will capture anything (.) but in lazy mode (*?), or else the first component found would capture everything until the end of the string.
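The effect of the lazy quantifier can be checked in isolation. The sketch below is an illustration under assumptions: it uses only the optional-component tail of the pattern, rewritten with Python's `(?P<name>...)` group syntax and shortened group names (`r`, `q`, `f`), and compares a lazy version with a greedy one on the tail of the example URN.

```python
import re

# Only the part of the example URN that follows the NSS.
tail = "?+abc?=xyz#12/3"

# Lazy version, as in the answer: each component captures as little as possible.
lazy = re.compile(r"(?:\?\+(?P<r>.*?))?(?:\?=(?P<q>.*?))?(?:#(?P<f>.*?))?")

# Greedy version, for comparison: .* instead of .*?
greedy = re.compile(r"(?:\?\+(?P<r>.*))?(?:\?=(?P<q>.*))?(?:#(?P<f>.*))?")

# fullmatch() plays the role of the \A ... \z anchors of the original pattern.
lazy_groups = lazy.fullmatch(tail).groupdict()
greedy_groups = greedy.fullmatch(tail).groupdict()

print("lazy:  ", lazy_groups)
print("greedy:", greedy_groups)
```

With the greedy version, the r-component swallows the whole tail and the other two groups stay empty, which is the behaviour the answer warns about. Note that the lazy version relies on the pattern being anchored at the end of the string: with `lazy.match(tail)` instead of `fullmatch`, the r-component would stop immediately and capture an empty string.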
See Regex101.
Hope it helps.