Regular expression for matching a variety of types of numbered lists
Problem description
I'd like to create a (PCRE) regular expression to match all commonly used numbered lists, and I'd like to share my thoughts and gather input on ways to do this.
I've defined 'lists' as the set of canonical Anglo-Saxon conventions, i.e.
1 2 3
1. 2. 3.
1) 2) 3)
(1) (2) (3)
1.1 1.2 1.2.1
1.1. 1.2. 1.3.
1.1) 1.2) 1.3)
(1.1) (1.2) (1.3)
Letters
a b c
a. b. c.
a) b) c)
(a) (b) (c)
A B C
A. B. C.
A) B) C)
(A) (B) (C)
Roman numerals
i ii iii
i. ii. iii.
i) ii) iii)
(i) (ii) (iii)
I II III
I. II. III.
I) II) III)
(I) (II) (III)
I'd like to know how strong a set of lists this is, whether there are other numbering conventions that should be included, and whether any of these ought to be removed.
Here's a regular expression I've created to solve this problem (in Python):
import re

numex = (r'(?:\d{1,3}'            # 1, 2, 3
         r'(?:\.\d{1,3}){0,4}'    # 1.1, 1.1.1.1
         r'|[A-Z]{1,2}'           # A. B. C.
         r'|[ivxcl]{1,6})')       # i, iii, ...
rex = re.compile(r'(\(?%s\)|%s\.?)' % (numex, numex), re.I)  # re.U?
rex.match("123. Some paragraph")
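One way to gauge how adequate such a pattern is: run it over one sample of each listed convention, plus a line of ordinary prose. The sketch below assumes a syntactically complete form of the pattern (non-capturing group closed, both `%s` placeholders filled); the `marker` helper and the `samples` list are illustrative names, not part of the original post.

```python
import re

# Assumed complete form of the pattern: numbers, letters, Roman numerals.
numex = (r'(?:\d{1,3}'
         r'(?:\.\d{1,3}){0,4}'
         r'|[A-Z]{1,2}'
         r'|[ivxcl]{1,6})')
rex = re.compile(r'(\(?%s\)|%s\.?)' % (numex, numex), re.I)

def marker(line):
    """Return the list marker matched at the start of line, or None."""
    m = rex.match(line)
    return m.group(1) if m else None

# Hypothetical sample lines, one per convention, plus plain prose.
samples = ["1 text", "1. text", "1) text", "(1) text", "1.2.1 text",
           "(1.1) text", "a. text", "(A) text", "iii) text", "(IV) text",
           "Some paragraph"]
for s in samples:
    print(repr(s), "->", repr(marker(s)))
```

In this form the prose line "Some paragraph" also produces a match ("So"), because `[A-Z]{1,2}` with `re.I` accepts any one or two letters and nothing requires whitespace after the marker. Requiring `\s+` after the marker is one way to tighten it.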
I'd like to know how adequate this regex is for this problem, and whether there are alternative solutions (regex or otherwise).
Incidentally, for my particular use-case, I wouldn't expect list numbers of more than 25-50.
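Since list numbers above 25-50 are not expected, one non-regex complement is to convert a matched label to its position and reject anything out of range. The following is only a sketch under stated assumptions: `ordinal` and `roman_to_int` are hypothetical helpers, labels are single-level and already stripped of dots and parentheses, and the ambiguous letters i, v, x and l are read as Roman numerals rather than as letters.

```python
# Roman digits needed for values up to 50 (assumption: "c" = 100 is left out,
# so a lone "c" is read as the third letter).
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50}

def roman_to_int(s):
    """Convert a Roman numeral (either case) to int; no validity checking."""
    values = [ROMAN[ch] for ch in s.lower()]
    total = 0
    for i, v in enumerate(values):
        # Subtractive notation: a smaller digit before a larger one is subtracted.
        if i + 1 < len(values) and v < values[i + 1]:
            total -= v
        else:
            total += v
    return total

def ordinal(label, max_n=50):
    """Map a bare label such as '12', 'd' or 'xiv' to its position, or None."""
    low = label.lower()
    if label.isascii() and label.isdigit():
        n = int(label)
    elif low and all(ch in ROMAN for ch in low):
        n = roman_to_int(low)
    elif len(low) == 1 and "a" <= low <= "z":
        n = ord(low) - ord("a") + 1
    else:
        return None
    return n if 1 <= n <= max_n else None
```

For example, `ordinal("xiv")` gives 14 and `ordinal("d")` gives 4, while `ordinal("120")` and `ordinal("hello")` give `None`. Malformed numerals such as "iiii" are not rejected by this sketch.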
Thanks for reading.
Brian
Recommended answer
Below is the wikified solution:
import re

# No leading ^ here: numex is embedded after an optional "(" below,
# and the outer pattern already anchors at the start of the line.
numex = r"""(?:
    \d{1,3}                 # 1, 2, 3
    (?:\.\d{1,3}){0,4}      # 1.1, 1.1.1.1
  | [B-H] | [J-Z]           # A, B - Z caps at 26.
  | [AI](?!\s)              # Note: "A" and "I" can properly start non-lists
  | [a-z]                   # a - z
  | [ivxcl]{1,6}            # Roman ii, etc
  | [IVXCL]{1,6}            # Roman IV, etc.
)
"""
rex = re.compile(r'^\s*(\(?%s\)|%s\.?)\s+(.*)'
                 % (numex, numex), re.X)
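A short usage sketch of this pattern, splitting a line into its marker and the remaining text. It assumes numex carries no `^` of its own, so that it can follow the optional `(`; `split_item` is a hypothetical helper name.

```python
import re

# Assumed form of the answer's pattern, with the anchor only in the outer regex.
numex = r"""(?:
    \d{1,3}
    (?:\.\d{1,3}){0,4}
  | [B-H] | [J-Z]
  | [AI](?!\s)
  | [a-z]
  | [ivxcl]{1,6}
  | [IVXCL]{1,6}
)"""
rex = re.compile(r'^\s*(\(?%s\)|%s\.?)\s+(.*)' % (numex, numex), re.X)

def split_item(line):
    """Return (marker, text) if line starts with a list marker, else None."""
    m = rex.match(line)
    return (m.group(1), m.group(2)) if m else None

print(split_item("1.2.1 Some paragraph"))
print(split_item("  (iv) fourth item"))
print(split_item("A. First item"))
print(split_item("A paragraph of prose"))
```

Because whitespace is required after the marker, "Some paragraph" is not matched, and the lookahead keeps "A paragraph of prose" out. A sentence starting with "I " is still matched, though, through the uppercase Roman alternative, so that case would need an extra guard.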
Additions, changes and suggestions are most welcome.