Python: How to create a regex for an inline ordered list?


Question

I have a form field that must contain only an inline ordered list:

1. This item may be contain characters, symbols or numbers. 2. And this item also...

The following code does not work for user input validation (users may enter only an inline ordered list):

definiton_re = re.compile(r'^(?:\d\.\s(?:.+?))+$')
validate_definiton = RegexValidator(definiton_re, _("Enter a valid 'definition' in format: 1. meaning #1, 2. meaning #2...etc"), 'invalid')

P.S.: I'm using the RegexValidator class from the Django framework to validate the form field value.

Solution

Nice solution from OP. To push it further, let's do some regex optimization / golfing.

(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)

Here's what's new:

  • (?:^|\s) matches with backtracking between the alternatives. Here we use (?<!\S) instead, a zero-width assertion that the list index is not preceded by a non-whitespace character.
  • \d{1,2}\.\s doesn't have to be within a non-capturing group.
  • (.+?)(?=(?:, \d{1,2}\.)|$) is too bulky. We change this bit to:
    • ( Capturing group
    •   (?:
    •     (?! Negative lookahead: Assert that our position is NOT:
    •       ,\s\d{1,2}\. A comma, whitespace character, then a list index.
    •     )
    •     ,?[^,]* Here's the interesting optimization:
      • We match a comma if there is one. The lookahead has already established that this position does not start a new list index, so any comma here belongs to the current item. The run of non-comma characters that follows (if any) cannot belong to the next element either, so we roll over it with the * quantifier, and there's no backtracking.
      • This is a significant improvement over (.+?).
    •   )+ Keep repeating the group until the negative lookahead assertion fails.
    • )
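
The breakdown above can be tried out with a short sketch. It writes the same pattern in verbose mode so each part carries its own comment; the names ITEM_RE and sample are illustrative only and do not come from the original answer.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE.
# ITEM_RE is a hypothetical name chosen for this sketch.
ITEM_RE = re.compile(r"""
    (?<!\S)                 # not preceded by a non-whitespace character
    \d{1,2}\.\s             # list index followed by one whitespace character
    (                       # capturing group: the text of one item
      (?:
        (?!,\s\d{1,2}\.)    # stop when a comma introduces the next index
        ,?[^,]*             # optional comma, then a run of non-commas
      )+
    )
""", re.VERBOSE)

sample = "1. List item #1, 2. List item 2, 3. List item #3."

# With a single capturing group, findall returns the captured item texts.
items = ITEM_RE.findall(sample)
print(items)
```

Because a comma is only consumed when it does not introduce a new index, an item such as "apples, pears" stays in one piece.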

You can use that in place of the regex in the other answer, and here's a regex demo!


Though, at first glance, this problem is better solved with re.split() while parsing:

import re

text = '1. List item #1, 2. List item 2, 3. List item #3.'
lines = re.split(r'(?:^|, )\d{1,2}\. ', text)
# Gives ['', 'List item #1', 'List item 2', 'List item #3.']
if lines[0] == '':
    lines = lines[1:]
# Throws away the first empty element from splitting.
print(lines)

Here is an online code demo.

Unfortunately, for the validation you would have to follow the regex matching approach; just compile the regex from above:

regex = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')
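
One caveat: the pattern is not anchored, and Django's RegexValidator applies its pattern with search(), so on its own it would accept any value that merely contains a list item. Below is a minimal sketch of a stricter check, assuming the whole value must be made up of items separated by a comma and one whitespace character. The helper is_inline_ordered_list and the names ITEM_RE and SEPARATOR_RE are hypothetical and not part of the original answer or of Django.

```python
import re

# Pattern from the answer; ITEM_RE and SEPARATOR_RE are hypothetical names.
ITEM_RE = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')
SEPARATOR_RE = re.compile(r',\s')


def is_inline_ordered_list(text):
    """Return True if text consists solely of comma-separated list items."""
    pos = 0
    for match in ITEM_RE.finditer(text):
        gap = text[pos:match.start()]
        if pos == 0:
            # Nothing may come before the first item.
            if gap:
                return False
        elif not SEPARATOR_RE.fullmatch(gap):
            # Between two items only a comma plus one whitespace is allowed.
            return False
        pos = match.end()
    # At least one item must match, and the items must cover the whole text.
    return pos > 0 and pos == len(text)


print(is_inline_ordered_list('1. meaning #1, 2. meaning #2'))
print(is_inline_ordered_list('intro 1. meaning #1'))
```

A function like this could be called from a custom Django validator that raises ValidationError when it returns False; that wiring is left out here to keep the sketch free of framework dependencies.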
