How to create regex for inline ordered list?

Views: 57 | Published: 2021/5/1 20:11:57 | Tags: python regex django


Problem description

I have a form field that must contain only an inline ordered list:

1. This item may be contain characters, symbols or numbers. 2. And this item also...

The following code does not work for validating user input (users may enter only an inline ordered list):

definiton_re = re.compile(r'^(?:\d\.\s(?:.+?))+$')
validate_definiton = RegexValidator(definiton_re, _("Enter a valid 'definition' in format: 1. meaning #1, 2. meaning #2...etc"), 'invalid')

P.S.: Here I'm using the RegexValidator class from the Django framework to validate the form field value.

Recommended answer

Nice solution from OP. To push it further, let's do some regex optimization / golfing.

(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)

Here's what's new:

  - (?:^|\s) matches with backtracking between the alternatives. Here we use (?<!\S) instead, to assert that we are not immediately preceded by a non-whitespace character.
  - \d{1,2}\.\s doesn't have to be within a non-capturing group.
  - (.+?)(?=(?:, \d{1,2}\.)|$) is too bulky. We change this bit to:
    - ( Capturing group
    - (?: Non-capturing group
    - (?! Negative lookahead: assert that our position is NOT at:
    - ,\s\d{1,2}\. A comma, a whitespace character, then a list index.
    - )
    - ,?[^,]* Here's the interesting optimization:
      - We match a comma if there is one, because we know from our lookahead assertion that this position does not start a new list index. Therefore, we can safely assume that the remaining non-comma characters (if there are any) do not belong to the next element, so we roll over them with the * quantifier, and there's no backtracking.
      - This is a significant improvement over (.+?).
You can use that in place of the regex in the other answer, and here's a regex demo!
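
As a quick check of the behaviour described above, here is a minimal sketch that runs the pattern with re.findall. The names ITEM_RE and extract_items are made up for this illustration and are not part of the original answer. The second input shows that a comma inside an item is kept, because the lookahead only stops at a comma that begins a new list index.

```python
import re

# The pattern from the answer; ITEM_RE is a hypothetical name for this sketch.
ITEM_RE = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')

def extract_items(text):
    """Return the captured body of every 'N. ...' item found in text."""
    # With a single capturing group, findall returns the group contents.
    return ITEM_RE.findall(text)

print(extract_items('1. List item #1, 2. List item 2, 3. List item #3.'))
# ['List item #1', 'List item 2', 'List item #3.']

print(extract_items('1. red, green, 2. blue'))
# ['red, green', 'blue']
```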
      
Though, at first glance, this problem is better solved with re.split() while parsing:
      
```
import re

text = '1. List item #1, 2. List item 2, 3. List item #3.'
lines = re.split(r'(?:^|, )\d{1,2}\. ', text)
# Gives ['', 'List item #1', 'List item 2', 'List item #3.']
if lines[0] == '':
    lines = lines[1:]  # Throw away the first empty element from splitting.
print(lines)
```
Here's an online code demo.
      
Unfortunately, for validation you would have to follow the regex matching approach; just compile the regex above:
```
regex = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')
```
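
One caveat for validation: the pattern above is unanchored, and Django's RegexValidator searches for the pattern inside the value, so any input that merely contains one list item would pass. Below is a rough sketch of a whole-string check, assuming the entire field must be an inline ordered list. All names here (INDEX, BODY, FULL_LIST_RE, is_inline_ordered_list) are hypothetical. The item body is an unrolled variant of the same lookahead idea, not the answer's exact regex, chosen so that a failed match backtracks in a predictable way.

```python
import re

# Hypothetical building blocks for this sketch (not from the original answer).
INDEX = r'\d{1,2}\.\s'                      # "1. ", "12. ", ...
# Item text: non-comma runs, plus commas that do NOT start a new ", N." index.
BODY = r'[^,]*(?:,(?!\s\d{1,2}\.)[^,]*)*'
ITEM = INDEX + BODY

# \A and \Z anchor the pattern, so search() only succeeds on the whole string.
FULL_LIST_RE = re.compile(r'\A' + ITEM + r'(?:,\s' + ITEM + r')*\Z')

def is_inline_ordered_list(value):
    """True if the whole value looks like '1. a, 2. b, 3. c'."""
    return FULL_LIST_RE.search(value) is not None

print(is_inline_ordered_list('1. List item #1, 2. List item 2, 3. List item #3.'))  # True
print(is_inline_ordered_list('Intro text 1. item'))                                 # False
print(is_inline_ordered_list('1. first, 2.missing space'))                          # False
```

If this fits the requirement, FULL_LIST_RE could be passed to RegexValidator in place of definiton_re. Note that this sketch does not check that the indices are consecutive.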