Regex Expression For a String
Problem description
I want to split a string in Python.
Example string:
Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more
into the following list:
['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1', 'and', 'SCENE 2', 'and more']
Can someone help me build the regex? The one that I have built is:
(ACT [A-Z]+.\sSCENE\s[0-9]+)]?(.*)(SCENE [0-9]+)
But it doesn't work as expected.
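To see what goes wrong, one can run the attempted pattern through `re.findall` on the sample string (a quick check added for illustration, not part of the original question). Because `(.*)` is greedy, the pattern swallows almost the whole string in a single match and returns one 3-tuple instead of a list of pieces. The stray `]?` only matches an optional literal `]`, and the unescaped `.` after `[A-Z]+` matches any character; neither is what breaks the split.

```python
import re

# The pattern from the question, unchanged.
attempt = r"(ACT [A-Z]+.\sSCENE\s[0-9]+)]?(.*)(SCENE [0-9]+)"
test_str = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"

# With three capturing groups, findall returns one 3-tuple per match.
# The greedy (.*) first runs to the end of the string, then backtracks only as far
# as the LAST "SCENE <digits>", so everything in between lands in group 2.
result = re.findall(attempt, test_str)
print(result)
# [('ACT I. SCENE 1', ' and SCENE 2 and this is ACT II. SCENE 1 and ', 'SCENE 2')]
```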
Recommended answer
If I understand your requirements correctly, you may use the following pattern:
(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))
Breakdown:
(?: # Start of a non-capturing group.
ACT|SCENE # Matches either 'ACT' or 'SCENE'.
) # Close the non-capturing group.
.+? # Matches one or more characters (lazy matching).
\d+ # Matches one or more digits.
| # Alternation (OR).
\S # Matches a non-whitespace character (to trim spaces).
.*? # Matches zero or more characters (lazy matching).
(?= # Start of a positive Lookahead (i.e., followed by...).
\s? # An optional whitespace character (to trim spaces).
(?:ACT|SCENE|$) # Followed by either 'ACT' or 'SCENE' or the end of the string.
) # Close the Lookahead.
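The breakdown above can also live inside the pattern itself. This is a minimal sketch that assumes Python's `re.VERBOSE` flag, which the original answer does not use. In verbose mode, unescaped whitespace is ignored and `#` starts a comment, so the annotated pattern should match exactly like the one-line version.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each piece carries its comment.
# Unescaped whitespace is ignored and '#' starts a comment in this mode.
PATTERN = re.compile(
    r"""
    (?:ACT|SCENE)         # 'ACT' or 'SCENE' ...
    .+?                   # ... then one or more characters, as few as possible ...
    \d+                   # ... up to the first run of digits
    |                     # OR
    \S                    # a non-whitespace character (so a piece never starts with a space)
    .*?                   # zero or more characters, as few as possible ...
    (?=                   # ... provided what follows is
        \s?               #     an optional whitespace character and then
        (?:ACT|SCENE|$)   #     'ACT', 'SCENE', or the end of the string
    )
    """,
    re.VERBOSE,
)

test_str = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
parts = PATTERN.findall(test_str)
print(parts)
```

`PATTERN.findall(test_str)` returns the same nine-item list as the answer's one-line pattern; the verbose form just keeps the explanation next to the syntax it describes.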
Python example:
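import re
regex = r"(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))"
test_str = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
# findall returns every non-overlapping match, left to right.
# Named `parts` rather than `list` so the built-in list type is not shadowed.
parts = re.findall(regex, test_str)
print(parts)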
Output:
['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1', 'and', 'SCENE 2', 'and more']
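A different technique, not part of the original answer, is to split *on* the markers instead of matching the pieces. This sketch assumes that `re.split` keeps the text matched by a capturing group in its result. `split_acts_scenes` is a hypothetical helper name chosen for illustration.

```python
import re

test_str = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"

# Capturing group: re.split keeps each matched marker in the result,
# interleaved with the text found between markers.
MARKER = r"((?:ACT|SCENE).+?\d+)"

def split_acts_scenes(text):
    """Hypothetical helper: split text into ACT/SCENE markers and the text between them."""
    pieces = re.split(MARKER, text)
    # Text between markers keeps its surrounding spaces, and an empty string appears
    # when the text starts or ends with a marker; strip everything and drop empties.
    return [p.strip() for p in pieces if p.strip()]

print(split_acts_scenes(test_str))
```

With the split version, the pattern only has to describe the markers, and the spacing is handled by `str.strip()`. The `findall` pattern above encodes both the markers and the trimming rules in a single expression.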