字符串的正则表达式 [英] Regex Expression For a String

查看:82
本文介绍了字符串的正则表达式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在 python 中拆分字符串.

I want to split the string in python.

示例字符串:

这是第一幕.第一场和第二场,这是第二幕.场景 1 和场景 2 及更多

Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more

进入以下列表:

['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE2', 'and this is', 'ACT II. SCENE 1',
 'and' , 'SCENE 2', 'and more']

有人可以帮我构建正则表达式吗?我构建的一个是:

Can someone help me build the regex? The one that I have built is:

(ACT [A-Z]+.\sSCENE\s[0-9]+)]?(.*)(SCENE [0-9]+)

但这不能正常工作.

推荐答案

如果我正确理解您的要求,您可以使用以下模式:

If I understand your requirements correctly, you may use the following pattern:

(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))

演示.

细分:

(?:                    # Start of a non-capturing group.
    ACT|SCENE          # Matches either 'ACT' or 'SCENE'.
)                      # Close the non-capturing group.
.+?                    # Matches one or more characters (lazy matching).
\d+                    # Matches one or more digits.
|                      # Alternation (OR).
\S                     # Matches a non-whitespace character (to trim spaces).
.*?                    # Matches zero or more characters (lazy matching).
(?=                    # Start of a positive Lookahead (i.e., followed by...).
    \s?                # An optional whitespace character (to trim spaces).
    (?:ACT|SCENE|$)    # Followed by either 'ACT' or 'SCENE' or the end of the string.
)                      # Close the Lookahead.

Python 示例:

import re

regex = r"(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))"
test_str = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"

list = re.findall(regex, test_str)
print(list)

输出:

['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1', 'and', 'SCENE 2', 'and more']

在线试用.

这篇关于字符串的正则表达式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆