Get consecutive capitalized words using regex


Question

I am having trouble with my regex for capturing consecutive capitalized words. Here is what I want the regex to capture:

"said Polly Pocket and the toys" -> Polly Pocket

Here is the regex I am using:

re.findall('said ([A-Z][\w-]*(\s+[A-Z][\w-]*)+)', article)

It returns the following:

[('Polly Pocket', ' Pocket')]

I would like it to return:

['Polly Pocket']

Recommended answer

Use a positive lookahead:

([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)

The lookahead asserts that a word is accepted only if it is followed by another word that starts with a capital letter. Broken down:

(                    # begin capture
  [A-Z]              # one uppercase letter   \ first word
  [a-z]+             # 1+ lowercase letters   /
  (?=\s[A-Z])        # must be followed by whitespace and an uppercase letter
  (?:                # non-capturing group
    \s               # whitespace character
    [A-Z]            # one uppercase letter   \ additional word(s)
    [a-z]+           # 1+ lowercase letters   /
  )+                 # group can be repeated (more words)
)                    # end capture
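
To see why the original call returned tuples, and how the patterns above behave, here is a minimal sketch in Python. It assumes `article` holds the sample sentence from the question; the variable names (`result_original`, `result_fixed`, `result_lookahead`) are illustrative only. `re.findall` returns a list of tuples whenever the pattern contains more than one capturing group, so the repeated inner group is what produces the extra `' Pocket'` entry.

```python
import re

# Sample input taken from the question.
article = "said Polly Pocket and the toys"

# The question's pattern has two capturing groups (outer and inner),
# so findall returns one tuple per match, with one item per group.
original = r'said ([A-Z][\w-]*(\s+[A-Z][\w-]*)+)'
result_original = re.findall(original, article)

# Smallest change: make the inner group non-capturing with (?:...),
# leaving a single capturing group, so findall returns plain strings.
fixed = r'said ([A-Z][\w-]*(?:\s+[A-Z][\w-]*)+)'
result_fixed = re.findall(fixed, article)

# The answer's lookahead pattern, written with re.VERBOSE so the
# breakdown can be kept as inline comments inside the pattern itself.
lookahead = re.compile(r"""
    (                  # begin capture
      [A-Z][a-z]+      # first word: uppercase letter, then 1+ lowercase
      (?=\s[A-Z])      # next must be whitespace plus an uppercase letter
      (?:              # non-capturing group
        \s[A-Z][a-z]+  # an additional capitalized word
      )+               # one or more additional words
    )                  # end capture
""", re.VERBOSE)
result_lookahead = lookahead.findall(article)

print(result_original)
print(result_fixed)
print(result_lookahead)
```

Note that the answer's pattern drops the `said ` prefix and only accepts words made of one uppercase letter followed by lowercase letters, so names containing digits, hyphens, or inner capitals (which `[\w-]*` allowed) would not match. The lookahead also appears to be redundant in this particular pattern, since the non-capturing group that follows already requires whitespace and an uppercase letter; it does no harm, but the non-capturing group alone is what changes the shape of the `findall` result.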
