How to find all words with first letter as upper case using Python Regex

Problem description

I need to find all the words in a file that start with an uppercase letter. I tried the code below, but it returns an empty result.

import os
import re

matches = []

filename = 'C://Users/Documents/romeo.txt'
with open(filename, 'r') as f:
    for line in f:
        regex = "^[A-Z]\w*$"
        matches.append(re.findall(regex, line))
print(matches)

File:

Hi, How are You?

Expected output:

[Hi,How,You]

Recommended answer

You can use a word boundary instead of the anchors ^ and $

\b[A-Z]\w*

Note that matches.append adds a single item to the list. Because re.findall returns a list, appending its result gives you a list of lists. Use += (or matches.extend) to keep the list flat.

import re

matches = []
regex = r"\b[A-Z]\w*"
filename = r'C:\Users\Documents\romeo.txt'
with open(filename, 'r') as f:
    for line in f:
        matches += re.findall(regex, line)
print(matches)

Output

['Hi', 'How', 'You']
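
To see why the original attempt comes back empty, here is a minimal sketch that runs both patterns on the sample line from the question. It uses in-memory strings instead of the file, and the variable names and the second sample line are illustrative, not from the original post.

```python
import re

line = "Hi, How are You?"

# Anchored pattern: ^ and $ require the whole line to be a single word,
# so nothing matches on a line that contains several words.
anchored = re.findall(r"^[A-Z]\w*$", line)

# Word-boundary pattern: matches each capitalized word inside the line.
bounded = re.findall(r"\b[A-Z]\w*", line)

# append nests each per-line result; += keeps the list flat.
nested, flat = [], []
for text in [line, "Romeo and Juliet"]:
    nested.append(re.findall(r"\b[A-Z]\w*", text))
    flat += re.findall(r"\b[A-Z]\w*", text)

print(anchored)  # []
print(bounded)   # ['Hi', 'How', 'You']
print(nested)    # [['Hi', 'How', 'You'], ['Romeo', 'Juliet']]
print(flat)      # ['Hi', 'How', 'You', 'Romeo', 'Juliet']
```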

If there should be a whitespace boundary to the left, you could also use

(?<!\S)[A-Z]\w*
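
The difference between the two left-hand boundaries shows up when a capitalized word follows punctuation. The sketch below compares them on a made-up sample string (not from the original post): \b accepts any non-word character to the left, while (?<!\S) only accepts whitespace or the start of the string.

```python
import re

# Hypothetical sample: "How" follows a hyphen, "You" follows a parenthesis.
text = "Hi, pre-How are (You)?"

# \b allows a match after any non-word character.
word_boundary = re.findall(r"\b[A-Z]\w*", text)

# (?<!\S) asserts that the previous character is not a non-whitespace char,
# i.e. the word must start the string or follow whitespace.
whitespace_left = re.findall(r"(?<!\S)[A-Z]\w*", text)

print(word_boundary)    # ['Hi', 'How', 'You']
print(whitespace_left)  # ['Hi']
```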

If you don't want to match words that consist only of uppercase chars, you could use for example a negative lookahead to assert that what follows is not only uppercase chars up to a word boundary

\b[A-Z](?![A-Z]*\b)\w*

  • \b A word boundary to prevent a partial match
  • [A-Z] Match an uppercase char A-Z
  • (?![A-Z]*\b) Negative lookahead, assert that what follows is not only uppercase chars up to a word boundary
  • \w* Match optional word chars
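
As a quick check of the lookahead variant, the sketch below runs it next to the plain pattern on a made-up sentence (the sample text is an assumption for illustration). Words made up only of uppercase chars, including a single-letter word such as "A", are skipped, while mixed-case words are kept.

```python
import re

# Hypothetical sample containing all-uppercase and mixed-case words.
text = "NASA said Hi to HTTPServer and A"

# Plain pattern: every word that starts with an uppercase char.
plain = re.findall(r"\b[A-Z]\w*", text)

# Lookahead pattern: reject the match if only uppercase chars follow
# up to the next word boundary.
not_all_caps = re.findall(r"\b[A-Z](?![A-Z]*\b)\w*", text)

print(plain)         # ['NASA', 'Hi', 'HTTPServer', 'A']
print(not_all_caps)  # ['Hi', 'HTTPServer']
```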

To match a word that starts with an uppercase char, and does not contain any more uppercase chars:

\b[A-Z][^\WA-Z]*\b

  • \b A word boundary
  • [A-Z] Match an uppercase char A-Z
  • [^\WA-Z]* Optionally match word chars except A-Z
  • \b A word boundary
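
The last pattern can be tried the same way. In this sketch the sample text is made up for illustration. Note that a single uppercase letter still matches, because the character class is optional, and that [^\WA-Z] also allows digits and underscores after the first letter.

```python
import re

# Hypothetical sample: "HeLLo" and "NASA" contain extra uppercase chars.
text = "Hi HeLLo You NASA A"

# One uppercase char, then only word chars that are not A-Z,
# closed by a word boundary so partial matches are rejected.
single_capital = re.findall(r"\b[A-Z][^\WA-Z]*\b", text)

print(single_capital)  # ['Hi', 'You', 'A']
```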
