Python regex findall

Question

I am trying to extract all occurrences of tagged words from a string using regex in Python 2.7.2. Put simply, I want to extract every piece of text inside the [P][/P] tags. Here is my attempt:

import re
regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)

Printing person yields ['President [P]', '[/P]', '[P] Bill Gates [/P]'].

What is the correct regex to get ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]'] or ['Barack Obama', 'Bill Gates']?

Recommended answer

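import re
# A plain raw string works in Python 2 and 3; the ur"" prefix is Python 2 only.
# The brackets are escaped, and the group captures only the text between the tags.
regex = r"\[P\] (.+?) \[/P\]"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)
print(person)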

yields

['Barack Obama', 'Bill Gates']

The regex ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" is exactly the same unicode string as u'[[1P].+?[/P]]+?', only harder to read.
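The equivalence can be checked directly. The following is a minimal sketch that assumes Python 3, where the ur prefix no longer exists; an ordinary (non-raw) string literal is used instead so that the \uXXXX escapes are resolved by Python itself. The variable names are illustrative only.

```python
# Assumption: Python 3. In a normal string literal, \u005B is "[",
# \u005D is "]" and \u002F is "/".
escaped = "[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
plain = "[[1P].+?[/P]]+?"

print(escaped)           # the escapes are already resolved in the string
print(escaped == plain)  # both literals denote the same string
```

Since the two literals are the same string, the escapes offer no protection for the brackets: the regex engine still sees them as character-class delimiters.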

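The first bracketed group [[1P] is a character class: it tells re to match any single character from the list ['[', '1', 'P']. The second bracketed group [/P] likewise matches a single '/' or 'P', and the trailing ]+? matches one or more literal ']' characters. That's not what you want at all. So,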

  • Remove the outer enclosing square brackets. (Also remove the stray 1 in front of P.)
  • To protect the literal brackets in [P], escape the brackets with a backslash: \[P\].
  • To return only the words inside the tags, place grouping parentheses around .+?.
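The effect of these three changes can be sketched as follows, assuming Python 3. The attempted pattern is written here with the literal brackets inside the character classes escaped, which is equivalent and only avoids a "possible nested set" warning in newer Python versions. The variable names are illustrative.

```python
import re

line = ("President [P] Barack Obama [/P] met Microsoft founder "
        "[P] Bill Gates [/P], yesterday.")

# The attempted pattern: two character classes plus a trailing literal "]".
broken = re.findall(r"[\[1P].+?[/P]\]+?", line)
print(broken)  # ['President [P]', '[/P]', '[P] Bill Gates [/P]']

# After the fixes: no outer brackets, no stray 1, escaped literal
# brackets, and a group around .+? so only the inner text is returned.
names = re.findall(r"\[P\] (.+?) \[/P\]", line)
print(names)   # ['Barack Obama', 'Bill Gates']

# Without the group, findall returns each whole match, tags included.
tagged = re.findall(r"\[P\] .+? \[/P\]", line)
print(tagged)  # ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]']
```

Both fixed patterns expect exactly one space after [P] and before [/P], as in the sample line; input with different spacing would need a looser pattern.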
