Python Regex - find string between html tags

Views: 107 Published: 2021/5/14 20:45:19 Tags: python html regex


Question

I am trying to extract the string between HTML tags. I can see that similar questions have been asked on Stack Overflow before, but I am completely new to Python and I am struggling.

If I have

<b>Bold Stuff</b>

I want a regular expression that leaves me with

Bold Stuff

But all of my solutions so far have left me with things like

>Bold Stuff<

I would really appreciate any help with this.

I have

>.*?<

And I have seen a question on Stack Overflow with the suggested solution

>([^<>]*)<

But neither of these is working for me. Could someone please explain how to write a regex that says "find me the string between characters x and y, not including x and y"?

Thanks for your help.

Recommended answer

>>> import re
>>> a = '<b>Bold Stuff</b>'
>>> re.findall(r'>(.+?)<', a)
['Bold Stuff']
>>> re.findall(r'>(.*?)<', a)[0]  # non-greedy
'Bold Stuff'
>>> re.findall(r'>(.+?)<', a)[0]  # also non-greedy
'Bold Stuff'
>>> re.findall(r'>(.*)<', a)[0]  # greedy
'Bold Stuff'

For this input, both greedy and non-greedy mode work.
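The question does not show how the patterns were called, so the following is only a guess at why the result came back as >Bold Stuff<: a pattern without parentheses, or a call that reads the whole match (group 0), returns the delimiters along with the text. A minimal sketch of the difference, using only the standard re module (the variable names are illustrative):

```python
import re

html = '<b>Bold Stuff</b>'

# Without a capturing group, the whole match comes back, delimiters included.
whole = re.search(r'>.*?<', html).group(0)

# With a capturing group, group(1) holds only the text inside the parentheses.
inner = re.search(r'>(.*?)<', html).group(1)

# re.findall returns the group contents when the pattern has exactly one group.
found = re.findall(r'>([^<>]*)<', html)

print(whole)  # >Bold Stuff<
print(inner)  # Bold Stuff
print(found)  # ['Bold Stuff']
```

So "the string between x and y, not including x and y" amounts to putting x and y outside the parentheses and reading group 1 rather than group 0.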

The pattern you have, >.*?<, is non-greedy. Here is an example of the difference between non-greedy and greedy mode:

>>> a = '<b>Bold <br> Stuff</b>'
>>> re.findall(r'>(.*?)<', a)[0]
'Bold '
>>> re.findall(r'>(.*)<', a)[0]
'Bold <br> Stuff'
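The snippet above looks only at the first result ([0]). As a small sketch of what each pattern returns in full for the same input, the example below also includes the negated character class from the question, which happens to give the same pieces as the non-greedy pattern here (variable names are illustrative):

```python
import re

html = '<b>Bold <br> Stuff</b>'

# Non-greedy: stops at the first '<' after each '>'.
lazy = re.findall(r'>(.*?)<', html)

# Greedy: runs from the first '>' to the last '<' in the string.
greedy = re.findall(r'>(.*)<', html)

# Negated character class: cannot cross a '<' or '>' at all.
negated = re.findall(r'>([^<>]*)<', html)

print(lazy)     # ['Bold ', ' Stuff']
print(greedy)   # ['Bold <br> Stuff']
print(negated)  # ['Bold ', ' Stuff']
```

Which one is appropriate depends on whether nested tags should be kept in the result or split around.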

Here is what the re module documentation says about (...):

(...)

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below.

To match the literals ( or ), use \( or \), or enclose them inside a character class: [(] [)].
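As an illustration of the two points in that description, the sketch below uses \1 to require that the closing tag name matches the opening one, and \( \) to match literal parentheses. The patterns and sample strings are assumptions made up for the example, not part of the original answer, and a regex like this only suits simple, well-formed fragments:

```python
import re

html = '<b>Bold <br> Stuff</b>'

# Group 1 captures the tag name; \1 must then match the same text again.
m = re.search(r'<(\w+)>(.*?)</\1>', html)
tag_name = m.group(1)
content = m.group(2)

# Escaped parentheses match literally; the unescaped pair still forms a group.
args = re.findall(r'\((.*?)\)', 'f(x) and g(y)')

print(tag_name)  # b
print(content)   # Bold <br> Stuff
print(args)      # ['x', 'y']
```

Because the closing tag is spelled out through \1, the non-greedy group keeps expanding past <br> until it reaches </b>.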
