Print start of html tags
Problem description
I want to print the opening tag of every HTML element that has attributes.
<h1>test</h1>
<h2>test2</h2>
<div id="content"></div>
<p>test3</p>
<div class="test"></div>
<div id="nav"></div>
<p>test3</p>
For instance, given the HTML above, I want to print:
<div id="content">
<div id="nav">
I tried this, but I get the result below instead:
="content">
="nav">
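import re

# Read the whole file into one string; the with block closes it afterwards
with open('test.html') as file:
    test = file.read()
lines = test.splitlines()

# This is the pattern that produces the unwanted output shown above
b = re.findall(r'<?=.*?>', test)
for a in b:
    print(a)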
How do I adjust my code to get the right output?
Solution
You should use a non-greedy match for any number of characters to the left of the =, so:
r'<.*?=.*?>'
That will match a <, followed by the minimum number of characters, followed by a =, followed by the minimum number of characters up to the >.
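As a quick check, here is a minimal sketch that runs this pattern against the sample HTML from the question. The HTML is inlined as a string instead of being read from test.html, and the variable names are only illustrative.

```python
import re

# Sample HTML from the question, inlined so the snippet runs without test.html
html = """<h1>test</h1>
<h2>test2</h2>
<div id="content"></div>
<p>test3</p>
<div class="test"></div>
<div id="nav"></div>
<p>test3</p>"""

# <, the fewest characters up to an =, then the fewest characters up to a >
opening_tags = re.findall(r'<.*?=.*?>', html)
for tag in opening_tags:
    print(tag)
```

Because . does not match a newline by default, each match stays within one line, so only the three div tags that contain an = are printed.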
What you had:
r'<?=.*?>'
means an optional <, followed by a =, followed by any string going up to the >. Since the < is optional and can only match when it sits immediately before the =, it never matches here, so every result starts at the = instead.
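To make the difference concrete, the sketch below runs both patterns on a shortened, single-line version of the sample HTML. The third pattern is not part of the answer: it is a hypothetical stricter variant, added here on the assumption that several tags may share one line, in which case the non-greedy pattern can start its match at an earlier tag.

```python
import re

# Assumption: the same kind of markup, but with every tag on a single line
one_line = '<h2>test2</h2><div id="content"></div><p>test3</p><div id="nav"></div>'

# Pattern from the question: the optional < is never directly before the =
original = re.findall(r'<?=.*?>', one_line)

# Pattern from the answer: on one line it begins at the first < it can find
suggested = re.findall(r'<.*?=.*?>', one_line)

# Hypothetical stricter variant: [^<>] keeps each match inside a single tag
strict = re.findall(r'<[^<>]*=[^<>]*>', one_line)

print(original)
print(suggested)
print(strict)
```

The stricter variant still assumes that attribute values never contain < or >. For arbitrary HTML, a real parser such as the standard library's html.parser is likely to be more reliable than any regular expression.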