Print start of html tags


Problem description

I want to print the opening tag of every HTML element that has attributes.

    <h1>test</h1>
    <h2>test2</h2>
    <div id="content"></div>
    <p>test3</p>
    <div class="test"></div>
    <div id="nav"></div>
    <p>test3</p>

For instance, given the HTML above, I want to print:

<div id="content">
<div id="nav">

I tried the code shown further down, but I get the following result instead:

="content">
="nav">


import re
file = open('test.html')
test = file.read()
lines = test.splitlines()
b= re.findall(r'<?=.*?>',test)
for a in b:
    print(a)

How do I adjust my code to get the right output?

Solution

You should use a non-greedy match for any number of characters to the left of the =, so:

r'<.*?=.*?>'

That matches a <, followed by as few characters as possible, then a =, then as few characters as possible up to the next >.
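
As a quick check, the sketch below applies this pattern to the HTML from the question. The HTML is inlined as a string rather than read from test.html, and the names `html`, `tags`, `one_line` and `strict` are only illustrative. It also shows one caveat: when several tags share a line, the pattern can start at an earlier tag that has no attributes. The stricter pattern at the end is a suggested variant, not part of the original answer.

```python
import re

# HTML from the question, inlined here instead of reading test.html.
html = """<h1>test</h1>
<h2>test2</h2>
<div id="content"></div>
<p>test3</p>
<div class="test"></div>
<div id="nav"></div>
<p>test3</p>
"""

# '<', then as few characters as possible up to '=',
# then as few characters as possible up to '>'.
# '.' does not match a newline by default, so a match never spans lines.
tags = re.findall(r'<.*?=.*?>', html)
for tag in tags:
    print(tag)

# Caveat: with several tags on one line, the match can begin at an
# attribute-less tag, because '.*?' is allowed to run across '>' and '<'.
one_line = '<h2>test2</h2><div id="content"></div>'
loose = re.findall(r'<.*?=.*?>', one_line)

# Assumed stricter variant: forbid '<' and '>' inside the tag itself.
strict = re.findall(r'<[^<>]*=[^<>]*>', one_line)
print(loose)
print(strict)
```

For anything beyond simple, well-formed input, an HTML parser such as the standard library's `html.parser` is usually a safer choice than a regular expression.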

What you had:

r'<?=.*?>'

This means an optional <, followed by a =, followed by the shortest string that ends at a >. Because the < is optional and can only match when it sits directly before the =, it is never included, and every match starts at the = instead.
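
The behaviour of the optional < can be seen in a small sketch. The two input strings are made up for illustration; the second one is not valid HTML and is there only to show the single case where the < would be consumed.

```python
import re

original = r'<?=.*?>'

# In real markup the '<' is separated from '=' by the tag name and the
# attribute name, so the optional '<' matches nothing and is left out.
typical = re.findall(original, '<div id="nav"></div>')
print(typical)   # ['="nav">']

# The '<' is only picked up when it sits immediately before the '='.
adjacent = re.findall(original, '<="nav">')
print(adjacent)  # ['<="nav">']
```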
