Python: extracting HTML tag attributes without regular expressions

Views: 136 | Published: 2016/8/5 19:11:53 | Tags: python, html-parsing, beautifulsoup

Question

Is there any way to extract HTML tag attributes using urllib, urllib2, or BeautifulSoup?

For example, given:

<a href="xyz" title="xyz">xyz</a>

I want to get href=xyz, title=xyz.

There is another thread that talks about using regular expressions: http://stackoverflow.com/questions/317053/regular-expression-for-extracting-tag-attributes

Thanks.

Recommended answer

You could use BeautifulSoup to parse the HTML and, for each <a> tag, use tag.attrs to read the attributes:

In [110]: import BeautifulSoup  # BeautifulSoup 3 module (Python 2)

In [111]: soup = BeautifulSoup.BeautifulSoup('<a href="xyz" title="xyz">xyz</a>')

In [112]: [tag.attrs for tag in soup.findAll('a')]
Out[112]: [[(u'href', u'xyz'), (u'title', u'xyz')]]
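
The session above uses the old BeautifulSoup 3 module on Python 2, where tag.attrs is a list of (name, value) pairs. In the newer bs4 package, tag.attrs is a dict instead. If installing a third-party package is not an option, roughly the same result can be had with the standard library's html.parser, still without regular expressions. The following is a minimal sketch under that assumption. The names AttrCollector and extract_attrs are made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the attributes of every start tag with a given name.

    Hypothetical helper for illustration; not from the original answer.
    """

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.found = []  # one dict of attributes per matching tag

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and passes
        # attrs as a list of (name, value) pairs.
        if tag == self.tag_name:
            self.found.append(dict(attrs))


def extract_attrs(html, tag_name="a"):
    """Return a list of attribute dicts, one per <tag_name> element."""
    parser = AttrCollector(tag_name)
    parser.feed(html)
    parser.close()
    return parser.found


result = extract_attrs('<a href="xyz" title="xyz">xyz</a>')
print(result)  # [{'href': 'xyz', 'title': 'xyz'}]
```

Returning dicts rather than lists of pairs makes lookups such as result[0]['href'] straightforward, at the cost of dropping repeated attribute names on the same tag.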
