Python HTML scraping

Views: 85 Published: 2020/6/18 19:18:08 Tags: python html regex screen-scraping html-content-extraction

Question

It's not really scraping, I'm just trying to find the URLs in a web page where the class has a specific value. For example:

<a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e">

I want to get the href value. Any ideas on how to do this? Maybe regex? Could you post some example code? I'm guessing html scraping libs, such as BeautifulSoup, are a bit of overkill just for this...

Thanks a lot!

Recommended answer

Regex is usually a bad idea for parsing HTML; try using BeautifulSoup instead.

Quick example:

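from bs4 import BeautifulSoup  # assumes the bs4 package (BeautifulSoup 4) is installed

# Stand-in for the fetched page source; here, the snippet from the question
html = '<a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e">'
soup = BeautifulSoup(html, "html.parser")
# Class matching is case-sensitive, so spell "myClass" exactly as in the page
links = soup.find_all("a", attrs={"class": "myClass"})
for link in links:
    print(link.get("href"))  # process each link; here we just print its href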
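
If pulling in BeautifulSoup really does feel like overkill for this, the standard library's html.parser module can probably do the same job without resorting to regex. The following is only a minimal sketch, assuming Python 3; ClassHrefParser and find_hrefs are hypothetical names made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser


class ClassHrefParser(HTMLParser):
    """Collect href values of <a> tags that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.wanted_class in classes and href is not None:
            self.hrefs.append(href)


def find_hrefs(html, wanted_class):
    """Return the href of every <a> tag whose class list contains wanted_class."""
    parser = ClassHrefParser(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Sample markup: the first tag is the one from the question, the second is a decoy.
sample = (
    '<a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e">one</a>'
    '<a class="other" href="/url/ignored">two</a>'
)
print(find_hrefs(sample, "myClass"))
```

Splitting the class attribute on whitespace means a tag such as class="btn myClass" still matches, which a naive string comparison (or most quick regexes) would miss.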
