Python HTML scraping

Question

It's not really scraping, I'm just trying to find the URLs in a web page where the class has a specific value. For example:

<a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e">

I want to get the href value. Any ideas on how to do this? Maybe regex? Could you post some example code? I'm guessing html scraping libs, such as BeautifulSoup, are a bit of overkill just for this...

Thanks a lot!

Recommended answer

Regex is usually a bad idea for parsing HTML, since attribute order, quoting, and whitespace can all vary. Try using BeautifulSoup instead.
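
To illustrate why regex tends to be fragile here, this is a small sketch that is not part of the original answer. The pattern and the sample tags are hypothetical: the pattern assumes the class attribute comes first, uses double quotes, and holds exactly one class name.

```python
import re

# Hypothetical naive pattern: class must come before href, with double quotes
# and a single class name. Real pages often break one of these assumptions.
NAIVE = re.compile(r'<a class="myClass" href="([^"]*)"')

samples = [
    '<a class="myClass" href="/url/1">',        # the exact shape the pattern expects
    '<a href="/url/2" class="myClass">',        # attribute order swapped
    "<a class='myClass' href='/url/3'>",        # single quotes
    '<a class="myClass other" href="/url/4">',  # more than one class
]

# Keep only the hrefs the naive pattern manages to capture
found = [m.group(1) for s in samples for m in [NAIVE.search(s)] if m]
print(found)
```

Only the first tag matches, even though all four are links with the class myClass. The pattern can be extended to cover each variation, but an HTML parser handles them all without extra work.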

A simple example using BeautifulSoup:

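from bs4 import BeautifulSoup

html = '<a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e">link</a>'  # replace with the fetched page source
soup = BeautifulSoup(html, 'html.parser')
# Class names are case-sensitive, so match "myClass" exactly as it appears in the page
links = soup.find_all('a', attrs={'class': 'myClass'})
for link in links:
    print(link['href'])  # process link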
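
If installing BeautifulSoup does feel like overkill, the standard library's html.parser module may be enough for this one task. The following is a minimal sketch rather than part of the original answer; the names ClassLinkParser and find_hrefs and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ClassLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given class (hypothetical helper)."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may list several space-separated names
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.wanted_class in classes and href is not None:
            self.hrefs.append(href)


def find_hrefs(html, wanted_class):
    """Return the hrefs of all <a> tags in html whose class list contains wanted_class."""
    parser = ClassLinkParser(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Sample markup, assumed for illustration only
sample = (
    '<a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e">one</a>'
    '<a class="other" href="/skip">two</a>'
    '<a href="/url/abc" class="big myClass">three</a>'
)
print(find_hrefs(sample, "myClass"))
```

This handles swapped attribute order and multiple class names, and it keeps the class comparison case-sensitive. For badly broken markup or more complex queries, BeautifulSoup is likely still the more comfortable choice.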
