Using Beautiful Soup to parse a URL to get data from other URLs


Question

I need to parse a URL to get a list of URLs that link to detail pages, and then scrape all the details from each of those pages. I have to do it this way because the detail page URLs are not incremented regularly and change over time, while the event list page stays the same.

Basically:

example.com/events/
    <a href="http://example.com/events/1">Event 1</a>
    <a href="http://example.com/events/2">Event 2</a>

example.com/events/1
    ...some detail stuff I need

example.com/events/2
    ...some detail stuff I need

Answer

import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the page and build a parse tree from it.
page = urllib2.urlopen('http://yahoo.com').read()
soup = BeautifulSoup(page)
soup.prettify()  # optional: returns a formatted copy of the markup, not needed for parsing

# Print the href of every anchor on the page.
for anchor in soup.findAll('a', href=True):
    print anchor['href']

That will give you the list of URLs. Now you can iterate over those URLs and parse the data from each detail page.
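
For instance, restricted to the event pages from the question, the whole two-step crawl might look like the sketch below. It stays with the urllib2 / BeautifulSoup 3 imports used above; the example.com addresses come from the question, and the '/events/' filter is only an assumption about which anchors are actually event links.

import urllib2
from BeautifulSoup import BeautifulSoup

# Step 1: fetch the list page and keep only the anchors that point at
# event detail pages (the '/events/' test is an assumption about the site).
list_page = urllib2.urlopen('http://example.com/events/').read()
list_soup = BeautifulSoup(list_page)
event_urls = [a['href'] for a in list_soup.findAll('a', href=True)
              if '/events/' in a['href']]

# Step 2: fetch and parse each detail page in turn.
for url in event_urls:
    detail_page = urllib2.urlopen(url).read()
    detail_soup = BeautifulSoup(detail_page)
    # ...pull whatever detail fields you need out of detail_soup here
    print url, detail_soup.title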

  • inner_div = soup.findAll("div", {"id": "y-shade"}) is just an example of narrowing the search to a specific element; you can go through the BeautifulSoup tutorials for more. A sketch of applying this to a detail page follows below.
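
Applied to one of the detail pages, that id-based lookup could look like the following sketch; the 'event-detail' id is a hypothetical placeholder for whatever container actually holds the details on your site.

import urllib2
from BeautifulSoup import BeautifulSoup

detail_soup = BeautifulSoup(urllib2.urlopen('http://example.com/events/1').read())
# 'event-detail' is a hypothetical id; inspect the page to find the real one.
detail_div = detail_soup.find('div', {'id': 'event-detail'})
if detail_div is not None:
    # findAll(text=True) collects the text nodes inside the matched div.
    print ''.join(detail_div.findAll(text=True))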
