Extracting unordered list for a particular <div>: BeautifulSoup


Question

I'm scraping this webpage for my Android app. What I would like to do is extract the countries from the href attributes. This is similar to this question: http://stackoverflow.com/questions/16267768/beautiful-soup-extracting-href-from-html-ordered-list

Here is my code:

from bs4 import BeautifulSoup
import urllib2
import re

html_page = urllib2.urlopen("http://www.howtocallabroad.com/a.html")
soup = BeautifulSoup(html_page)
li = soup.select("ul > li > a")
for link in li:
    print link.get('href')

The problem I'm getting is that the result returns all a tags, including those from other divs:

afghanistan/
albania/
algeria/
american-samoa/
andorra/
angola/
anguilla/
antigua/
argentina/
armenia/
aruba/
ascension/
australia/
austria/
azerbaijan/
codes.html  # not needed
nanp.html   # not needed
qa/         # not needed
forums/     # not needed

I'd like to know which function(s) I need to accomplish this. I want to filter the hrefs inside <div id="content"> only. The docs don't have much info on this.

Sorry, this is the first time I've written Python.

Recommended answer

Use findAll():

>>> for i in soup.find('div',{'id':'content'}).findAll('a'):
...     print i['href']
... 
afghanistan/
albania/
algeria/
american-samoa/
andorra/
angola/
anguilla/
antigua/
argentina/
armenia/
aruba/
ascension/
australia/
austria/
azerbaijan/

soup.find('div', {'id': 'content'}) does what it says: it finds the div tag that has an id of content (<div id="content"> would be matched).
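A minimal sketch of that lookup on its own, written for Python 3 with bs4. The markup is a made-up stand-in, not the real page, and the "no-such-id" value is purely illustrative. It shows that the dictionary form used above and the keyword form id="content" select the same tag, and that find() returns None when nothing matches, which is worth checking before chaining further calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page is not reproduced here.
html = (
    '<div id="nav"><a href="codes.html">Codes</a></div>'
    '<div id="content"><a href="albania/">Albania</a></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Two equivalent ways to ask for the div whose id is "content".
by_dict = soup.find("div", {"id": "content"})
by_keyword = soup.find("div", id="content")

# find() returns None when no tag matches.
missing = soup.find("div", id="no-such-id")

print(by_dict == by_keyword)
print(by_dict.a["href"])
print(missing)
```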

.findAll() finds all! 'a' is passed as a parameter to find all the a tags. It returns a list of the matching a tags.

Then I simply print each a tag's href.
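For reference, here is a self-contained sketch of the same idea for Python 3, where print is a function and bs4 spells the method find_all() (findAll() remains available as an older alias). SAMPLE_HTML, hrefs_via_find and hrefs_via_select are hypothetical names, and the markup is an assumed simplification of the page's structure, not a copy of it. The second function shows an alternative that stays close to the select() call from the question, assuming the links sit in a ul directly under the content div as they do in this sample.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: one list inside div#content,
# and one list in another div whose links should be ignored.
SAMPLE_HTML = """
<div id="content">
  <ul>
    <li><a href="afghanistan/">Afghanistan</a></li>
    <li><a href="albania/">Albania</a></li>
  </ul>
</div>
<div id="footer">
  <ul>
    <li><a href="codes.html">Codes</a></li>
    <li><a href="forums/">Forums</a></li>
  </ul>
</div>
"""


def hrefs_via_find(html):
    """Locate div#content first, then collect the hrefs of the a tags inside it."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", {"id": "content"})
    if content is None:
        # No such div on this page, so there is nothing to collect.
        return []
    # href=True skips any a tag that has no href attribute.
    return [a["href"] for a in content.find_all("a", href=True)]


def hrefs_via_select(html):
    """Same result with a CSS selector scoped to div#content."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.select("div#content ul > li > a")]


print(hrefs_via_find(SAMPLE_HTML))
print(hrefs_via_select(SAMPLE_HTML))
```

Both functions skip the links in the other div. The find() version makes the "no such div" case explicit, while the select() version is shorter but depends on the exact nesting written into the selector.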
