BeautifulSoup to scrape street address

Views: 167 | Posted: 2016/8/5 19:01:47 | Tags: python, beautifulsoup, scrape


Problem description


I am using the code at the bottom of this post to get the web link and the masjid name. However, I would also like to get the denomination and the street address. Please help, I am stuck.

Currently I am getting the following:

Weblink:

<div class="subtitleLink"><a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah">

and Masjid name

<b>Masjid Al-Hijrah</b>

But I would like to get the following:

Denomination

<b>Denomination:</b> Sunni (Traditional)

and street address

<br>45 Station Street (Sydney)&nbsp;&nbsp;

The code below scrapes the following HTML:

<td width=25><a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah"><img src='http://www.halalfire.com/images/en/photo_small.jpg' alt='Masjid Al-Hijrah' title='Masjid Al-Hijrah' border=0 width=48 height=36></a></a></td><td width=10><img src="http://www.salatomatic.com/images/spacer.gif" width=10 border=0></td><td nowrap><div class="subtitleLink"><a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah"><b>Masjid Al-Hijrah</b></a>&nbsp;&nbsp; </div><div class="tinyLink"><b>Denomination:</b> Sunni (Traditional)<br>45 Station Street (Sydney)&nbsp;&nbsp;</div></td><td align=right valign=center><div class="tinyLink"></div></td>

CODE:

from bs4 import BeautifulSoup
import urllib2  # Python 2 only; urllib.request replaces it in Python 3

url1 = "http://www.salatomatic.com/c/Sydney+168"
content1 = urllib2.urlopen(url1).read()
soup = BeautifulSoup(content1)

# Each listing's name and link live in a <div class="subtitleLink">
results = soup.findAll("div", {"class": "subtitleLink"})
for result in results:
    br = result.find('b')  # the <b> tag holding the masjid name
    a = result.find('a')   # the link to the masjid's page
    currenturl = a.get('href')
    # Make relative links absolute before printing
    if not currenturl.startswith("http"):
        currenturl = "http://www.salatomatic.com" + currenturl
        print currenturl
    elif currenturl.startswith("http"):
        print a.get('href')
    pos = br.get_text()
    print pos

Solution

You can look up the next sibling <div> whose class attribute is tinyLink, check that it contains both a <b> and a <br> tag, and extract the text node that follows each of them:

...
print pos 
div = result.find_next_sibling('div', attrs={"class": "tinyLink"})
if div and div.b and div.br:
    print(div.b.next_sibling.string)
    print(div.br.next_sibling.string)
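
For readers on Python 3, the same idea can be tried offline against the markup quoted in the question. This is only a sketch under a few assumptions: it uses the built-in "html.parser" backend, the sample string is the question's row trimmed to the relevant cell, and the helper names `text_after` and `extract_listings` are made up for illustration and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString

# Markup taken from the question, trimmed to the cell that holds one listing.
SAMPLE_HTML = (
    '<td nowrap><div class="subtitleLink">'
    '<a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah">'
    '<b>Masjid Al-Hijrah</b></a>&nbsp;&nbsp; </div>'
    '<div class="tinyLink"><b>Denomination:</b> Sunni (Traditional)'
    '<br>45 Station Street (Sydney)&nbsp;&nbsp;</div></td>'
)


def text_after(tag):
    """Return the stripped text node directly following tag, or None.

    Hypothetical helper: guards against the next sibling being missing
    or being another tag instead of plain text.
    """
    sibling = tag.next_sibling
    if isinstance(sibling, NavigableString):
        # str.strip() also removes the non-breaking spaces from &nbsp;
        return str(sibling).strip()
    return None


def extract_listings(html):
    """Return one dict per listing: name, url, denomination, address."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for result in soup.find_all("div", class_="subtitleLink"):
        link = result.find("a")
        name_tag = result.find("b")
        if link is None or name_tag is None:
            continue  # skip rows that do not look like a listing
        entry = {
            "name": name_tag.get_text(strip=True),
            "url": link.get("href"),
            "denomination": None,
            "address": None,
        }
        # The details sit in the next sibling <div class="tinyLink">.
        details = result.find_next_sibling("div", class_="tinyLink")
        if details is not None and details.b and details.br:
            entry["denomination"] = text_after(details.b)
            entry["address"] = text_after(details.br)
        listings.append(entry)
    return listings


if __name__ == "__main__":
    for row in extract_listings(SAMPLE_HTML):
        print(row["name"], "|", row["denomination"], "|", row["address"])
```

For the sample row this yields the denomination `Sunni (Traditional)` and the address `45 Station Street (Sydney)`. Returning dicts instead of printing inside the loop keeps the parsing step separate from whatever is done with the results afterwards; the live page may differ from the quoted snippet, so the `None` defaults are left in place for rows that lack a details block.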
