BeautifulSoup returning [] when I run it


Question

I am using Beautiful Soup with Python to retrieve weather data from a website.

This is what the website looks like:

<channel>
<title>2 Hour Forecast</title>
<source>Meteorological Services Singapore</source>
<description>2 Hour Forecast</description>
<item>
<title>Nowcast Table</title>
<category>Singapore Weather Conditions</category>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>
</channel>

I want to retrieve the text between the validTime tags (3.30 pm to 5.30 pm).

After inspecting the elements on the page, I found that 3.30 pm to 5.30 pm sits inside a <span> element with the class "text".

Based on the website, here is my Python code:

import requests
from bs4 import BeautifulSoup

url = "http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=<keyrefnumber>"

r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

g_data = soup.find_all("span", {"class": "text"})

print(g_data)

# intended next step: write the "3.30 pm to 5.30 pm" value out to an XML file
outfile = open(r'C:\scripts\idk.xml', 'w')

When I run my Python code in CMD, all I get is [].

Recommended answer

The main API page on the Singapore NEA site clearly shows that the response you get is an XML document:

2-hour Nowcast
Data Description: Weather forecast for next 2 hours
Last API Update: 1-Mar-2016
Frequency: Hourly
File Type: XML

You are looking at an HTML representation of the data in Chrome. Chrome transformed the XML to make it presentable, but your Python code is still accessing the XML directly, and that XML contains no <span> elements, so the search returns an empty list. The PDF documentation and your own question both show the actual XML contents, so parse those.
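To see why the search for <span> comes back empty, it can help to list the tag names that are actually present in the XML. The sketch below does this offline with Python's built-in ElementTree rather than BeautifulSoup, on a shortened copy of the feed from the question. The closing tags in SAMPLE_XML are an assumption, added only so that the fragment parses.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feed from the question. The closing tags are
# assumed here so that the fragment is well-formed.
SAMPLE_XML = """<channel>
<title>2 Hour Forecast</title>
<item>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
</weatherForecast>
</item>
</channel>"""

root = ET.fromstring(SAMPLE_XML)

# Every tag name that occurs in the document, root included.
tag_names = {element.tag for element in root.iter()}

print(sorted(tag_names))
print("span" in tag_names)       # no <span> exists in the raw XML
print("validTime" in tag_names)  # the element that holds the time range
```

Any selector built from the browser's rendered view, such as span with class "text", has nothing to match in this tree.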

If you want to use BeautifulSoup with XML, make sure you have the lxml project installed and use the 'xml' parser type. Then simply access the text content of the validTime element:

soup = BeautifulSoup(r.content, "xml")
valid_time = soup.find('validTime').string

Demo:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=<private_api_key>')
>>> soup = BeautifulSoup(r.content, "xml")
>>> soup.find('validTime').string
u'4.00 pm to 6.00 pm'
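The same approach can be tried without an API key by parsing an inline sample. This is a minimal sketch that assumes bs4 and lxml are installed; SAMPLE_XML is a shortened copy of the feed from the question, with closing tags added as an assumption so that it is well-formed. It also reads the area elements, which the question did not ask for but which follow the same pattern.

```python
from bs4 import BeautifulSoup  # the "xml" parser type also requires lxml

# Shortened copy of the feed from the question; closing tags are assumed.
SAMPLE_XML = """<channel>
<title>2 Hour Forecast</title>
<item>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
</weatherForecast>
</item>
</channel>"""

soup = BeautifulSoup(SAMPLE_XML, "xml")

# Text content of the <validTime> element.
valid_time = soup.find("validTime").get_text(strip=True)

# Map each area name to its forecast code, read from the attributes.
forecasts = {area["name"]: area["forecast"] for area in soup.find_all("area")}

print(valid_time)
print(forecasts)
```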

If you are trying to write to an XML file, however, you'd have to make sure that what you write is valid XML; this is outside the scope of BeautifulSoup.

Alternatively, use the ElementTree API that comes with Python by default; it can both parse the XML and produce new XML.
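A minimal sketch of the ElementTree route, covering both parsing and writing. The output element name nowcast, the output file name, and the closing tags in SAMPLE_XML are all assumptions made for illustration; the real feed would come from the response body instead of an inline string.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened copy of the feed from the question; closing tags are assumed.
SAMPLE_XML = """<channel>
<title>2 Hour Forecast</title>
<item>
<validTime>3.30 pm to 5.30 pm</validTime>
</item>
</channel>"""

# Parse the source document and read the text of <validTime>.
source_root = ET.fromstring(SAMPLE_XML)
valid_time = source_root.findtext("item/validTime")

# Build a new, small XML document. "nowcast" is a hypothetical name.
out_root = ET.Element("nowcast")
ET.SubElement(out_root, "validTime").text = valid_time

# Serialize to a string for inspection.
xml_text = ET.tostring(out_root, encoding="unicode")
print(xml_text)

# Write it to a file in the temp directory (hypothetical file name).
out_path = os.path.join(tempfile.gettempdir(), "nowcast_valid_time.xml")
ET.ElementTree(out_root).write(out_path, encoding="utf-8", xml_declaration=True)
```

Because ElementTree builds the output from element objects rather than from concatenated strings, the file it writes is well-formed XML.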
