Parsing XML with Beautiful Soup
Question
I have a data structure like this:
<photo id="123" owner="12345" secret="xx" server="12" farm="4" title="109L_0195"
ispublic="1" isfriend="0" isfamily="0" views="0" tags="military czechrepublic kmk koně
humpolec všestrannost humpoec vysocinaregion" latitude="49.550933" longitude="15.36652"
accuracy="16" context="0" place_id="tg5cqdpWW7q18rE" woeid="790349" geo_is_family="0"
geo_is_friend="0" geo_is_contact="0" geo_is_public="1">
<description>
Kvalifikační kolo KMK - všestrannost 18.7.2014 - Humpolec
</description>
</photo>
<photo id="123" owner="06" secret="xx" server="12" farm="4"
title="Ytterligare en bild ifrån inspelningen av Johan Stjerquist's video: Nudist
Javisst." ispublic="1" isfriend="0" isfamily="0" views="0" tags="square squareformat
iphoneography instagramapp uploaded:by=instagram" latitude="56.171184"
longitude="14.741144" accuracy="16" context="0" place_id="u4MzsN9ZW7KnPWo"
woeid="898740" geo_is_family="0" geo_is_friend="0" geo_is_contact="0" geo_is_public="1">
<description/>
</photo>
It is a piece of information about a photo accessed through the Flickr API. I want to extract the following information:
id
title
tags
longitude
latitude
I tried to accomplish this with the following code:
url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url))
for data in soup.find_all('photo'):
    print(data.attrs['id', 'title', 'tags', 'latitude', 'longitude', 'accuracy'])
That did not work: attrs accepts only one argument. Looking at the BeautifulSoup documentation (http://www.crummy.com/software/BeautifulSoup/bs4/doc/), it looks like there is no other tool that could help me get all the information, or am I mistaken? I also tried replacing attrs with p, but that did not work either.
Any ideas which command I could use?
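The failure can be reproduced without Beautiful Soup. This is a minimal sketch, assuming attrs behaves like an ordinary Python dict (the sample values are copied from the first photo above): a subscript with several comma-separated names is treated as a single tuple key, so the lookup raises KeyError instead of returning several values.

```python
# A plain dict standing in for photo.attrs (an assumption for illustration;
# the values are taken from the first sample photo).
attrs = {"id": "123", "title": "109L_0195", "latitude": "49.550933"}

missing_key = None
try:
    attrs["id", "title"]          # equivalent to attrs[("id", "title")]
except KeyError as exc:
    missing_key = exc.args[0]     # the tuple that was used as the key

# Looking the keys up one at a time works as expected.
values = [attrs[key] for key in ("id", "title")]
print(missing_key, values)
```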
Recommended answer
Since attrs is a dictionary, you can get only specific keys using a dictionary comprehension:
keys = {'id', 'title', 'tags', 'latitude', 'longitude'}
for photo in soup.find_all('photo'):
    print({key: value for key, value in photo.attrs.iteritems() if key in keys})
Note that in Python 3.x you should use items() instead of iteritems().
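For completeness, here is a self-contained Python 3 sketch of the same idea. It assumes the bs4 package is installed, and it parses a shortened copy of the sample data from a string instead of calling the Flickr API. The choice of the built-in "html.parser" is also an assumption made to avoid extra dependencies; a dedicated XML parser may be a better fit for real API responses.

```python
from bs4 import BeautifulSoup

# Shortened sample in the shape of the response shown in the question.
xml = """
<photo id="123" owner="12345" title="109L_0195" tags="military czechrepublic"
       latitude="49.550933" longitude="15.36652" accuracy="16">
  <description>Kvalifikační kolo KMK</description>
</photo>
<photo id="123" owner="06" title="Nudist Javisst." tags="square squareformat"
       latitude="56.171184" longitude="14.741144" accuracy="16">
  <description/>
</photo>
"""

# Assumption: the standard-library backed parser is good enough for this sample.
soup = BeautifulSoup(xml, "html.parser")

keys = {"id", "title", "tags", "latitude", "longitude"}

# Keep only the wanted attributes of every <photo> element.
photos = [
    {key: value for key, value in photo.attrs.items() if key in keys}
    for photo in soup.find_all("photo")
]

for row in photos:
    print(row)
```

Collecting the filtered dictionaries in a list, rather than printing them directly, makes it easy to pass the rows on to something like csv.DictWriter later.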