Beautiful Soup: parsing XML

Question

I have a data structure like this:

<photo id="123" owner="12345" secret="xx" server="12" farm="4" title="109L_0195" 
ispublic="1" isfriend="0" isfamily="0" views="0" tags="military czechrepublic kmk koně 
humpolec všestrannost humpoec vysocinaregion" latitude="49.550933" longitude="15.36652" 
accuracy="16" context="0" place_id="tg5cqdpWW7q18rE" woeid="790349" geo_is_family="0" 
geo_is_friend="0" geo_is_contact="0" geo_is_public="1">
 <description>
Kvalifikační kolo KMK - všestrannost 18.7.2014 - Humpolec
</description>
</photo>


<photo id="123" owner="06" secret="xx" server="12" farm="4"   
title="Ytterligare en bild ifrån inspelningen av Johan Stjerquist's video: Nudist 
Javisst." ispublic="1" isfriend="0" isfamily="0" views="0" tags="square squareformat 
iphoneography instagramapp uploaded:by=instagram" latitude="56.171184" 
longitude="14.741144" accuracy="16" context="0" place_id="u4MzsN9ZW7KnPWo" 
woeid="898740" geo_is_family="0" geo_is_friend="0" geo_is_contact="0" geo_is_public="1">
<description/>
</photo>

It's a piece of information about a photo accessed through the Flickr API. I want to extract the following information:
    id
    title
    tags
    longitude
    latitude

I tried to accomplish this with the following:

url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url))

for data in soup.find_all('photo'):
    print (data.attrs['id' , 'title' , 'tags' , 'latitude' , 'longitude' , 'accuracy'])

That did not work: attrs accepts only one argument. Looking at the BeautifulSoup documentation (http://www.crummy.com/software/BeautifulSoup/bs4/doc/), it looks like there is no other tool that could help me get all of this information, or am I mistaken? I also tried substituting p for attrs, but that did not work either.

Any ideas which command I could use?

Recommended answer

Since attrs is a dictionary, you can get only specific keys using a dictionary comprehension:

keys = {'id', 'title', 'tags', 'latitude', 'longitude'}
for photo in soup.find_all('photo'):
    print({key:value for key, value in photo.attrs.iteritems() if key in keys})

Note that on Python 3.x you should use items() instead of iteritems().
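
As a minimal Python 3 sketch of the same idea, the snippet below filters an attribute dictionary with items(). It does not call Beautiful Soup: sample_attrs is a hand-written, abridged stand-in for photo.attrs, and pick_attrs and WANTED are hypothetical names introduced only for illustration. The last few lines also show why the original attempt failed: indexing a dictionary with several comma-separated names looks up a single tuple key, which is not in the dictionary.

```python
# Python 3 sketch: keep only the wanted keys of an attribute dictionary.
# Assumption: `sample_attrs` stands in for `photo.attrs`; with Beautiful Soup you
# would pass `photo.attrs` for each tag returned by soup.find_all('photo').

WANTED = ("id", "title", "tags", "latitude", "longitude")


def pick_attrs(attrs, wanted=WANTED):
    """Return a new dict holding only the wanted keys that are present in attrs."""
    return {key: value for key, value in attrs.items() if key in wanted}


# Abridged copy of the first <photo> element's attributes from the question.
sample_attrs = {
    "id": "123",
    "owner": "12345",
    "title": "109L_0195",
    "tags": "military czechrepublic kmk",
    "latitude": "49.550933",
    "longitude": "15.36652",
    "accuracy": "16",
}

selected = pick_attrs(sample_attrs)
print(selected)

# Several names inside one pair of brackets form a tuple key, ("id", "title"),
# which the dictionary does not contain, so the lookup raises KeyError.
try:
    sample_attrs["id", "title"]
except KeyError as error:
    print("KeyError:", error)
```

Keys that are missing from a given photo are simply skipped, so the helper does not raise when an attribute such as tags is absent.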
