Using Nokogiri to parse Google Picasa API XML - namespacing issue?


Problem description

I am trying to get some data from some Google Picasa XML, and am having a bit of trouble.

Here is the actual xml (containing just one entry): http://pastie.org/1736008

Basically, I would like to collect a few of the gphoto attributes, so ideally what I would like to do is:

doc.xpath('//entry').map do |entry|
  {:id => entry.children['gphoto:id'],
   :thumb => entry.children['gphoto:thumbnail'],
   :name => entry.children['gphoto:name'],
   :count => entry.children['gphoto:numphotos']}
end

However, this does not work. In fact, when I examine the children of entry, I do not see any 'gphoto:xxx' attributes at all, so I am quite confused as to how to find them.

Thanks!

Solution

Here's some working code which uses nokogiri to extract the gphoto elements from your example xml.

#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
content = File.read('input.xml')
doc = Nokogiri::XML(content) {|config| 
          config.options = Nokogiri::XML::ParseOptions::STRICT
      }

hashes = doc.xpath('//xmlns:entry').map do |entry|
  {
    :id => entry.xpath('gphoto:id').inner_text,
    :thumb => entry.parent.xpath('gphoto:thumbnail').inner_text,
    :name => entry.xpath('gphoto:name').inner_text,
    :count => entry.xpath('gphoto:numphotos').inner_text
  }
end

puts hashes.inspect

# yields: 
#
# [{:count=>"37", :name=>"Melody19Months", :thumb=>"http://lh3.ggpht.com/_Viv8WkAChHU/AAAAAAAAAAA/AAAAAAAAAAA/pNuu5PgnP1Y/s64-c/soopingsaw.jpg", :id=>"5582695833628950881"}]

Notes:

  1. The sample xml in your gist needed a closing "feed" tag. Fixed here.
  2. In the XPath expression that finds the entry elements we must use a namespace prefix, so "xmlns:entry", not just "entry". The latter (used in your original code) will find no elements: it looks for elements in the null namespace, but in your example they all inherit the default namespace declared on the feed element. Nokogiri registers the root element's default namespace under the prefix "xmlns", which is why "xmlns:entry" works without any extra setup. Aaron Patterson wrote a (Nokogiri-centric) introduction to the problem, here, and there's another here.
  3. The element gphoto:thumbnail is a child of the feed element, not of each entry. I have made a small (hacky) adjustment for that, keeping to the design of your original example, but it would be far better to look up the value of this element only once per feed (perhaps populating the entry hashes afterwards if each of them really needs its own copy).
  4. Configuring Nokogiri to be strict is not actually necessary, but it's nice to get a little help in spotting problems early.
