How do I use Nokogiri to parse an XML file?
Question
I'm having some issues with Nokogiri.
I am trying to parse this XML file:
<Collection version="2.0" id="74j5hc4je3b9">
<Name>A Funfair in Bangkok</Name>
<PermaLink>Funfair in Bangkok</PermaLink>
<PermaLinkIsName>True</PermaLinkIsName>
<Description>A small funfair near On Nut in Bangkok.</Description>
<Date>2009-08-03T00:00:00</Date>
<IsHidden>False</IsHidden>
<Items>
<Item filename="AGC_1998.jpg">
<Title>Funfair in Bangkok</Title>
<Caption>A small funfair near On Nut in Bangkok.</Caption>
<Authors>Anthony Bouch</Authors>
<Copyright>Copyright © Anthony Bouch</Copyright>
<CreatedDate>2009-08-07T19:22:08</CreatedDate>
<Keywords>
<Keyword>Funfair</Keyword>
<Keyword>Bangkok</Keyword>
<Keyword>Thailand</Keyword>
</Keywords>
<ThumbnailSize width="133" height="200" />
<PreviewSize width="532" height="800" />
<OriginalSize width="2279" height="3425" />
</Item>
<Item filename="AGC_1164.jpg" iscover="True">
<Title>Bumper Cars at a Funfair in Bangkok</Title>
<Caption>Bumper cars at a small funfair near On Nut in Bangkok.</Caption>
<Authors>Anthony Bouch</Authors>
<Copyright>Copyright © Anthony Bouch</Copyright>
<CreatedDate>2009-08-03T22:08:24</CreatedDate>
<Keywords>
<Keyword>Bumper Cars</Keyword>
<Keyword>Funfair</Keyword>
<Keyword>Bangkok</Keyword>
<Keyword>Thailand</Keyword>
</Keywords>
<ThumbnailSize width="200" height="133" />
<PreviewSize width="800" height="532" />
<OriginalSize width="3725" height="2479" />
</Item>
</Items>
</Collection>
I want all of that information displayed on the screen, nothing more. Should be simple, right? This is what I am doing:
require 'nokogiri'
doc = Nokogiri::XML(File.open("sample.xml"))
@block = doc.css("items item").map {|node| node.children.text}
puts @block
Each Items is a node, and under that there are child nodes of Item?
I create a map of this, which returns a hash, and the code in {} goes through each node and places the children's text into @block. Then I can display the text of all the child nodes on the screen.
I have no idea how close or how far off I am. I've read many articles and am still a little confused about the basics, especially since my usual first program in a new language simply reads from a file and prints to the screen.
Recommended answer
Here I will try to address each of the questions and points of confusion you have:
require 'nokogiri'
doc = Nokogiri::XML.parse <<-XML
<Collection version="2.0" id="74j5hc4je3b9">
<Name>A Funfair in Bangkok</Name>
<PermaLink>Funfair in Bangkok</PermaLink>
<PermaLinkIsName>True</PermaLinkIsName>
<Description>A small funfair near On Nut in Bangkok.</Description>
<Date>2009-08-03T00:00:00</Date>
<IsHidden>False</IsHidden>
<Items>
<Item filename="AGC_1998.jpg">
<Title>Funfair in Bangkok</Title>
<Caption>A small funfair near On Nut in Bangkok.</Caption>
<Authors>Anthony Bouch</Authors>
<Copyright>Copyright © Anthony Bouch</Copyright>
<CreatedDate>2009-08-07T19:22:08</CreatedDate>
<Keywords>
<Keyword>Funfair</Keyword>
<Keyword>Bangkok</Keyword>
<Keyword>Thailand</Keyword>
</Keywords>
<ThumbnailSize width="133" height="200" />
<PreviewSize width="532" height="800" />
<OriginalSize width="2279" height="3425" />
</Item>
<Item filename="AGC_1164.jpg" iscover="True">
<Title>Bumper Cars at a Funfair in Bangkok</Title>
<Caption>Bumper cars at a small funfair near On Nut in Bangkok.</Caption>
<Authors>Anthony Bouch</Authors>
<Copyright>Copyright © Anthony Bouch</Copyright>
<CreatedDate>2009-08-03T22:08:24</CreatedDate>
<Keywords>
<Keyword>Bumper Cars</Keyword>
<Keyword>Funfair</Keyword>
<Keyword>Bangkok</Keyword>
<Keyword>Thailand</Keyword>
</Keywords>
<ThumbnailSize width="200" height="133" />
<PreviewSize width="800" height="532" />
<OriginalSize width="3725" height="2479" />
</Item>
</Items>
</Collection>
XML
So from my understanding of Nokogiri, each 'Items' is a node, and under that there are children nodes of 'Item'?
Not quite. Items itself is a single element. What a query such as doc.xpath("//Items/Item") returns is a Nokogiri::XML::NodeSet. Here that set holds the 2 Item children of Items, each of which is a Nokogiri::XML::Element object. Since Element is a subclass of Nokogiri::XML::Node, you can also call them Nodes.
doc.class # => Nokogiri::XML::Document
@block = doc.xpath("//Items/Item")
@block.class # => Nokogiri::XML::NodeSet
@block.count # => 2
@block.map { |node| node.name }
# => ["Item", "Item"]
@block.map { |node| node.class }
# => [Nokogiri::XML::Element, Nokogiri::XML::Element]
@block.map { |node| node.children.count }
# => [19, 19]
@block.map { |node| node.class.superclass }
# => [Nokogiri::XML::Node, Nokogiri::XML::Node]
We create a map of this, which returns a hash I believe, and the code in {} goes through each node and places the children text into @block. Then I can display all of this child node's text to the screen.
I don't fully follow this part, so the examples that follow show what a Node is and what a NodeSet is in Nokogiri. Remember that a NodeSet is a collection of Nodes.
@chld_class = @block.map do |node|
node.children.class
end
@chld_class
# => [Nokogiri::XML::NodeSet, Nokogiri::XML::NodeSet]
@chld_name = @block.map do |node|
node.children.map { |n| [n.name,n.class] }
end
@chld_name
# => [[["text", Nokogiri::XML::Text],
# ["Title", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Caption", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Authors", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Copyright", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["CreatedDate", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Keywords", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["ThumbnailSize", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["PreviewSize", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["OriginalSize", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text]],
# [["text", Nokogiri::XML::Text],
# ["Title", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Caption", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Authors", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Copyright", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["CreatedDate", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["Keywords", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["ThumbnailSize", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["PreviewSize", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text],
# ["OriginalSize", Nokogiri::XML::Element],
# ["text", Nokogiri::XML::Text]]]
@chld_name = @block.map do |node|
node.children.map{|n| [n.name,n.text.strip] if n.elem? }.compact
end.compact
@chld_name
# => [[["Title", "Funfair in Bangkok"],
# ["Caption", "A small funfair near On Nut in Bangkok."],
# ["Authors", "Anthony Bouch"],
# ["Copyright", "Copyright © Anthony Bouch"],
# ["CreatedDate", "2009-08-07T19:22:08"],
# ["Keywords", "Funfair\n Bangkok\n Thailand"],
# ["ThumbnailSize", ""],
# ["PreviewSize", ""],
# ["OriginalSize", ""]],
# [["Title", "Bumper Cars at a Funfair in Bangkok"],
# ["Caption", "Bumper cars at a small funfair near On Nut in Bangkok."],
# ["Authors", "Anthony Bouch"],
# ["Copyright", "Copyright © Anthony Bouch"],
# ["CreatedDate", "2009-08-03T22:08:24"],
# ["Keywords",
# "Bumper Cars\n Funfair\n Bangkok\n Thailand"],
# ["ThumbnailSize", ""],
# ["PreviewSize", ""],
# ["OriginalSize", ""]]]