Parsing XML with Ruby
Question
I'm very new to working with XML but just had a need dropped in my lap. I have been given an unusual (to me) XML format: there are colons within the tags.
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
It is a large file and there is much more to it than this, but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?
I'd rather not just write a brute-force way of parsing the text, but I can't seem to make any headway with REXML or Hpricot, and I suspect it is due to these unusual tags.
My Ruby code:
require 'hpricot'
xml = File.open( "myfile.xml" )
doc = Hpricot::XML( xml )
(doc/:things).each do |thg|
  [ 'Id', 'Name' ].each do |el|
    puts "#{el}: #{thg.at(el).innerHTML}"
  end
end
...which is taken straight from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/
I figured I would be able to work some things out from here, but this code returns nothing. It doesn't raise an error. It just returns.
Recommended answer
As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library. If you wanted to print out the Id and Name values in your example, here is how you would do it:
require 'nokogiri'
xml_str = <<EOF
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF
doc = Nokogiri::XML(xml_str)
thing = doc.at_xpath('//things')
puts "ID = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content
A few notes:
- at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead.
- Depending on your document, namespaces can be problematic, so calling doc.remove_namespaces! can help (see this answer for a brief discussion).
- You can use the css methods instead of xpath if you're more comfortable with those.
- Definitely play around with this in irb or pry to investigate methods.
To handle multiple items, you need a root element, and you need to remove the leading // from the inner xpath queries so that Id and Name are looked up relative to each things node rather than from the top of the document.
require 'nokogiri'
xml_str = <<EOF
<root>
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name1</PART1:Name>
</THING1:things>
<THING2:things type="Container">
<PART2:Id type="Property">2234</PART2:Id>
<PART2:Name type="Property">The Name2</PART2:Name>
</THING2:things>
</root>
EOF
doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
  puts "ID = " + thing.at_xpath('Id').content
  puts "Name = " + thing.at_xpath('Name').content
end
This will give you:
ID = 1234
Name = The Name1
ID = 2234
Name = The Name2
If you are more familiar with CSS selectors, you can use this nearly identical bit of code:
doc.css('things').each do |thing|
  puts "ID = " + thing.at_css('Id').content
  puts "Name = " + thing.at_css('Name').content
end