Parsing XML with Ruby
Question
I'm very new to working with XML, but a need just dropped into my lap. I've been given an XML format that is unusual (to me): there are colons within the tags.
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
It is a large file and there is much more to it than this but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?
I'd rather not just write a brute-force way of parsing the text but I can't seem to make any headway with REXML or Hpricot and I suspect it is due to these unusual tags.
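The colons are XML namespace prefixes (prefix:localName). A namespace-aware parser expects each prefix to be bound with an xmlns:PREFIX="URI" attribute, and the excerpt above shows no such declarations; in the full file they may appear on an ancestor element. As a minimal sketch, assuming the prefixes are declared on a wrapper element with placeholder URNs (urn:example:thing1 and urn:example:part1 are invented for illustration), REXML can query the prefixed names directly:

```ruby
require 'rexml/document'

# Assumption: the prefixes are declared on a wrapper element. The URNs are
# placeholders; a real file would declare its own namespace URIs.
xml = <<~XML
  <root xmlns:THING1="urn:example:thing1" xmlns:PART1="urn:example:part1">
    <THING1:things type="Container">
      <PART1:Id type="Property">1234</PART1:Id>
      <PART1:Name type="Property">The Name</PART1:Name>
    </THING1:things>
  </root>
XML

doc = REXML::Document.new(xml)
results = []

# REXML resolves the THING1 and PART1 prefixes from the declarations in scope.
REXML::XPath.each(doc, '//THING1:things') do |thing|
  id   = REXML::XPath.first(thing, 'PART1:Id').text
  name = REXML::XPath.first(thing, 'PART1:Name').text
  results << [id, name]
  puts "ID = #{id}"
  puts "Name = #{name}"
end
```

If the real file never declares the prefixes, it is not namespace-well-formed, and parsers may report errors or treat the prefixed names inconsistently.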
My Ruby code:
require 'hpricot'
xml = File.open( "myfile.xml" )
doc = Hpricot::XML( xml )
(doc/:things).each do |thg|
  [ 'Id', 'Name' ].each do |el|
    puts "#{el}: #{thg.at(el).innerHTML}"
  end
end
...taken straight from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/
I figured I would be able to work things out from there, but this code returns nothing. It doesn't raise an error; it just returns.
Recommended answer
As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library. If you wanted to print out the Id and Name values in your example, here is how you would do it:
require 'nokogiri'
xml_str = <<EOF
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF
doc = Nokogiri::XML(xml_str)
thing = doc.at_xpath('//things')
puts "ID = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content
A few notes:
- at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead.
- Depending on your document, namespaces can be problematic, so calling doc.remove_namespaces! can help (see this answer for a brief discussion).
- You can use the css methods instead of xpath if you're more comfortable with those.
- Definitely play around with this in irb or pry to investigate methods.
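To handle multiple items, you need a root element (an XML document can have only one top-level element), and you need to remove the // from the Id and Name queries inside the loop, because a path starting with // searches the whole document rather than the current node.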
require 'nokogiri'
xml_str = <<EOF
<root>
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name1</PART1:Name>
</THING1:things>
<THING2:things type="Container">
<PART2:Id type="Property">2234</PART2:Id>
<PART2:Name type="Property">The Name2</PART2:Name>
</THING2:things>
</root>
EOF
doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
  puts "ID = " + thing.at_xpath('Id').content
  puts "Name = " + thing.at_xpath('Name').content
end
This will give you:
ID = 1234
Name = The Name1
ID = 2234
Name = The Name2
If you are more familiar with CSS selectors, you can use this nearly identical bit of code:
doc.css('things').each do |thing|
  puts "ID = " + thing.at_css('Id').content
  puts "Name = " + thing.at_css('Name').content
end