Parsing XML with Ruby

Question

I'm very new to working with XML but just had a need dropped in my lap. I have been given an unusual (to me) XML format. There are colons within the tags.

<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>

It is a large file and there is much more to it than this but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?

I'd rather not just write a brute-force way of parsing the text but I can't seem to make any headway with REXML or Hpricot and I suspect it is due to these unusual tags.

My Ruby code:

    require 'hpricot'
    xml = File.open( "myfile.xml" )

    doc = Hpricot::XML( xml )

    (doc/:things).each do |thg|
      [ 'Id', 'Name' ].each do |el|
        puts "#{el}: #{thg.at(el).innerHTML}"
      end
    end

...which is just taken from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/

I figured I would be able to work some things out from here, but this code returns nothing. It doesn't raise an error. It just returns.

Recommended answer

As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library. If you wanted to print out the Id and Name values in your example, here is how you would do it:

require 'nokogiri'

xml_str = <<EOF
<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF

doc = Nokogiri::XML(xml_str)

thing = doc.at_xpath('//things')
puts "ID   = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content

A few notes:

  • at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead.
  • Depending on your document, namespaces can be problematic, so calling doc.remove_namespaces! can help (see this answer for a brief discussion).
  • You can use the css methods instead of xpath if you're more comfortable with those.
  • Definitely play around with this in irb or pry to investigate methods.

To handle multiple items, you need a root element, and you need to remove the // in the xpath query.

require 'nokogiri'

xml_str = <<EOF
<root>
  <THING1:things type="Container">
    <PART1:Id type="Property">1234</PART1:Id>
    <PART1:Name type="Property">The Name1</PART1:Name>
  </THING1:things>
  <THING2:things type="Container">
    <PART2:Id type="Property">2234</PART2:Id>
    <PART2:Name type="Property">The Name2</PART2:Name>
  </THING2:things>
</root>
EOF

doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
  puts "ID   = " + thing.at_xpath('Id').content
  puts "Name = " + thing.at_xpath('Name').content
end

This will give you:

ID   = 1234
Name = The Name1
ID   = 2234
Name = The Name2

If you are more familiar with CSS selectors, you can use this nearly identical bit of code:

doc.css('things').each do |thing|
  puts "ID   = " + thing.at_css('Id').content
  puts "Name = " + thing.at_css('Name').content
end
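
The question also mentions REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). As a rough sketch only, the same loop could look like this with REXML's XPath helpers. As before, it assumes the prefixes are declared in the document; the URIs, the `<root>` wrapper, and the helper name `things_from` are hypothetical.

```ruby
require 'rexml/document'

# Assumption: prefixes are declared on an enclosing element. The URIs are
# placeholders made up for this sketch.
REXML_SAMPLE = <<~XML
  <root xmlns:THING1="http://example.com/thing1"
        xmlns:PART1="http://example.com/part1">
    <THING1:things type="Container">
      <PART1:Id type="Property">1234</PART1:Id>
      <PART1:Name type="Property">The Name1</PART1:Name>
    </THING1:things>
    <THING1:things type="Container">
      <PART1:Id type="Property">2234</PART1:Id>
      <PART1:Name type="Property">The Name2</PART1:Name>
    </THING1:things>
  </root>
XML

# Prefix-to-URI bindings used by the XPath expressions below.
PREFIXES = {
  't' => 'http://example.com/thing1',
  'p' => 'http://example.com/part1'
}

# Returns one hash per <things> element, holding its Id and Name text.
def things_from(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//t:things', PREFIXES).map do |thing|
    {
      id:   REXML::XPath.first(thing, 'p:Id', PREFIXES).text,
      name: REXML::XPath.first(thing, 'p:Name', PREFIXES).text
    }
  end
end

things_from(REXML_SAMPLE).each do |thing|
  puts "ID   = #{thing[:id]}"
  puts "Name = #{thing[:name]}"
end
```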
