Parsing XML with Ruby

Question

I'm very new to working with XML but just had a need dropped in my lap. I have been given an unusual (to me) XML format: there are colons within the tags.

<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>

It is a large file and there is much more to it than this, but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?

I'd rather not just write a brute-force way of parsing the text, but I can't seem to make any headway with REXML or Hpricot, and I suspect it is due to these unusual tags.

My Ruby code:

    require 'hpricot'
    xml = File.open( "myfile.xml" )

    doc = Hpricot::XML( xml )

    (doc/:things).each do |thg|
      [ 'Id', 'Name' ].each do |el|
        puts "#{el}: #{thg.at(el).innerHTML}"
      end
    end

...taken straight from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/

I figured I would be able to work some things out from here, but this code returns nothing. It doesn't raise an error. It just returns.

Recommended answer

As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library for Ruby. If you want to print out the Id and Name values in your example, here is how you would do it:

require 'nokogiri'

xml_str = <<EOF
<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF

doc = Nokogiri::XML(xml_str)

thing = doc.at_xpath('//things')
puts "ID   = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content

A few notes:

  • at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead.
  • Depending on your document, namespaces can be problematic, so calling doc.remove_namespaces! can help (see this answer for a brief discussion).
  • You can use the css methods instead of xpath if you're more comfortable with those.
  • Definitely play around with this in irb or pry to investigate methods.

To handle multiple items, the document needs a single root element, and the queries inside the loop must drop the leading //, because a path starting with // always searches from the document root rather than from the current node.

require 'nokogiri'

xml_str = <<EOF
<root>
  <THING1:things type="Container">
    <PART1:Id type="Property">1234</PART1:Id>
    <PART1:Name type="Property">The Name1</PART1:Name>
  </THING1:things>
  <THING2:things type="Container">
    <PART2:Id type="Property">2234</PART2:Id>
    <PART2:Name type="Property">The Name2</PART2:Name>
  </THING2:things>
</root>
EOF

doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
  puts "ID   = " + thing.at_xpath('Id').content
  puts "Name = " + thing.at_xpath('Name').content
end

This will give you:

ID   = 1234
Name = The Name1
ID   = 2234
Name = The Name2

If you are more familiar with CSS selectors, you can use this nearly identical bit of code:

doc.css('things').each do |thing|
  puts "ID   = " + thing.at_css('Id').content
  puts "Name = " + thing.at_css('Name').content
end
