Keeping attributes when converting XML to Ruby hash


Question

I have a large XML document that I want to parse. Many of the tags in this document carry attributes. For example:

<album>
 <song-name type="published">Do Re Mi</song-name>
</album>

Currently, I am using Rails' hash-parsing library by requiring 'active_support/core_ext/hash'.

When I convert the XML to a hash, the attributes are dropped. It returns:

{"album"=>{"song-name"=>"Do Re Mi"}}

How do I keep those attributes, in this case the type="published" attribute?

This seems to have been asked before in "How can I use XML attributes when converting into a hash with from_xml?", which had no conclusive answer. That question is from 2010, though, and I'm curious whether things have changed since then. Alternatively, do you know of another way to parse this XML so that the attribute information is preserved?

Recommended answer

Converting XML to a hash isn't a good solution. You're left with a hash that is more difficult to work with than the original XML. Also, if the XML is very large, the resulting hash may not fit into memory and can't be processed, whereas the original XML could still be parsed with a SAX parser, which streams through the document instead of loading it all at once.

Assuming the file isn't going to overwhelm your memory when loaded, I'd recommend using Nokogiri to parse it, doing something like:

require 'nokogiri'

class Album

  attr_reader :song_name, :song_type
  def initialize(song_name, song_type)
    @song_name = song_name
    @song_type = song_type
  end
end

xml = <<EOT
<xml>
  <album>
   <song-name type="published">Do Re Mi</song-name>
  </album>
  <album>
    <song-name type="unpublished">Blah blah blah</song-name>
  </album>
</xml>
EOT

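albums = []
doc = Nokogiri::XML(xml)

# Visit every <album> node and build an Album from its <song-name> child.
doc.search('album').each do |album|
  song_name = album.at('song-name')
  albums << Album.new(
      song_name.text,    # element text, e.g. "Do Re Mi"
      song_name['type']  # value of the type attribute
    )
end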

puts albums.first.song_name
puts albums.last.song_type

Which outputs:

Do Re Mi
unpublished

The code starts by defining a suitable class to hold the data you want. Once the XML has been parsed into a DOM, the code loops through all the <album> nodes, extracts the song name and its type attribute from each, creates an instance of the class, and appends it to the albums array.

After running, you'd have an array you could walk, processing each item: storing it into a database, or manipulating it however you want. However, if your goal is to insert that information into a database, it would be smarter to let the DBM read the XML and import it directly.
