How to parse XML nodes to CSV with Ruby and Nokogiri


Question

I have an XML file:

<?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
  <Offer>
   <Model><![CDATA[11016001]]></Model>
   <Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
   <ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
   ...lots more nodes
   <Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
    <li>Odor-resistant, easy to clean, and quick to dry.</li>
    <li>Ventilation ports for enhanced breathability.</li>
    <li>Lightweight, non-marking soles.</li>
    <li>Water-friendly and buoyant; weighs only ounces.</li>
    <li>Fully molded Croslite&trade; material for lightweight cushioning and comfort.</li>
    <li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
  </Offer>
....lots lots more <Offer> entries
</Offers>

I want to parse each instance of 'Offer' into its own row in a CSV file:

require 'csv'
require 'nokogiri'

file = File.read('input.xml')
doc = Nokogiri::XML(file)
a = []
csv = CSV.open('output.csv', 'wb') 

doc.css('Offer').each do |node|
    a.push << node.content.split
end

a.each { |a| csv << a } 

This runs fine, except that I'm splitting on whitespace rather than on each element of the Offer node, so every word ends up in its own column in the CSV file.

Is there a way to pick up the content of each node, and how do I use the node names as headers in the CSV file?

Recommended answer

This assumes that each Offer element always has the same child nodes (though they can be empty):

CSV.open('output.csv', 'wb') do |csv|
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end

And to get headers (from the first Offer element):

CSV.open('output.csv', 'wb') do |csv|
  csv << doc.at('Offer').search('*').map(&:name)
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end

search and at are Nokogiri methods that accept either XPath or CSS selector strings. at returns the first matching element (or nil if there is none); search returns a NodeSet of all matching elements (empty if nothing matches). Passed as a CSS selector, * matches every descendant element of the current node; in the sample shown, the children of Offer hold only text or CDATA, so that is the same set as its direct children.

Both name and text are also Nokogiri methods (called on an element). name returns the element's name; text returns the text or CDATA content of a node.
