How to parse XML nodes to CSV with Ruby and Nokogiri
Question
I have an XML file:
<?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
<Offer>
<Model><![CDATA[11016001]]></Model>
<Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
<ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
...lots more nodes
<Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
<li>Odor-resistant, easy to clean, and quick to dry.</li>
<li>Ventilation ports for enhanced breathability.</li>
<li>Lightweight, non-marking soles.</li>
<li>Water-friendly and buoyant; weighs only ounces.</li>
<li>Fully molded Croslite™ material for lightweight cushioning and comfort.</li>
<li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
</Offer>
....lots lots more <Offer> entries
</Offers>
I want to parse each instance of 'Offer' into its own row in a CSV file:
require 'csv'
require 'nokogiri'
file = File.read('input.xml')
doc = Nokogiri::XML(file)
a = []
csv = CSV.open('output.csv', 'wb')
doc.css('Offer').each do |node|
a.push << node.content.split
end
a.each { |a| csv << a }
This runs fine, except that I'm splitting on whitespace rather than on each element of the Offer node, so every word ends up in its own column in the CSV file.
Is there a way to pick up the content of each node, and how do I use the node names as headers in the CSV file?
Recommended answer
This assumes that each Offer element always has the same child nodes (though they can be empty):
CSV.open('output.csv', 'wb') do |csv|
doc.search('Offer').each do |x|
csv << x.search('*').map(&:text)
end
end
And to get headers (from the first Offer element):
CSV.open('output.csv', 'wb') do |csv|
csv << doc.at('Offer').search('*').map(&:name)
doc.search('Offer').each do |x|
csv << x.search('*').map(&:text)
end
end
search and at are Nokogiri methods that accept either XPath or CSS selector strings. at returns the first matching element (or nil if there is none); search returns a NodeSet, an array-like collection of the matching elements (empty if no matches are found). Called on a node, the * selector matches every element beneath that node, not only its direct children. In this case the two are the same, because the children of Offer contain no nested elements: the <li> markup sits inside CDATA and is therefore plain text.
name and text are also Nokogiri methods (for an element). name returns the element's name; text returns the text or CDATA content of a node.