Parsing XML in Clojure
Problem description
I am new to Clojure, so please bear with me. I have XML that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<XVar Id="cdx9" Type="Dictionary">
  <XVar Id="Base.AccruedPremium" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="0"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.4380728252313069"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexLevel01" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="30693.926279941188"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDelta" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="8.9304387917502073"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.0775955481964035"/>
    </Row>
  </XVar>
</XVar>
This structure repeats. From it, I want to produce a CSV file with these columns:
IndexName,TrancheAnalysis.IndexDuration,TrancheAnalysis.TrancheDuration
cdx9,3.4380728252313069,3.0775955481964035
.........................................
.........................................
I am able to parse a simpler XML file like this one:
<?xml version="1.0" encoding="UTF-8"?>
<CalibrationData>
  <IndexList>
    <Index>
      <Calibrate>Y</Calibrate>
      <UseClientIndexQuotes>Y</UseClientIndexQuotes>
      <IndexName>HYCDX10</IndexName>
      <Tenor>06/20/2013</Tenor>
      <TenorName>3Y</TenorName>
      <IndexLevels>219.6</IndexLevels>
      <Tranche>Equity0To0.15</Tranche>
      <TrancheStart>0</TrancheStart>
      <TrancheEnd>0.15</TrancheEnd>
      <UseBreakEvenSpread>1</UseBreakEvenSpread>
      <UseTlet>0</UseTlet>
      <IsTlet>0</IsTlet>
      <PctExpectedLoss>0</PctExpectedLoss>
      <UpfrontFee>52.125</UpfrontFee>
      <RunningFee>0</RunningFee>
      <DeltaFee>5.3</DeltaFee>
      <CentralCorrelation>0.1</CentralCorrelation>
      <Currency>USD</Currency>
      <RescalingMethod>PTIndexRescaling</RescalingMethod>
      <EffectiveDate>06/17/2011</EffectiveDate>
    </Index>
  </IndexList>
</CalibrationData>
with this code:
(ns DynamicProgramming
  (:require [clojure.xml :as xml]))
;Get the Input Files
(def calibrationFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/CalibrationQuotes.xml")
(def mktdataFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/MarketData.xml")
(def sample "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/Sample.xml")
;Parse the Calibration Input File
(def CalibOp
  (for [x (xml-seq
           (xml/parse (java.io.File. calibrationFile)))
        :when (or (= :IndexName (:tag x))
                  (= :Tenor (:tag x))
                  (= :UpfrontFee (:tag x))
                  (= :RunningFee (:tag x))
                  (= :DeltaFee (:tag x))
                  (= :IndexLevels (:tag x))
                  (= :TrancheStart (:tag x))
                  (= :TrancheEnd (:tag x)))]
    (first (:content x))))
(println CalibOp)
The second XML is flat, so this works; however, I don't know how to iterate through the nested structure of the first XML example and extract the information I want.
Any help would be great.
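For comparison with the data.zip answer, the asker's own `xml-seq` + `for` technique also reaches into the nested file, because `xml-seq` visits elements at every depth. The only change is to select `XVar` elements and read the `Value` attribute of their `Row`/`Col` child instead of taking text content. This is a minimal sketch using only `clojure.xml`, assuming each `Multi` XVar holds a single `Row`/`Col` as in the sample; `parse-string`, `multi-values` and the abbreviated inline sample are illustrative, not from the question.

```clojure
(ns example.xml-seq
  (:require [clojure.xml :as xml]))

;; Abbreviated copy of the first sample from the question (two XVars only).
(def sample-xml
  "<XVar Id=\"cdx9\" Type=\"Dictionary\">
     <XVar Id=\"TrancheAnalysis.IndexDuration\" Type=\"Multi\" Value=\"\" Rows=\"1\" Columns=\"1\">
       <Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"3.4380728252313069\"/></Row>
     </XVar>
     <XVar Id=\"TrancheAnalysis.TrancheDuration\" Type=\"Multi\" Value=\"\" Rows=\"1\" Columns=\"1\">
       <Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"3.0775955481964035\"/></Row>
     </XVar>
   </XVar>")

(defn parse-string
  "clojure.xml/parse accepts an InputStream, so wrap the string's bytes."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn multi-values
  "Returns [Id Value] pairs for every XVar of Type \"Multi\", at any depth,
  reading the Value attribute of its first Row/Col child."
  [root]
  (for [x (xml-seq root)
        :when (and (= :XVar (:tag x))
                   (= "Multi" (get-in x [:attrs :Type])))]
    [(get-in x [:attrs :Id])
     (-> x :content first :content first :attrs :Value)]))

;; Usage: (into {} (multi-values (parse-string sample-xml)))
```

Wrapping the result in `(into {} ...)` gives the same kind of Id-to-value map that the data.zip code builds; `clojure.xml` drops whitespace-only text between elements, which is why `first` lands on the `Row` and `Col` elements.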
I would use data.zip (formerly clojure.contrib.zip-filter); its XML helpers now live in the clojure.data.zip.xml namespace. It provides a lot of XML-parsing power and can easily evaluate XPath-like expressions. The README describes it as a "system for filtering trees, and XML trees in particular".
Below is some sample code that creates a "row" for the CSV file. A row is a map from the column name (the XVar's Id) to that XVar's Value attribute.
(ns work
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            [clojure.data.zip.xml :as zf])) ; data.zip namespace (formerly clojure.contrib.zip-filter.xml)
; create a zip from the xml file (named doc-zip so it does not clash with the zip alias)
(def doc-zip (zip/xml-zip (xml/parse (java.io.File. "data.xml"))))
; pulls out a list of all of the root "Id" attribute values
(zf/xml-> doc-zip (zf/attr :Id))
(defn value
  "Finds the id and value for a particular element"
  [xvar-zip]
  (let [id (-> xvar-zip zip/node :attrs :Id) ; manual access
        value (zf/xml1-> xvar-zip ; use xpath like expression to pull value out
                         :Row ; need the row element
                         :Col ; then the column element
                         (zf/attr :Value))] ; and finally pull the Value out
    {id value}))
; gets the "column-value" pair for a single column
(zf/xml1-> doc-zip
           (zf/attr= :Id "cdx9") ; filter on id "cdx9"
           :XVar ; filter on XVars under it
           (zf/attr= :Id "TrancheAnalysis.IndexDuration") ; filter on id
           value) ; apply the value function on the result of above
; creates a map of every column key to its corresponding value
(apply merge (zf/xml-> doc-zip (zf/attr= :Id "cdx9") :XVar value))
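The map from the last expression still has to be turned into a CSV line. A minimal sketch, assuming the row map has the shape the `(apply merge ...)` call returns (XVar Id to Value string) and that the dictionary Id (`cdx9`) is supplied separately for the `IndexName` column; `csv-header`, `csv-line` and `example-row` are hypothetical names. It only needs `clojure.string`, so it can be tried without data.zip.

```clojure
(ns example.csv-row
  (:require [clojure.string :as str]))

;; Columns requested in the question, in output order (after IndexName).
(def columns ["TrancheAnalysis.IndexDuration" "TrancheAnalysis.TrancheDuration"])

(defn csv-header
  "IndexName followed by the selected XVar Ids."
  [columns]
  (str/join "," (cons "IndexName" columns)))

(defn csv-line
  "The dictionary Id followed by each selected column's value from the
  row map; a column missing from the map becomes an empty field."
  [index-name row columns]
  (str/join "," (cons index-name (map #(get row % "") columns))))

;; Example row map, shaped like the result of (apply merge ...) above.
(def example-row
  {"Base.AccruedPremium"             "0"
   "TrancheAnalysis.IndexDuration"   "3.4380728252313069"
   "TrancheAnalysis.TrancheDuration" "3.0775955481964035"})

;; (csv-line "cdx9" example-row columns)
;; => "cdx9,3.4380728252313069,3.0775955481964035"
```

A plain `str/join` is enough here because none of the values contain commas or quotes; for arbitrary data, a library such as clojure.data.csv (its `write-csv` function) handles quoting.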
I'm not sure how the XML would be structured with multiple Dictionary XVars, since the Dictionary XVar is the root element. If you need to handle several of them, another function that is useful for this type of work is mapcat, which concatenates all of the sequences returned by the mapping function.
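To illustrate `mapcat`, here is a sketch that assumes, hypothetically, that several Dictionary XVars are wrapped in one root element. The `<Dictionaries>` tag and the second dictionary `cdx10` with its value are invented test data, not from the question. The mapping function returns one sequence of entries per dictionary, and `mapcat` concatenates those sequences into one. It uses plain `clojure.xml` instead of data.zip so it runs without extra dependencies.

```clojure
(ns example.mapcat
  (:require [clojure.xml :as xml]))

;; Hypothetical layout: the <Dictionaries> root and the cdx10 entry are
;; invented for illustration only.
(def sample-xml
  "<Dictionaries>
     <XVar Id=\"cdx9\" Type=\"Dictionary\">
       <XVar Id=\"TrancheAnalysis.IndexDuration\" Type=\"Multi\">
         <Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"3.4380728252313069\"/></Row>
       </XVar>
     </XVar>
     <XVar Id=\"cdx10\" Type=\"Dictionary\">
       <XVar Id=\"TrancheAnalysis.IndexDuration\" Type=\"Multi\">
         <Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"1.5\"/></Row>
       </XVar>
     </XVar>
   </Dictionaries>")

(defn parse-string
  "clojure.xml/parse accepts an InputStream, so wrap the string's bytes."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn dict-entries
  "One [dictionary-Id XVar-Id Value] vector per XVar in a dictionary."
  [dict]
  (for [xvar (:content dict)]
    [(get-in dict [:attrs :Id])
     (get-in xvar [:attrs :Id])
     (-> xvar :content first :content first :attrs :Value)]))

(defn all-entries
  "mapcat calls dict-entries on every dictionary under the root and
  concatenates the per-dictionary sequences into one flat sequence."
  [root]
  (mapcat dict-entries (:content root)))
```

With `map` instead of `mapcat`, the result would be a sequence of per-dictionary sequences; `mapcat` removes that extra level of nesting.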
There are more examples in the data.zip test sources as well.
One other big recommendation I have is to make sure you use a lot of small functions. You'll find things much easier to debug, test, and work with.
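As an illustration of that advice, here is the whole task split into small steps, each of which can be called on its own at the REPL. It is a sketch under assumptions: it uses plain `clojure.xml` instead of data.zip, assumes the input file has a single Dictionary XVar as its root (as in the question's sample) with one `Row`/`Col` per XVar, and all function names and paths are hypothetical.

```clojure
(ns example.pipeline
  (:require [clojure.xml :as xml]
            [clojure.string :as str]))

;; Each step below is a small function that can be tried on its own.

(defn xvar-value
  "Value attribute of the first Row/Col inside a Multi XVar element."
  [xvar]
  (-> xvar :content first :content first :attrs :Value))

(defn dict->row
  "Map of XVar Id -> value for one Dictionary XVar element."
  [dict]
  (into {} (for [xvar (:content dict)]
             [(get-in xvar [:attrs :Id]) (xvar-value xvar)])))

(defn csv-line
  "Dictionary Id followed by the selected column values, comma-separated."
  [dict columns]
  (let [row (dict->row dict)]
    (str/join "," (cons (get-in dict [:attrs :Id])
                        (map #(get row % "") columns)))))

(defn dicts->csv
  "Header line plus one line per dictionary, newline-separated."
  [dicts columns]
  (str/join "\n" (cons (str/join "," (cons "IndexName" columns))
                       (map #(csv-line % columns) dicts))))

(defn xml-file->csv-file!
  "Reads in-path (assumed: one Dictionary XVar as the root element) and
  writes the CSV text, with a trailing newline, to out-path."
  [in-path out-path columns]
  (let [root (xml/parse (java.io.File. in-path))]
    (spit out-path (str (dicts->csv [root] columns) "\n"))))

;; Usage (hypothetical paths):
;; (xml-file->csv-file! "data.xml" "out.csv"
;;                      ["TrancheAnalysis.IndexDuration" "TrancheAnalysis.TrancheDuration"])
```

Because `xvar-value`, `dict->row` and `csv-line` take plain maps, they can be tested with literal data and no file on disk. For a file with several dictionaries under one root, passing `(:content root)` to `dicts->csv` instead of `[root]` would be the only change.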