Parsing XML in Clojure

Question

I am new to Clojure, so please bear with me. I have an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<XVar Id="cdx9" Type="Dictionary">
  <XVar Id="Base.AccruedPremium" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="0"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.4380728252313069"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexLevel01" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="30693.926279941188"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDelta" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="8.9304387917502073"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.0775955481964035"/>
    </Row>
  </XVar>
</XVar>

And this structure repeats. From this I want to be able to produce a CSV file with these columns:

IndexName,TrancheAnalysis.IndexDuration,TrancheAnalysis.TrancheDuration
cdx9,3.4380728252313069,3.0775955481964035
.........................................
.........................................

I am able to parse a simple XML file like this one:

<?xml version="1.0" encoding="UTF-8"?>
<CalibrationData>
  <IndexList>
    <Index>
      <Calibrate>Y</Calibrate>
      <UseClientIndexQuotes>Y</UseClientIndexQuotes>
      <IndexName>HYCDX10</IndexName>
      <Tenor>06/20/2013</Tenor>
      <TenorName>3Y</TenorName>
      <IndexLevels>219.6</IndexLevels>
      <Tranche>Equity0To0.15</Tranche>
      <TrancheStart>0</TrancheStart>
      <TrancheEnd>0.15</TrancheEnd>
      <UseBreakEvenSpread>1</UseBreakEvenSpread>
      <UseTlet>0</UseTlet>
      <IsTlet>0</IsTlet>
      <PctExpectedLoss>0</PctExpectedLoss>
      <UpfrontFee>52.125</UpfrontFee>
      <RunningFee>0</RunningFee>
      <DeltaFee>5.3</DeltaFee>
      <CentralCorrelation>0.1</CentralCorrelation>
      <Currency>USD</Currency>
      <RescalingMethod>PTIndexRescaling</RescalingMethod>
      <EffectiveDate>06/17/2011</EffectiveDate>
    </Index>
  </IndexList>
</CalibrationData>

using this code:

(ns DynamicProgramming
  (:require [clojure.xml :as xml]))

;; Get the input files
(def calibrationFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/CalibrationQuotes.xml")
(def mktdataFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/MarketData.xml")
(def sample "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/Sample.xml")

;; Parse the calibration input file and collect the text of the selected tags
(def CalibOp
  (for [x (xml-seq (xml/parse (java.io.File. calibrationFile)))
        :when (or (= :IndexName (:tag x))
                  (= :Tenor (:tag x))
                  (= :UpfrontFee (:tag x))
                  (= :RunningFee (:tag x))
                  (= :DeltaFee (:tag x))
                  (= :IndexLevels (:tag x))
                  (= :TrancheStart (:tag x))
                  (= :TrancheEnd (:tag x)))]
    (first (:content x))))

(println CalibOp)

But the first XML is a little more complicated, and I don't really know how to iterate through the nested structure and pull out the info.

Any help would be great.

Solution

I would use data.zip (formerly clojure.contrib.zip-filter). It provides a lot of XML-parsing power and can easily evaluate XPath-like expressions. The README describes it as a "system for filtering trees, and XML trees in particular."

Below I have some sample code for creating a "row" for the CSV file. The row is a map of the column name to the attribute value.

(ns work
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            ;; data.zip namespace; under the old clojure.contrib this was
            ;; clojure.contrib.zip-filter.xml
            [clojure.data.zip.xml :as zf]))

;; create a zipper from the XML file
(def zip (zip/xml-zip (xml/parse "data.xml")))

;; pulls out a list of all of the root "Id" attribute values
(zf/xml-> zip (zf/attr :Id))

(defn value
  "Finds the id and value for a particular element"
  [xvar-zip]
  (let [id    (-> xvar-zip zip/node :attrs :Id) ; manual access
        value (zf/xml1-> xvar-zip               ; XPath-like expression to pull the value out
                         :Row                   ; need the Row element
                         :Col                   ; then the Col element
                         (zf/attr :Value))]     ; and finally pull the Value attribute out
    {id value}))

;; gets the "column-value" pair for a single column
(zf/xml1-> zip
           (zf/attr= :Id "cdx9") ; filter on id "cdx9"
           :XVar                 ; filter on XVars under it
           (zf/attr= :Id "TrancheAnalysis.IndexDuration") ; filter on id
           value)                ; apply the value function to the result of the above

;; creates a map of every column key to its corresponding value
(apply merge (zf/xml-> zip (zf/attr= :Id "cdx9") :XVar value))
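
To get from such a row map to the CSV line asked for in the question, something like the following might work. This is only a minimal sketch: the names columns, csv-header, csv-line and sample-row are made up for illustration, sample-row is a literal stand-in for the map built by the (apply merge ...) expression, and no quoting or escaping is done, which assumes the values never contain commas.

```clojure
(ns example.csv-row
  (:require [clojure.string :as str]))

;; Columns wanted in the CSV, in output order (taken from the question).
(def columns
  ["TrancheAnalysis.IndexDuration" "TrancheAnalysis.TrancheDuration"])

(defn csv-header
  "Header line: IndexName followed by the selected column names."
  []
  (str/join "," (cons "IndexName" columns)))

(defn csv-line
  "One CSV line for a dictionary id and its row map.
   Columns missing from the row map become empty fields."
  [index-name row]
  (str/join "," (cons index-name (map #(get row % "") columns))))

;; Literal stand-in (an assumption) for the map returned by (apply merge ...).
(def sample-row
  {"Base.AccruedPremium"             "0"
   "TrancheAnalysis.IndexDuration"   "3.4380728252313069"
   "TrancheAnalysis.TrancheDuration" "3.0775955481964035"})

(println (csv-header))
(println (csv-line "cdx9" sample-row))
```

Keeping the column list in one place means the header and every data line stay in the same order, and extra keys in the row map (such as Base.AccruedPremium) are simply ignored.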

I'm not sure how the XML would hold multiple Dictionary XVars, since the Dictionary XVar is the root element and an XML document can have only one root. If you do need to handle several, another function that is useful for this type of work is mapcat, which concatenates all of the sequences returned by the mapping function.
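
As a rough illustration of the mapcat idea, here is a sketch that uses only clojure.xml and xml-seq rather than data.zip. It assumes, purely for the example, that several Dictionary XVars are wrapped in a hypothetical Results root element; the second dictionary (cdx10) and its value are invented sample data, and all function names are illustrative.

```clojure
(ns example.multi-dict
  (:require [clojure.xml :as xml]))

;; Hypothetical input: two Dictionary XVars under an assumed <Results> root.
;; The cdx10 entry and its value are made up for this sketch.
(def sample-xml
  (str "<Results>"
       "<XVar Id='cdx9' Type='Dictionary'>"
       "<XVar Id='TrancheAnalysis.IndexDuration' Type='Multi'>"
       "<Row Id='0'><Col Id='0' Type='Num' Value='3.4380728252313069'/></Row>"
       "</XVar>"
       "</XVar>"
       "<XVar Id='cdx10' Type='Dictionary'>"
       "<XVar Id='TrancheAnalysis.IndexDuration' Type='Multi'>"
       "<Row Id='0'><Col Id='0' Type='Num' Value='4.01'/></Row>"
       "</XVar>"
       "</XVar>"
       "</Results>"))

(defn parse-str
  "Parses an XML string into clojure.xml's nested element maps."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn first-col-value
  "Value attribute of the first Col element found under an XVar."
  [xvar]
  (->> (xml-seq xvar)
       (filter #(= :Col (:tag %)))
       first
       :attrs
       :Value))

(defn dictionary-values
  "Seq of [dictionary-id xvar-id value] triples for one Dictionary element."
  [dict]
  (for [xvar (:content dict)
        :when (= :XVar (:tag xvar))]
    [(get-in dict [:attrs :Id])
     (get-in xvar [:attrs :Id])
     (first-col-value xvar)]))

;; mapcat calls dictionary-values on each Dictionary and concatenates
;; the per-dictionary sequences into one flat sequence of triples.
(def all-values
  (mapcat dictionary-values (:content (parse-str sample-xml))))

(doseq [v all-values]
  (println v))
```

With plain map the result would be one nested sequence per dictionary; mapcat flattens that one level, which is convenient when every triple should end up in the same output.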

There are some more examples in the test source as well.

One other big recommendation I have is to make sure you use a lot of small functions. You'll find things much easier to debug, test, and work with.
