Parsing XML in Clojure

Problem description

I am new to Clojure, so please bear with me. I have an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<XVar Id="cdx9" Type="Dictionary">
  <XVar Id="Base.AccruedPremium" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="0"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.4380728252313069"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexLevel01" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="30693.926279941188"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDelta" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="8.9304387917502073"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.0775955481964035"/>
    </Row>
  </XVar>
</XVar>

This structure repeats. From it, I want to produce a CSV file with these columns:

IndexName,TrancheAnalysis.IndexDuration,TrancheAnalysis.TrancheDuration
cdx9,3.4380728252313069,3.0775955481964035
.........................................
.........................................

I am able to parse a simple XML file like

<?xml version="1.0" encoding="UTF-8"?>
<CalibrationData>
  <IndexList>
    <Index>
      <Calibrate>Y</Calibrate>
      <UseClientIndexQuotes>Y</UseClientIndexQuotes>
      <IndexName>HYCDX10</IndexName>
      <Tenor>06/20/2013</Tenor>
      <TenorName>3Y</TenorName>
      <IndexLevels>219.6</IndexLevels>
      <Tranche>Equity0To0.15</Tranche>
      <TrancheStart>0</TrancheStart>
      <TrancheEnd>0.15</TrancheEnd>
      <UseBreakEvenSpread>1</UseBreakEvenSpread>
      <UseTlet>0</UseTlet>
      <IsTlet>0</IsTlet>
      <PctExpectedLoss>0</PctExpectedLoss>
      <UpfrontFee>52.125</UpfrontFee>
      <RunningFee>0</RunningFee>
      <DeltaFee>5.3</DeltaFee>
      <CentralCorrelation>0.1</CentralCorrelation>
      <Currency>USD</Currency>
      <RescalingMethod>PTIndexRescaling</RescalingMethod>
      <EffectiveDate>06/17/2011</EffectiveDate>
    </Index>
  </IndexList>
</CalibrationData>

with this code

(ns DynamicProgramming
  (:require [clojure.xml :as xml]))

;; Input files
(def calibrationFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/CalibrationQuotes.xml")
(def mktdataFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/MarketData.xml")
(def sample "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/Sample.xml")

;; Parse the calibration input file and collect the text of the wanted tags.
;; The set acts as a predicate, replacing the chain of (= ... (:tag x)) tests.
(def CalibOp
  (for [x (xml-seq (xml/parse (java.io.File. calibrationFile)))
        :when (#{:IndexName :Tenor :UpfrontFee :RunningFee
                 :DeltaFee :IndexLevels :TrancheStart :TrancheEnd}
               (:tag x))]
    (first (:content x))))

(println CalibOp)

But the second XML is simple; on the other hand I don't know how to iterate through the nested structure of the first XML example and extract the information I want.

Any help will be great.

Solution

I would use data.zip (formerly clojure.contrib.zip-filter). It provides a lot of XML-parsing power and can easily evaluate XPath-like expressions. The README describes it as a "system for filtering trees, and XML trees in particular".

Below I have some sample code for creating a "row" for the CSV file. The row is a map of the column name to the attribute value.

(ns work
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            ;; data.zip namespace; older code used clojure.contrib.zip-filter.xml
            [clojure.data.zip.xml :as zf]))

; create a zip from the xml file
(def zip (zip/xml-zip (xml/parse "data.xml")))

; pulls out a list of all of the root "Id" attribute values
(zf/xml-> zip (zf/attr :Id))

(defn value
  "Finds the id and value for a particular element"
  [xvar-zip]
  (let [id (-> xvar-zip zip/node :attrs :Id) ; manual access
        value (zf/xml1-> xvar-zip ; use an XPath-like expression to pull the value out
                         :Row ; need the row element
                         :Col ; then the column element
                         (zf/attr :Value))] ; and finally pull the Value out
    {id value}))

; gets the "column-value" pair for a single column
(zf/xml1-> zip
           (zf/attr= :Id "cdx9") ; filter on id "cdx9"
           :XVar ; filter on XVars under it
           (zf/attr= :Id "TrancheAnalysis.IndexDuration") ; filter on id
           value) ; apply the value function to the result of the above

; creates a map of every column key to its corresponding value
(apply merge (zf/xml-> zip (zf/attr= :Id "cdx9") :XVar value))

I'm not sure how the XML would look with multiple Dictionary XVars, since that element is the root here and an XML document can have only one root element. If you need to handle several, another function that is useful for this type of work is mapcat, which concatenates all of the sequences returned by the mapping function.
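As a rough sketch of how several dictionaries could be turned into CSV lines, here is one possible approach. It rests on assumptions that are not in the question: a made-up `<Results>` wrapper element, an invented second dictionary `cdxX` with invented values, and values that never contain commas or quotes (so no CSV escaping is done). It uses only clojure.xml so that it runs without extra dependencies, and all helper names are hypothetical.

```clojure
(ns example.csv-rows
  (:require [clojure.string :as str]
            [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

;; Hypothetical input: <Results> and the "cdxX" dictionary (with its values)
;; are invented for illustration; only the cdx9 values come from the question.
(def sample-xml
  (str "<Results>"
       "<XVar Id=\"cdx9\" Type=\"Dictionary\">"
       "<XVar Id=\"TrancheAnalysis.IndexDuration\" Type=\"Multi\">"
       "<Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"3.4380728252313069\"/></Row></XVar>"
       "<XVar Id=\"TrancheAnalysis.TrancheDelta\" Type=\"Multi\">"
       "<Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"8.9304387917502073\"/></Row></XVar>"
       "<XVar Id=\"TrancheAnalysis.TrancheDuration\" Type=\"Multi\">"
       "<Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"3.0775955481964035\"/></Row></XVar>"
       "</XVar>"
       "<XVar Id=\"cdxX\" Type=\"Dictionary\">"
       "<XVar Id=\"TrancheAnalysis.IndexDuration\" Type=\"Multi\">"
       "<Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"1.5\"/></Row></XVar>"
       "<XVar Id=\"TrancheAnalysis.TrancheDuration\" Type=\"Multi\">"
       "<Row Id=\"0\"><Col Id=\"0\" Type=\"Num\" Value=\"2.5\"/></Row></XVar>"
       "</XVar>"
       "</Results>"))

(def columns
  ["IndexName" "TrancheAnalysis.IndexDuration" "TrancheAnalysis.TrancheDuration"])

(defn parse-str
  "Parses an XML string into the nested maps produced by clojure.xml/parse."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child-elements
  "Direct children of node that have the given tag."
  [node tag]
  (filter #(= tag (:tag %)) (:content node)))

(defn dictionary?
  [node]
  (= "Dictionary" (-> node :attrs :Type)))

(defn dictionaries
  "The root itself if it is a Dictionary, otherwise its Dictionary children."
  [root]
  (if (dictionary? root)
    [root]
    (filter dictionary? (:content root))))

(defn xvar-value
  "Value attribute of the first Col in the first Row, or nil if absent."
  [xvar]
  (some-> xvar (child-elements :Row) first (child-elements :Col) first :attrs :Value))

(defn dictionary->row
  "Map of column name to value for one Dictionary XVar."
  [dict]
  (into {"IndexName" (-> dict :attrs :Id)}
        (map (juxt #(-> % :attrs :Id) xvar-value))
        (child-elements dict :XVar)))

(defn csv-lines
  "Header line followed by one line per row; missing values become empty."
  [columns rows]
  (cons (str/join "," columns)
        (map (fn [row] (str/join "," (map #(get row % "") columns))) rows)))

;; mapcat flattens the dictionaries found in each parsed document,
;; so the same code works for one file or for many.
(def rows
  (map dictionary->row (mapcat dictionaries [(parse-str sample-xml)])))

(doseq [line (csv-lines columns rows)]
  (println line))
```

Because `dictionaries` accepts either a Dictionary root or a wrapper around several of them, the vector passed to `mapcat` could just as well hold one parsed document per input file.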

There are some more examples in the test source as well.

One other big recommendation I have is to make sure you use a lot of small functions. You'll find things much easier to debug, test, and work with.
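To illustrate what I mean by small functions, here is a sketch that splits the extraction for the simpler CalibrationData file into pieces that can each be tried at the REPL. The function names, the choice of three tags, and the shortened sample document are my own assumptions for the example; it uses only clojure.xml and the core `xml-seq`.

```clojure
(ns example.calibration
  (:require [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

;; Shortened version of the CalibrationData sample from the question.
(def sample-xml
  (str "<CalibrationData><IndexList><Index>"
       "<Calibrate>Y</Calibrate>"
       "<IndexName>HYCDX10</IndexName>"
       "<Tenor>06/20/2013</Tenor>"
       "<UpfrontFee>52.125</UpfrontFee>"
       "<Currency>USD</Currency>"
       "</Index></IndexList></CalibrationData>"))

;; Hypothetical selection of fields to keep.
(def wanted-tags #{:IndexName :Tenor :UpfrontFee})

(defn parse-str
  "Parses an XML string into the nested maps produced by clojure.xml/parse."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn index-nodes
  "All <Index> elements anywhere under root."
  [root]
  (filter #(= :Index (:tag %)) (xml-seq root)))

(defn element-text
  "Text content of a leaf element such as <Tenor>."
  [node]
  (first (:content node)))

(defn index->map
  "Map of wanted tag to its text for a single <Index> element."
  [index-node]
  (into {}
        (for [child (:content index-node)
              :when (wanted-tags (:tag child))]
          [(:tag child) (element-text child)])))

(def indexes
  (map index->map (index-nodes (parse-str sample-xml))))

(println indexes)
```

Each function does one thing, so a wrong result can be traced by calling `index-nodes`, `element-text`, or `index->map` on its own with a small input.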
