Extracting Values from a Subtree
I am parsing an XML file with HXT and I am trying to break up some of the node extraction into modular pieces (I have been using this as my guide). Unfortunately, I cannot figure out how to apply some of the selectors once I do the first level of parsing.
import Text.XML.HXT.Core
let node tag = multi (hasName tag)
xml <- readFile "test.xml"
let doc = readString [withValidate yes, withParseHTML no, withWarnings no] xml
books <- runX $ doc >>> node "book"
I see that books has a type [XmlTree]
:t books
books :: [XmlTree]
Now I would like to get the first element of books and then extract some values inside the sub-tree.
let b = head(books)
runX $ b >>> node "cost"
Couldn't match type ‘Data.Tree.NTree.TypeDefs.NTree’
with ‘IOSLA (XIOState ()) XmlTree’
Expected type: IOSLA (XIOState ()) XmlTree XNode
Actual type: XmlTree
In the first argument of ‘(>>>)’, namely ‘b’
In the second argument of ‘($)’, namely ‘b >>> node "cost"’
I cannot find selectors that work once I have an XmlTree, and I am showing the above incorrect usage to illustrate what I would like to do. I know I can do this:
runX $ doc >>> node "book" >>> node "cost" /> getText
["55.9","95.0"]
But I am interested not only in cost but also in many more elements inside book. The XML file is pretty deep, so I don't want to nest everything with <+> and would much rather extract the chunk I want and then extract the sub-elements in a separate function.
Example (made-up) XML File:
<?xml version="1.0" encoding="UTF-8"?><start xmlns="http://www.example.com/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<books>
<book>
<author>
<name>
<first>Joe</first>
<last>Smith</last>
</name>
<city>New York City</city>
</author>
<released>1990-11-15</released>
<isbn>1234567890</isbn>
<publisher>X Publisher</publisher>
<cost>55.9</cost>
</book>
<book>
<author>
<name>
<first>Jane</first>
<last>Jones</last>
</name>
<city>San Francisco</city>
</author>
<released>1999-01-19</released>
<isbn>0987654321</isbn>
<publisher>Y Publisher</publisher>
<cost>95.0</cost>
</book>
</books>
</start>
Can someone help me understand how to extract the sub-elements of book? Ideally with something as nice as >>> and node, so I can define my own functions such as getCost, getName, etc., each of which will roughly have the signature XmlTree -> [String].
doc is not what you think it is. It has type IOStateArrow s b XmlTree. You really should read your guide again; everything you want to know is covered under the title "Avoiding IO".
Arrows are basically functions. SomeArrow a b can be considered a generalized/specialized function of type a -> b. >>> and the other operators in scope are for arrow composition, similar to function composition. Your books has type [XmlTree], so it is not an arrow and cannot be composed with arrows. What fulfills your needs is runLA, which transforms an arrow like node "tag" into a normal function:
module Main where

import Text.XML.HXT.Core

main = do
  html <- readFile "test.xml"
  let doc = readString [withValidate yes, withParseHTML no, withWarnings no] html
  -- Run the IO arrow once to get the list of <book> subtrees.
  books <- runX $ doc >>> node "book"
  -- runLA (node "cost" /> getText) :: XmlTree -> [String]
  let costs = books >>= runLA (node "cost" /> getText)
  print costs

-- Select every element named tag anywhere below the current node.
node tag = multi (hasName tag)
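Building on that, here is a minimal sketch of how the getCost and getName functions mentioned in the question might be written with runLA. The helper name textOf is hypothetical, and joining the first and last name with a space is only an assumption about the output you want.

```haskell
import Text.XML.HXT.Core

-- Select every element named tag anywhere below the current node
-- (same definition as in the question, with an explicit signature).
node :: ArrowXml a => String -> a XmlTree XmlTree
node tag = multi (hasName tag)

-- Hypothetical helper: the text children of every <tag> element in a subtree.
-- runLA turns the pure list arrow into an ordinary function.
textOf :: String -> XmlTree -> [String]
textOf tag = runLA (node tag /> getText)

-- Text of all <cost> elements inside the given subtree.
getCost :: XmlTree -> [String]
getCost = textOf "cost"

-- "first last" for every <name> element inside the given subtree.
-- (&&&) pairs each <first> text with each <last> text under that <name>.
getName :: XmlTree -> [String]
getName = runLA $
  node "name"
    >>> ((node "first" /> getText) &&& (node "last" /> getText))
    >>> arr (\(firstName, lastName) -> firstName ++ " " ++ lastName)
```

Each of these has type XmlTree -> [String], so it can be applied to a single element of books or mapped over the whole list, for example map getCost books.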