How to skip elements in xml-conduit
Problem description
I have to handle rather big XML files, and I want to use the streaming API of xml-conduit to go through them and extract the info I need.
In my case, streaming with xml-conduit is especially appealing because I don't need much data from these files, and I need to perform simple aggregations on it, so conduits are a perfect fit.
Now, I don't always know the exact structure of the file. Files are generated by different versions of (sometimes buggy) software around the world, so I can't impose a schema.
I do know, however, which elements I am interested in and what shape they have. But, as I said, these elements can appear in a different order, interleaved with other elements, etc.
What I need, I guess, is just to skip all the elements I am not interested in and only consider the ones I want.
I initially wanted to write something like this:
tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)
But it wouldn't compile, because ignoreTag returns Maybe (), which does not match the result type of the tagName branch.
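A minimal sketch, not from the original question or answer, of one way to make that first attempt typecheck. It assumes a hypothetical Person type and parsePerson parser modelled on the example in the Text.XML.Stream.Parse documentation, and needs the xml-conduit, xml-types, conduit, exceptions, bytestring and text packages. Both branches are mapped to the same result type, Maybe (Maybe Person), and tried in order with orE from Text.XML.Stream.Parse, which returns the result of the first parser that gives Just; many repeats this until neither branch matches, and catMaybes drops the skipped elements. The names personOrSkip, collectPeople and parsePeopleOrE are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, runConduit, (.|))
import Data.Maybe (catMaybes)
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

-- Hypothetical element type, modelled on the xml-conduit docs example.
data Person = Person Int Text
  deriving (Show, Eq)

-- Parse <person age="N">name</person>; Nothing if the next element is
-- not a person. (read is partial; fine for a sketch.)
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
  name <- content
  return (Person (read (unpack age)) name)

-- Just (Just p) for a <person>, Just Nothing for any other tree that was
-- skipped, Nothing when neither branch can consume anything more.
personOrSkip :: MonadThrow m => ConduitT Event o m (Maybe (Maybe Person))
personOrSkip =
  (fmap Just <$> parsePerson)
    `orE` (fmap (const Nothing) <$> ignoreAnyTreeContent)

-- Collect every <person>, ignoring whatever else appears between them.
collectPeople :: MonadThrow m => ConduitT Event o m [Person]
collectPeople = catMaybes <$> many personOrSkip

parsePeopleOrE :: L.ByteString -> IO [Person]
parsePeopleOrE doc = runConduit (parseLBS def doc .| collectPeople)

main :: IO ()
main = parsePeopleOrE sample >>= print
  where
    sample = "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
```

This keeps everything in a single parser at the cost of wrapping and unwrapping the results; like any loop of this kind, it only skips elements at the nesting level where it runs.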
What would be the way to skip all the "unknown" tags when using the xml-conduit streaming API?
Recommended answer
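As suggested here, first keep only the person subtrees with takeTree, discarding every other tree with ignoreAnyTreeContent, and then parse what is left (parsePerson is the example parser from the Text.XML.Stream.Parse documentation):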
λ> runConduit $ Text.XML.Stream.Parse.parseLBS def "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>" .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson .| Data.Conduit.List.consume
[Person 25 "Michael",Person 2 "Eliezer"]
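A standalone version of the GHCi session above, written as a sketch. The answer uses Person and parsePerson without defining them; the definitions here are assumed, following the Text.XML.Stream.Parse documentation example. The names onlyPersons and parsePeople are hypothetical, and the program needs the xml-conduit, xml-types, conduit, exceptions, bytestring and text packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List as CL
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

-- Assumed element type, following the xml-conduit documentation example.
data Person = Person Int Text
  deriving (Show, Eq)

-- Parse <person age="N">name</person>; returns Nothing if the next
-- element is not a person. (read is partial; fine for a sketch.)
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
  name <- content
  return (Person (read (unpack age)) name)

-- Stage 1: re-emit every <person> subtree unchanged and silently drop
-- any other tree, repeating until neither alternative matches.
onlyPersons :: MonadThrow m => ConduitT Event Event m ()
onlyPersons = many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])

-- Stage 2: parse the surviving events and stream out Person values.
parsePeople :: L.ByteString -> IO [Person]
parsePeople doc =
  runConduit $
    parseLBS def doc
      .| onlyPersons
      .| manyYield parsePerson
      .| CL.consume

main :: IO ()
main = parsePeople sample >>= print
  where
    sample = "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
```

Run on the same input, it should print the same list as the GHCi session. Splitting the work this way leaves parsePerson unchanged: the first stage only decides which subtrees survive, so the parser never sees foo or any other unknown tag. Note that skipping happens at the level where the loop runs. The sample input has several top-level elements; with a single root element, as most real files have, ignoreAnyTreeContent would discard the root together with everything inside it, so the loop would have to run inside the root element instead (not shown here).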