How to skip elements in xml-conduit


Question

I have to handle rather large XML files, and I want to use the streaming API of xml-conduit to go through them and extract the information I need. Streaming is especially appealing in my case because I need only a small part of the data in these files and only perform simple aggregations on it, so conduits are a perfect fit.

However, I don't always know the exact structure of a file. The files are generated by different versions of (sometimes buggy) software around the world, so I can't impose a schema.

I do know which elements I am interested in, and their shapes. But, as I said, these elements can appear in a different order, interleaved with other elements.

What I need, I guess, is to skip all the elements I am not interested in and consider only the ones I want.

I initially wanted to write something like this:

tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)

but it wouldn't compile, because ignoreTag returns Maybe ().

What is the way to skip all the "unknown" tags when using the xml-conduit streaming API?

Recommended answer

As suggested here:

λ> runConduit $ Text.XML.Stream.Parse.parseLBS def  "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>" .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson .| Data.Conduit.List.consume 
[Person 25 "Michael",Person 2 "Eliezer"]
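The GHCi session above uses a `parsePerson` parser that is not shown. A minimal sketch of the whole pipeline as a standalone program might look like the following. It assumes a recent xml-conduit (one that exports `takeTree`, `ignoreAnyTreeContent` and `manyYield`) together with conduit 1.3 and the exceptions package. The `Person` type, `parsePerson`, `keepPeople` and `extractPeople` are illustrative names rather than part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List as CL
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

-- Hypothetical record for the data we want to keep: age and name.
data Person = Person Int Text
  deriving (Eq, Show)

-- Parse one <person age="..">name</person> element.
-- Returns Nothing if the next element is not a <person>.
-- Note: 'read' is partial; a non-numeric age would raise an error.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
  name <- content
  return $ Person (read (unpack age)) name

-- Filter stage: forward the events of every <person> subtree downstream
-- and silently consume any other element or text content.
keepPeople :: MonadThrow m => ConduitT Event Event m ()
keepPeople =
  many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])

-- Full pipeline: parse bytes into events, drop unknown elements,
-- parse the remaining <person> elements, and collect them into a list.
extractPeople :: L.ByteString -> IO [Person]
extractPeople input = runConduit $
  parseLBS def input .| keepPeople .| manyYield parsePerson .| CL.consume

main :: IO ()
main = do
  people <- extractPeople
    "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
  print people
```

Splitting the work into a filtering stage (`keepPeople`) and a parsing stage (`manyYield parsePerson`) keeps the element parser strict about what it accepts while the filter absorbs everything unknown. For large files, `CL.consume` could be replaced with a fold so that results are aggregated without building the whole list in memory.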
