With Haskell, how do I process large volumes of XML?

Views: 218  Posted: 2018/6/4 15:52:17  Tags: xml, haskell, tag-soup, large-scale, large-data


Question

I've been exploring the Stack Overflow data dumps and have so far taken advantage of the friendly XML by "parsing" it with regular expressions. My attempts to use various Haskell XML libraries to find the first post, in document order, by a particular user all ran into nasty thrashing (excessive memory use leading to heavy swapping).

TagSoup

import Control.Monad
import Text.HTML.TagSoup

userid = "83805"

main = do
  posts <- liftM parseTags (readFile "posts.xml")
  print $ head $ map (fromAttrib "Id") $
                 filter (~== ("<row OwnerUserId=" ++ userid ++ ">"))
                 posts

hxt

import Text.XML.HXT.Arrow
import Text.XML.HXT.XPath

userid = "83805"

main = do
  ids <- runX $ readDoc "posts.xml" >>> posts
  print (head ids)  -- first matching Id in document order
  where
    readDoc = readDocument [ (a_tagsoup, v_1)
                           , (a_parse_xml, v_1)
                           , (a_remove_whitespace, v_1)
                           , (a_issue_warnings, v_0)
                           , (a_trace, v_1)
                           ]

posts :: ArrowXml a => a XmlTree String
posts = getXPathTrees byUserId >>>
        getAttrValue "Id"
  -- select row elements whose OwnerUserId matches (a predicate, not a comparison)
  where byUserId = "/posts/row[@OwnerUserId='" ++ userid ++ "']"

xml

import Control.Monad (forM, liftM)
import Text.XML.Light

userid = "83805"

main = do
  [posts,votes] <- forM ["posts", "votes"] $
    liftM parseXML . readFile . (++ ".xml")
  let ps = elemNamed "posts" posts
  putStrLn $ maybe "<not present>" show
           $ filterElement (byUser userid) ps

elemNamed :: String -> [Content] -> Element
elemNamed name = head . filter ((==name).qName.elName) . onlyElems

byUser :: String -> Element -> Bool
byUser uid e = findAttr creator e == Just uid
  where creator = QName "OwnerUserId" Nothing Nothing

Where did I go wrong? What is the proper way to process hefty XML documents with Haskell?
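For comparison with the library attempts above, here is a minimal sketch, using only `base`, of the line-oriented streaming approach that the regular-expression "parsing" relies on. It assumes that each `<row .../>` element sits on its own line with double-quoted attribute values, which appears to hold for the data dumps; it does no entity decoding and is not a general XML parser. The names `attrValue`, `firstPostBy` and `firstPostInFile` are hypothetical. Because lazy `readFile` and `lines` produce input on demand and `listToMaybe` stops at the first match, the whole document never has to be held in memory at once.

```haskell
import Data.List (stripPrefix, tails)
import Data.Maybe (listToMaybe)

-- | Value of attribute @name@ in a single-line element such as
--   @<row Id="4" OwnerUserId="8" />@.
--   Hypothetical helper: assumes double-quoted values, a space before
--   each attribute name, and no entity decoding.
attrValue :: String -> String -> Maybe String
attrValue name line =
  listToMaybe
    [ takeWhile (/= '"') rest
    | t <- tails line
    , Just rest <- [stripPrefix needle t]
    ]
  where
    -- The leading space keeps "Id" from matching inside "ParentId".
    needle = ' ' : name ++ "=\""

-- | Id of the first row (in document order) owned by @userId@.
--   Lazy: consumes input only up to the first match.
firstPostBy :: String -> String -> Maybe String
firstPostBy userId contents =
  listToMaybe
    [ postId
    | line <- lines contents
    , attrValue "OwnerUserId" line == Just userId
    , Just postId <- [attrValue "Id" line]
    ]

-- | Stream a file through 'firstPostBy'; lazy 'readFile' reads the
--   file incrementally as lines are demanded (decoded per the locale).
firstPostInFile :: FilePath -> String -> IO (Maybe String)
firstPostInFile path userId = firstPostBy userId <$> readFile path
```

In `main` this might be used as `firstPostInFile "posts.xml" "83805" >>= print`. The laziness is easy to check: `firstPostBy "1" (cycle "<row Id=\"3\" OwnerUserId=\"1\" />\n")` returns `Just "3"` even though its input is infinite.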
