Get all Names from xml-conduit


Problem description




    I'm parsing a modified version of the XML sample from http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html

    Here's what it looks like:

    <?xml version="1.0" encoding="utf-8"?>
    <population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com">
      <success>true</success>
      <row_count>2</row_count>
      <summary>
        <bananas>0</bananas>
      </summary>
      <people>
          <person>
              <firstname>Michael</firstname>
              <age>25</age>
          </person>
          <person>
              <firstname>Eliezer</firstname>
              <age>2</age>
          </person>
      </people>
    </population>
    

    How do I get a list of firstname and age for every person?

    My goal is to use http-conduit to download this XML and then parse it, but I am looking for a way to parse elements that have no attributes (use tagNoAttr?)

    Here's what I've tried, and I've added my questions in the Haskell comments:

    {-# LANGUAGE OverloadedStrings #-}
    import Control.Monad.Trans.Resource
    import Data.Conduit (($$))
    import Data.Text (Text, unpack)
    import Text.XML.Stream.Parse
    import Control.Applicative ((<*))
    
    data Person = Person Int Text
            deriving Show
    
    -- Do I need to change the lambda function \age to something else to get both name and age?
    parsePerson = tagNoAttr "person" $ \age -> do
            name <- content  -- How do I get age from the content?  "unpack" is for attributes
            return $ Person age name
    
    parsePeople = tagNoAttr "people" $ many parsePerson
    
    -- This doesn't ignore the xmlns attributes
    parsePopulation  = tagName "population" (optionalAttr "xmlns" <* ignoreAttrs) $ parsePeople
    
    main = do
            people <- runResourceT $
                 parseFile def "people2.xml" $$ parsePopulation
            print people
    

    Solution

    Firstly: the parsing combinators in xml-conduit haven't been updated in quite a while, and they show their age. I recommend that most people use the DOM or cursor interface instead. That said, let's look at your example. There are two problems with your code:

    • It doesn't properly handle XML namespaces. All of the element names are in the http://example.com namespace, and your code needs to reflect that.
    • The parsing combinators demand that you account for all elements. They won't automatically skip over some elements for you.
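
To make the namespace point concrete, here is a minimal sketch, assuming the `Name` type and its `IsString` instance from the xml-types package (`Data.XML.Types`). With `OverloadedStrings`, a literal written as `{namespace}local` becomes a namespaced `Name`, and it does not compare equal to the bare local name, which is why a matcher for `"person"` never fires on this document.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.XML.Types (Name (..))

-- A name written in "{namespace}local" form carries its namespace.
qualified :: Name
qualified = "{http://example.com}person"

-- A bare literal has no namespace at all.
bare :: Name
bare = "person"

main :: IO ()
main = do
    print (nameLocalName qualified, nameNamespace qualified)
    print (nameLocalName bare, nameNamespace bare)
    -- Equality takes the namespace into account, so these differ.
    print (qualified == bare)
```

Both names share the local part `person`; only the namespace distinguishes them.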

    So here's an implementation using the streaming API that gets the desired result:

    {-# LANGUAGE OverloadedStrings #-}
    import           Control.Monad.Trans.Resource (MonadThrow, runResourceT)
    import           Data.Conduit                 (Consumer, ($$))
    import           Data.Text                    (Text)
    import           Data.Text.Read               (decimal)
    import           Data.XML.Types               (Event)
    import           Text.XML.Stream.Parse
    
    data Person = Person Int Text
            deriving Show
    
    -- Parse one <person> element; firstname and age are read as child elements, not lambda arguments.
    parsePerson :: MonadThrow m => Consumer Event m (Maybe Person)
    parsePerson = tagNoAttr "{http://example.com}person" $ do
            name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content
            ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content
            case decimal ageText of
                Right (age, "") -> return $ Person age name
                _ -> force "invalid age value" $ return Nothing
    
    parsePeople :: MonadThrow m => Consumer Event m [Person]
    parsePeople = force "no people tag" $ do
        _ <- tagNoAttr "{http://example.com}success" content
        _ <- tagNoAttr "{http://example.com}row_count" content
        _ <- tagNoAttr "{http://example.com}summary" $
            tagNoAttr "{http://example.com}bananas" content
        tagNoAttr "{http://example.com}people" $ many parsePerson
    
    -- ignoreAttrs accepts and discards any attributes on the population element.
    parsePopulation :: MonadThrow m => Consumer Event m [Person]
    parsePopulation = force "population tag missing" $
        tagName "{http://example.com}population" ignoreAttrs $ \() -> parsePeople
    
    main :: IO ()
    main = do
            people <- runResourceT $
                 parseFile def "people2.xml" $$ parsePopulation
            print people
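
The `Right (age, "")` pattern in `parsePerson` relies on how `decimal` from `Data.Text.Read` reports its result: it returns the parsed number together with the unconsumed remainder of the input. The sketch below isolates that check; `parseAge` is a hypothetical helper name, not part of the answer's code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text      (Text)
import Data.Text.Read (decimal)

-- Hypothetical helper: accept only text that is entirely a decimal number.
parseAge :: Text -> Maybe Int
parseAge t = case decimal t of
    Right (n, "") -> Just n   -- the whole input was consumed
    _             -> Nothing  -- parse error, or trailing characters remain

main :: IO ()
main = mapM_ (print . parseAge) ["25", "2", "25abc", "", "abc"]
```

Matching on an empty remainder is what rejects input such as `25abc`, which `decimal` alone would accept as 25 followed by leftover text.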
    

    Here's an example using the cursor API. Note that it has different error handling characteristics, but should produce the same result for well-formed input.

    {-# LANGUAGE OverloadedStrings #-}
    import Text.XML
    import Text.XML.Cursor
    import Data.Text (Text)
    import Data.Text.Read (decimal)
    import Data.Monoid (mconcat)
    
    main :: IO ()
    main = do
        doc <- Text.XML.readFile def "people2.xml"
        let cursor = fromDocument doc
        print $ cursor $// element "{http://example.com}person" >=> parsePerson
    
    data Person = Person Int Text
            deriving Show
    
    parsePerson :: Cursor -> [Person]
    parsePerson c = do
        let name = c $/ element "{http://example.com}firstname" &/ content
            ageText = c $/ element "{http://example.com}age" &/ content
        case decimal $ mconcat ageText of
            Right (age, "") -> [Person age $ mconcat name]
            _ -> []
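
To illustrate the difference in error handling, here is a small sketch that runs the same cursor logic on an in-memory document whose second `age` is deliberately invalid. The `sample` input and the `people` helper are assumptions made up for this illustration; `parseText_` is used only to avoid reading a file.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text       (Text)
import qualified Data.Text.Lazy  as TL
import           Data.Text.Read  (decimal)
import           Text.XML        (def, parseText_)
import           Text.XML.Cursor (Cursor, content, element, fromDocument,
                                  ($/), ($//), (&/), (>=>))

data Person = Person Int Text
    deriving (Show, Eq)

-- Same logic as the cursor version: a person whose age does not parse
-- yields the empty list and is therefore dropped silently.
parsePerson :: Cursor -> [Person]
parsePerson c =
    case decimal (mconcat ageText) of
        Right (age, "") -> [Person age (mconcat name)]
        _               -> []
  where
    name    = c $/ element "{http://example.com}firstname" &/ content
    ageText = c $/ element "{http://example.com}age" &/ content

-- Hypothetical in-memory input; the second age is not a number.
sample :: TL.Text
sample = TL.concat
    [ "<population xmlns='http://example.com'><people>"
    , "<person><firstname>Michael</firstname><age>25</age></person>"
    , "<person><firstname>Eliezer</firstname><age>two</age></person>"
    , "</people></population>"
    ]

-- Hypothetical helper: parse the text and collect every valid person.
people :: TL.Text -> [Person]
people xml =
    fromDocument (parseText_ def xml)
        $// element "{http://example.com}person" >=> parsePerson

main :: IO ()
main = print (people sample)
```

Under these assumptions the result should contain only Michael: the cursor version skips the malformed entry, whereas the streaming version would fail with the "invalid age value" error.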
    
