HXT getting first element: refactor weird arrow

Problem description

I need to get the text content of the first <p> that is a child of <div class="about">, and wrote the following code:

tagTextS :: IOSArrow XmlTree String
tagTextS = getChildren >>> getText >>> arr stripString

parseDescription :: IOSArrow XmlTree String
parseDescription =
  (
   deep (isElem >>> hasName "div" >>> hasAttrValue "id" (== "company_about_full_description"))
   >>> (arr (\x -> x) /> isElem  >>> hasName "p") >. (!! 0) >>> tagTextS
  ) `orElse` (constA "")

Look at this arr (\x -> x): without it I wasn't able to get a result.

  • Is there a better way to write parseDescription?
  • Another question: why do I need the parentheses before arr and after hasName "p"? (I actually found this solution here)

Solution

Another proposal, using only the hxt core as you asked.

Selecting the first child cannot be done by post-processing the output of getChildren with ordinary composition: for hxt arrows, (>>>) applies the following arrow to every item of the preceding arrow's output list, not to the list as a whole. This is explained on the HaskellWiki HXT page. The definition shown there is an old one; (>>>) actually derives from Category (.) composition.
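
The per-item behaviour of (>>>) can be illustrated without HXT at all. The following is a toy sketch, not HXT's implementation: ListFn, andThen, onResults, Node, kids and label are hypothetical names invented for this example. andThen plays the role of (>>>), and onResults plays the role of HXT's (>>.), which hands the whole output list to a plain function.

```haskell
-- Toy model (hypothetical names): a list arrow is a function from one
-- input to a list of outputs.
type ListFn a b = a -> [b]

-- Composition in the style of (>>>): g runs once per output of f and
-- all results are concatenated. g never sees the whole list.
andThen :: ListFn a b -> ListFn b c -> ListFn a c
andThen f g = concatMap g . f

-- Whole-list post-processing in the style of HXT's (>>.):
-- h receives the complete output list of f.
onResults :: ListFn a b -> ([b] -> [c]) -> ListFn a c
onResults f h = h . f

-- A minimal stand-in for an XML tree: a label and child nodes.
data Node = Node String [Node]

kids :: ListFn Node Node
kids (Node _ cs) = cs

label :: ListFn Node String
label (Node l _) = [l]

sample :: Node
sample = Node "div" [Node "p1" [], Node "p2" []]

-- label is applied to every child.
allLabels :: [String]
allLabels = (kids `andThen` label) sample

-- Only the first child is kept before label runs.
firstLabel :: [String]
firstLabel = ((kids `onResults` take 1) `andThen` label) sample
```

Here allLabels evaluates to ["p1","p2"] and firstLabel to ["p1"]: picking an element by position needs an operation on the whole result list, which plain composition does not provide.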

getNthChild can be adapted from the definition of getChildren in Control.Arrow.ArrowTree:

import Data.Tree.Class (Tree)
import qualified Data.Tree.Class as T

-- Yields the n-th child (0-based); if it does not exist, the arrow yields no results.
getNthChild :: (ArrowList a, Tree t) => Int -> a (t b) (t b)
getNthChild n = arrL (take 1 . drop n . T.getChildren)

Your parseDescription could then take this form:

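import Text.XML.HXT.Core  -- hasName, hasAttrValue, deep, isElem, getChildren, getText

-- Text of the first child of <div class="about">, provided that child is a <p>.
parseDescription :: ArrowXml a => a XmlTree String
parseDescription =
    deep ( isElem >>> hasName "div" >>> hasAttrValue "class" (== "about")
           >>> getNthChild 0 >>> hasName "p"
         )
    >>> getChildren >>> getText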
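
To try this arrow without any IO, it can be run as a pure list arrow. The sketch below assumes the hxt package is installed. sampleHtml and firstParagraph are hypothetical names, and the input is a made-up fragment with no whitespace between tags. It uses xread to parse the string and runLA to collect the results.

```haskell
import Text.XML.HXT.Core
import Data.Tree.Class (Tree)
import qualified Data.Tree.Class as T

-- Same definition as in the answer: the n-th child (0-based), if any.
getNthChild :: (ArrowList a, Tree t) => Int -> a (t b) (t b)
getNthChild n = arrL (take 1 . drop n . T.getChildren)

-- Same arrow as in the answer, with an explicit type signature.
parseDescription :: ArrowXml a => a XmlTree String
parseDescription =
    deep ( isElem >>> hasName "div" >>> hasAttrValue "class" (== "about")
           >>> getNthChild 0 >>> hasName "p"
         )
    >>> getChildren >>> getText

-- Hypothetical sample input, written without whitespace between tags.
sampleHtml :: String
sampleHtml = "<div class=\"about\"><p>first</p><p>second</p></div>"

-- xread parses the fragment; runLA returns every result as a list.
firstParagraph :: [String]
firstParagraph = runLA (xread >>> parseDescription) sampleHtml
```

With this input, firstParagraph should contain only the text of the first <p>. If the first child of the div is not a <p>, hasName "p" rejects it and the list is empty.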

Update: I found another way, using changeChildren:

getNthChild :: (ArrowTree a, Tree t) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n) >>> getChildren

Update: to avoid inter-element whitespace nodes, filter out the non-element children:

import qualified Text.XML.HXT.DOM.XmlNode as XN

getNthChild :: (ArrowTree a, Tree t, XN.XmlNode b) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren
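
The difference between the last two variants shows up on indented markup. This is a small sketch under stated assumptions: the hxt package is installed, xread keeps whitespace-only text nodes, and nthChildAny, nthChildElem, indented, textsAny and textsElem are hypothetical names used only here.

```haskell
import Text.XML.HXT.Core
import Data.Tree.Class (Tree)
import qualified Text.XML.HXT.DOM.XmlNode as XN

-- Counts every child node, including whitespace text nodes.
nthChildAny :: (ArrowTree a, Tree t) => Int -> a (t b) (t b)
nthChildAny n = changeChildren (take 1 . drop n) >>> getChildren

-- Counts element children only.
nthChildElem :: (ArrowTree a, Tree t, XN.XmlNode b) => Int -> a (t b) (t b)
nthChildElem n = changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren

-- Hypothetical input with line breaks and indentation between the tags.
indented :: String
indented = "<div>\n  <p>first</p>\n  <p>second</p>\n</div>"

-- Text of child 0 of the div, if that child is a <p>.
textsAny, textsElem :: [String]
textsAny  = runLA (xread >>> nthChildAny 0  >>> hasName "p" >>> getChildren >>> getText) indented
textsElem = runLA (xread >>> nthChildElem 0 >>> hasName "p" >>> getChildren >>> getText) indented
```

Under these assumptions, child 0 in the first variant is the whitespace text node before the first <p>, so textsAny should be empty, while textsElem should contain the text of the first <p>.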
