Parsec ignore everything except one fragment


Problem description

I need to parse a single select tag in a poorly formed HTML document (so XML-based parsers don't work).

I think I know how to use parsec to parse the select tag once I get there, but how do I skip all the stuff before and after that tag?

Example:

<html>
   random content with lots of tags...
   <select id=something title="whatever"><option value=1 selected>1. First<option value=2>2. Second</select>
   more random content...
</html>

That's actually what the HTML looks like in the select tag. How would I do this with Parsec, or would you recommend I use a different library?

Solution

Here's how I'd do it:

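-- `try` rewinds the input if the tag parser fails part-way through
-- (for example after matching only the '<' of some other tag);
-- without it, <|> would never reach the anyChar branch.
solution = try tag <|> (anyChar >> solution)
  where
    tag = do
      _ <- string "<tag-name"
      x <- ⟦insertOptionsParserHere⟧
      _ <- char '>'
      return x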

This recursively consumes characters one at a time until it reaches the start of the target tag (<tag-name above, <select in your case). At that point it runs your parser, and the recursion ends once the closing > has been consumed.
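To make the pattern concrete for the select example from the question, here is a minimal sketch. The names skipTo, selectOpen, optionTag, selectTag and findSelect, and the Option type, are made up for illustration and are not part of Parsec. It assumes that attribute values never contain '>', that every option carries a value= attribute first, and that nothing but option tags sits between <select ...> and </select>.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type: (value attribute, visible text) per option.
type Option = (String, String)

-- Skip one character at a time until p succeeds.
-- try rewinds the input when p fails after consuming something.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar >> skipTo p)

-- The opening <select ...> tag; its attributes are thrown away.
-- Assumption: no attribute value contains '>'.
selectOpen :: Parser ()
selectOpen = do
  _ <- string "<select"
  _ <- manyTill anyChar (char '>')
  return ()

-- One <option value=V ...>text, where the text runs up to the next '<'.
optionTag :: Parser Option
optionTag = do
  _ <- string "<option"
  spaces
  _ <- string "value="
  v <- many1 (noneOf " >")
  _ <- manyTill anyChar (char '>')
  txt <- many (noneOf "<")
  return (v, txt)

-- The whole select element: opening tag, options, closing tag.
selectTag :: Parser [Option]
selectTag = do
  selectOpen
  opts <- many (try optionTag)
  _ <- string "</select>"
  return opts

-- Ignore everything before the select tag; whatever follows it is left unparsed.
findSelect :: String -> Either ParseError [Option]
findSelect = parse (skipTo selectTag) "html"
```

Calling findSelect on the HTML from the question should give back the two options as (value, text) pairs. Nothing after </select> needs to be skipped explicitly, because parse does not require the parser to reach the end of the input.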

Note that there may be stray whitespace before and after the tag delimiters. To handle that, we could do this, provided your parser consumes the tags itself:

-- try lets <|> fall through to anyChar when the tag parser fails after consuming input
solution = try ⟦insertHtmlParserHere⟧ <|> (anyChar >> solution)

To be clear, that means ⟦insertHtmlParserHere⟧ would have this kind of structure:

⟦insertHtmlParserHere⟧ = do
   string "<tag-name"
   ⋯
   char '>'


As a side-note, if you want to capture every tag available, you can quite happily use many:

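-- try is needed here: after the last tag, solution eats the rest of the input
-- before failing, and many only stops cleanly if that failure consumed nothing
everyTag = many (try solution)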
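As a rough illustration of collecting every match, the sketch below gathers the raw attribute text of each <option ...> tag in a document. The names optionAttrs, nextOption, everyOption and allOptions are invented for this example, and it assumes attribute values contain no '>'.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical tag parser: the raw attribute text of one <option ...> tag.
optionAttrs :: Parser String
optionAttrs = string "<option" >> manyTill anyChar (char '>')

-- Skip forward one character at a time until the tag parser matches.
nextOption :: Parser String
nextOption = try optionAttrs <|> (anyChar >> nextOption)

-- Collect every match. The outer try matters: after the last tag,
-- nextOption consumes the rest of the input before failing, and many
-- only stops cleanly if that failed attempt consumed nothing.
everyOption :: Parser [String]
everyOption = many (try nextOption)

allOptions :: String -> Either ParseError [String]
allOptions = parse everyOption "html"
```

With this shape, trailing content after the last tag is simply left unconsumed rather than causing a parse error.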
