Parsec: ignore everything except one fragment

Views: 208 | Published: 2018/6/5 11:55:01 | Tags: html, haskell, parsec


Question

I need to parse a single select tag in a poorly formed HTML document (so XML-based parsers don't work).

I think I know how to use Parsec to parse the select tag once I get there, but how do I skip all the stuff before and after that tag?

Example:

    <html>
    random content with lots of tags...
    <select id=something title="whatever"><option value=1 selected>1. First<option value=2>2. Second</select>
    more random content...
    </html>

That's actually what the HTML looks like in the select tag. How would I do this with Parsec, or would you recommend I use a different library?

Solution

Here's how I'd do it:

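    solution = try (do
                   string "<tag-name"
                   x <- ⟦insertOptionsParserHere⟧
                   char '>'
                   return x)
               <|> (anyChar >> solution)

This recursively consumes characters until it reaches the opening <tag-name tag, at which point it runs your parser, and it stops recursing once the closing > of that tag has been consumed. The try combinator matters here: Parsec's string consumes input when only a prefix matches (for example the "<" of "<html>"), and <|> only falls back to anyChar when the first branch failed without consuming anything.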
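The same idea can be filled in for the select tag from the question. This is a minimal sketch, not a general HTML parser: the names skipTo, attribute, optionTag, selectTag and extractSelect are hypothetical, and the attribute handling assumes only the forms that appear in the sample (bare values, double-quoted values, and valueless flags such as selected).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper, equivalent to the answer's `solution`: try `p` here,
-- otherwise drop one character and try again. `try` lets a partial match of
-- `p` (for instance the "<" of "<html>") backtrack so `anyChar` can step past it.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar >> skipTo p)

-- Hypothetical attribute parser for the sample's forms:
--   id=something   title="whatever"   selected
attribute :: Parser (String, String)
attribute = do
  name <- many1 (alphaNum <|> char '-')
  val  <- option "" (char '=' >> (quoted <|> bare))
  spaces
  return (name, val)
  where
    quoted = between (char '"') (char '"') (many (noneOf "\""))
    bare   = many1 (noneOf " \t\r\n>")

-- "<option value=1 selected>1. First": the sample never closes <option>,
-- so the caption is everything up to the next '<'.
optionTag :: Parser (String, String)
optionTag = do
  _       <- try (string "<option")  -- must not consume the "<" of "</select>"
  spaces
  attrs   <- many attribute
  _       <- char '>'
  caption <- many (noneOf "<")
  return (maybe "" id (lookup "value" attrs), caption)

-- "<select ...>" followed by its options, up to "</select>".
selectTag :: Parser [(String, String)]
selectTag = do
  _    <- string "<select"
  spaces
  _    <- many attribute
  _    <- char '>'
  opts <- many optionTag
  _    <- string "</select>"
  return opts

-- Skip everything before the select tag; input after </select> is simply
-- left unparsed because we never ask for eof.
extractSelect :: String -> Either ParseError [(String, String)]
extractSelect = parse (skipTo selectTag) "<input>"
```

Loaded into GHCi, extractSelect applied to the sample document from the question gives Right [("1","1. First"),("2","2. Second")]. Because skipTo wraps the whole fragment in try, a malformed select does not abort the scan; the parser just keeps looking for the next "<select" and reports an error at end of input. If you would rather get an error pointing at the broken tag, put try around string "<select" only.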

It is wise to note that there may be extra whitespace before and after the tag's contents. To handle that, provided your parser consumes the tags itself, we could do this:

    solution = try ⟦insertHtmlParserHere⟧ <|> (anyChar >> solution)

To be clear, that means ⟦insertHtmlParserHere⟧ would have this kind of structure:

    ⟦insertHtmlParserHere⟧ = do
        string "<tag-name"
        ⋯
        char '>'

As a side note, if you want to capture every tag available, you can use many. Wrap solution in try here as well: after the last tag, solution consumes the rest of the input before failing at the end, and many only stops cleanly on a failure that consumed nothing.

    everyTag = many (try solution)
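The many variant can be sketched the same way. This assumes a hypothetical skipTo helper equivalent to the answer's solution (with try added) and a hypothetical fragment parser optionValue that only recognises the literal prefix "<option value=" seen in the question's sample; it illustrates the combinator structure rather than robust HTML handling.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: skip characters until `p` succeeds.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar >> skipTo p)

-- Hypothetical fragment parser: the value after "<option value=",
-- i.e. the run of characters up to the next space or '>'.
optionValue :: Parser String
optionValue = string "<option value=" >> many1 (noneOf " >")

-- Every occurrence in the document. The outer `try` matters: after the last
-- match, `skipTo` consumes the remaining input and then fails at the end, and
-- without `try` that consumed failure would make `many` fail as well.
everyOptionValue :: Parser [String]
everyOptionValue = many (try (skipTo optionValue))

allValues :: String -> Either ParseError [String]
allValues = parse everyOptionValue "<input>"
```

On the question's sample document, allValues returns Right ["1","2"]; on input that contains no options it returns Right [] rather than an error.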
