Parsec ignore everything except one fragment
Problem description
I need to parse a single select tag in a poorly formed HTML document (so XML-based parsers don't work).
I think I know how to use parsec to parse the select tag once I get there, but how do I skip all the stuff before and after that tag?
Example:
<html>
random content with lots of tags...
<select id=something title="whatever"><option value=1 selected>1. First<option value=2>2. Second</select>
more random content...
</html>
That's actually what the HTML looks like in the select tag. How would I do this with Parsec, or would you recommend I use a different library?
Solution
Here's how I'd do it:
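-- try makes a partial match of "<tag-name" backtrack, so that <|> can
-- fall through to anyChar instead of failing after consuming input.
solution = try (do
    _ <- string "<tag-name"
    x <- ⟦insertOptionsParserHere⟧
    _ <- char '>'
    return x
  ) <|> (anyChar >> solution)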
This will recursively consume characters until it meets a starting <html> tag, upon which it runs your parser, and it leaves the recursion once the final tag has been consumed.
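The recursion above can be tried out on the sample HTML from the question with a small, self-contained sketch. The names skipTo, optionP, selectP and findSelect are hypothetical and are not part of Parsec or of the original answer, and the tag parsers assume that attribute values never contain a '>' character.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical option parser: reads "<option ...>label" and returns the label.
-- try lets the enclosing `many` stop cleanly when it reaches "</select>".
optionP :: Parser String
optionP = do
  _ <- try (string "<option")
  _ <- manyTill anyChar (char '>')   -- skip the attributes (assumes no '>' inside them)
  many1 (noneOf "<")                 -- label text up to the next tag

-- Hypothetical select parser: consumes the whole tag and returns the option labels.
selectP :: Parser [String]
selectP = do
  _ <- string "<select"
  _ <- manyTill anyChar (char '>')   -- skip the attributes
  opts <- many optionP
  _ <- string "</select>"
  return opts

-- Try p at the current position; if it fails, drop one character and retry.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar >> skipTo p)

findSelect :: Parser [String]
findSelect = skipTo selectP
```

Running parse findSelect "" on the example document from the question should give Right ["1. First","2. Second"]. Everything after </select> is simply left unconsumed, because parse does not require the parser to reach the end of the input.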
Note that there may be whitespace before and after the tag. To handle that, we could do this, provided your parser consumes the tags itself:
solution = try ⟦insertHtmlParserHere⟧ <|> (anyChar >> solution)
To be clear, that would mean that ⟦insertHtmlParserHere⟧ would have this kind of structure:
⟦insertHtmlParserHere⟧ = do
    string "<tag-name"
    ⋯
    char '>'
As a side note, if you want to capture every available tag, you can quite happily use many:
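-- try is needed here: after the last tag, solution consumes the trailing
-- content before failing, and many gives up on a failure that consumed input.
everyTag = many (try solution)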
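As a rough illustration of collecting every match, here is a minimal sketch that gathers the text of all <b>...</b> fragments in a string. The names boldP, skipTo and allBold are hypothetical, and the <b> tag is chosen only to keep the example short.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical fragment parser: the text between "<b>" and "</b>".
boldP :: Parser String
boldP = string "<b>" >> manyTill anyChar (try (string "</b>"))

-- Try p at the current position; if it fails, drop one character and retry.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar >> skipTo p)

-- Collect every fragment. The outer try undoes the input consumed while
-- searching past the last fragment, so many can stop without an error.
allBold :: Parser [String]
allBold = many (try (skipTo boldP))
```

For instance, parse allBold "" "x <b>one</b> y <i>z</i> <b>two</b> tail" should give Right ["one","two"]. The search after the last match rescans the remaining input before backtracking, which is fine for small documents but may be slow on very large ones.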