Multiline Matching in Haskell Posix


Problem description

I can't seem to find decent documentation on Haskell's POSIX regex implementation, specifically the module Text.Regex.Posix.

Can anyone point me in the right direction of using multiline matching on a string?

A snippet for the curious:

> extractToken body = body =~ "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>" :: String

I'm trying to extract the source of Wikipedia pages, but this method clearly falls over when more than one line is involved.
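
To see why the snippet only works on single lines, here is a minimal sketch, assuming the regex-posix package is installed. The sample strings and the names sample, sampleOneLine, pat and naiveMatch are hypothetical. The point it illustrates is that (=~) compiles the pattern with defaultCompOpt, which includes compNewline (POSIX REG_NEWLINE), and under that flag '.' does not match '\n'.

```haskell
import Text.Regex.Posix

-- Hypothetical sample input: a textarea whose contents span two lines.
sample :: String
sample = "<textarea id=\"wpTextbox1\">line one\nline two</textarea>"

-- The same text with the newline replaced by a space.
sampleOneLine :: String
sampleOneLine = map (\c -> if c == '\n' then ' ' else c) sample

-- The question's pattern.
pat :: String
pat = "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- (=~) uses defaultCompOpt, which includes compNewline, so '.' refuses
-- to match '\n'. Asking for a String yields the first whole match,
-- or "" if there is none.
naiveMatch :: String -> String
naiveMatch body = body =~ pat
```

Here naiveMatch sampleOneLine returns the whole single-line string, while naiveMatch sample returns "" because (.*) cannot cross the newline.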

Solution

You may need to import Text.Regex.Base.RegexLike for access to makeRegexOpts and friends.

-- Turn compNewline off so that '.' (and [^>]) may also match '\n'.
extractToken body = match regex body where
    regex = makeRegexOpts (defaultCompOpt - compNewline) defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

Since Text.Regex.Posix defines defaultCompOpt = compExtended + compNewline, this is equivalent to:

extractToken body = match regex body where
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
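
Both claims can be checked in a small sketch, assuming the regex-posix package. The names optionsAgree, sample, multilineRegex and wholeMatch are hypothetical. The first definition tests the option arithmetic stated above; the rest compile the pattern with compExtended alone and apply it to a two-line input.

```haskell
import Text.Regex.Posix

-- defaultCompOpt is compExtended + compNewline, so removing
-- compNewline should leave exactly compExtended.
optionsAgree :: Bool
optionsAgree = defaultCompOpt - compNewline == compExtended

-- Hypothetical two-line input.
sample :: String
sample = "<textarea id=\"wpTextbox1\">line one\nline two</textarea>"

-- Compiled with compExtended only (no compNewline), so '.' may match '\n'.
multilineRegex :: Regex
multilineRegex = makeRegexOpts compExtended defaultExecOpt
  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- With a String result type, match returns the whole first match ("" if none).
wholeMatch :: String -> String
wholeMatch = match multilineRegex
```

Giving multilineRegex an explicit Regex signature is optional here, since the functional dependencies already fix the type from compExtended, but it makes the intent easier to read.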

To pull out just the first group, ask match for a different result type, i.e. use another instance of RegexContext. One possibility is:

extractToken body = head groups where
    -- match yields (text before, whole match, text after, capture groups);
    -- head fails if nothing matched, because groups is then empty.
    (_, _, _, groups) =
        match regex body :: (String, String, String, [String])
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
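
If a page might not contain the textarea at all, a total variant avoids the crash in head. This is a hypothetical helper, not part of the answer: it uses the same 4-tuple result of match but pattern-matches on the group list and returns a Maybe, assuming the regex-posix package.

```haskell
import Text.Regex.Posix

-- Hypothetical helper: like extractToken, but returns Nothing instead of
-- failing in 'head' when the pattern does not match.
extractTokenMaybe :: String -> Maybe String
extractTokenMaybe body =
  case match regex body :: (String, String, String, [String]) of
    (_, _, _, firstGroup : _) -> Just firstGroup
    _                         -> Nothing
  where
    -- compExtended without compNewline: '.' may match '\n'.
    regex :: Regex
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
```

For example, extractTokenMaybe "<textarea id=\"wpTextbox1\">line one\nline two</textarea>" gives Just "line one\nline two", and extractTokenMaybe "no textarea here" gives Nothing. If the input can contain several textareas, keep in mind that POSIX matching is leftmost-longest, so (.*) would run to the last </textarea>.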
