Multiline Matching in Haskell Posix
Problem description
I can't seem to find decent documentation on Haskell's POSIX regex implementation, specifically the module Text.Regex.Posix.
Can anyone point me in the right direction for using multiline matching on a string?
A snippet for the curious:
> extractToken body = body =~ "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>" :: String
I'm trying to extract the source of Wikipedia pages, but this method clearly falls over when more than one line is involved.
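The failure described above can be reproduced with a small sketch. It assumes the regex-posix package is installed; the names textareaPattern, oneLine and twoLines are made up for illustration, and the inputs are hypothetical stand-ins for a Wikipedia edit page.

```haskell
import Text.Regex.Posix

-- The pattern from the snippet above, used with the default options of (=~).
textareaPattern :: String
textareaPattern = "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- Hypothetical inputs: one without and one with a line break in the body.
oneLine, twoLines :: String
oneLine  = "<textarea id=\"wpTextbox1\">some wikitext</textarea>"
twoLines = "<textarea id=\"wpTextbox1\">first line\nsecond line</textarea>"

-- With a String result type, (=~) yields the matched text, or "" on failure.
matchedOneLine, matchedTwoLines :: String
matchedOneLine  = oneLine  =~ textareaPattern
matchedTwoLines = twoLines =~ textareaPattern
```

Under the default compile options, matchedOneLine should be the whole input while matchedTwoLines should come back empty, because '.' is not allowed to match the newline.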
You may need to import Text.Regex.Base.RegexLike for access to makeRegexOpts and friends.
extractToken body = match regex body where
    regex = makeRegexOpts (defaultCompOpt - compNewline) defaultExecOpt
        "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
Well, since Text.Regex.Posix's defaultCompOpt = compExtended + compNewline, that works out equivalently as
extractToken body = match regex body where
    regex = makeRegexOpts compExtended defaultExecOpt
        "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
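As a rough check of the equivalence, the sketch below compares the two option values and runs the compExtended-only regex on a two-line input. It assumes the regex-posix package, and it relies on CompOption having Eq and Num instances; optionsAgree, multiLineRegex and sample are illustrative names, not part of the library.

```haskell
import Text.Regex.Posix

-- The pattern from the question.
textareaPattern :: String
textareaPattern = "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- Removing compNewline from the defaults should leave just compExtended.
optionsAgree :: Bool
optionsAgree = (defaultCompOpt - compNewline) == compExtended

-- Regex compiled without compNewline, so '.' and [^>] may match '\n'.
multiLineRegex :: Regex
multiLineRegex = makeRegexOpts compExtended defaultExecOpt textareaPattern

-- Hypothetical two-line input standing in for a Wikipedia edit page.
sample :: String
sample = "<textarea id=\"wpTextbox1\">first line\nsecond line</textarea>"

-- A Bool result type only reports whether the pattern matched.
matchesAcrossLines :: Bool
matchesAcrossLines = match multiLineRegex sample
```

Both optionsAgree and matchesAcrossLines are expected to be True on a standard POSIX regex backend.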
To pull out just the first group, ask match for a different result type, that is, use one of the other RegexContext instances from Text.Regex.Base. One possibility is
-- Note: head fails at runtime if the pattern does not match (groups is then empty).
extractToken body = head groups where
    (preMatch, inMatch, postMatch, groups) =
        match regex body :: (String, String, String, [String])
    regex = makeRegexOpts compExtended defaultExecOpt
        "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
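Since head is partial, one possible variant returns a Maybe instead. This is only a sketch under the same assumptions (regex-posix installed); extractTokenMaybe and textareaRegex are names invented here, not part of the original answer.

```haskell
import Text.Regex.Posix

-- Same pattern and compile options as in the answer above.
textareaRegex :: Regex
textareaRegex = makeRegexOpts compExtended defaultExecOpt
    "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- Returns the first capture group, or Nothing when the pattern does not match.
extractTokenMaybe :: String -> Maybe String
extractTokenMaybe body =
    case match textareaRegex body :: (String, String, String, [String]) of
        (_, _, _, firstGroup : _) -> Just firstGroup
        _                         -> Nothing
```

For an input such as "<textarea id=\"wpTextbox1\">a\nb</textarea>" this should give Just "a\nb", and Nothing for text that contains no such textarea. Note that POSIX matching is greedy, so with several closing tags in the body the group runs up to the last one.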