Multi-line *non* match with attoparsec
Question
I was playing around with parsing (PostgreSQL) logs, which can have entries that are multi-line.
2016-01-01 01:01:01 entry1
2016-01-01 01:01:02 entry2a
entry2b
2016-01-01 01:01:03 entry3
So - with a Perl or Python script I'd just grab the next line and, if it didn't start with a timestamp, append it to the previous log entry. What is a sensible way to approach this with attoparsec hooked up to io-streams? I clearly want to do something with lookAhead and failing to match a timestamp, but my brain is just missing something.
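For comparison, the line-based approach described above can be sketched in plain Haskell without any parser library. This is only a minimal sketch: startsWithTimestamp and groupEntries are hypothetical names, and the timestamp check assumes every entry begins with a date shaped like YYYY-MM-DD, as in the sample log.

```haskell
import Data.Char (isDigit)

-- Assumption: a new entry starts with a date shaped like "YYYY-MM-DD".
startsWithTimestamp :: String -> Bool
startsWithTimestamp line =
    length prefix == length shape && and (zipWith matches shape prefix)
  where
    shape  = "dddd-dd-dd"              -- 'd' stands for any digit
    prefix = take (length shape) line
    matches 'd' c = isDigit c
    matches s   c = s == c

-- Group each line with the non-timestamped lines that follow it.
groupEntries :: [String] -> [[String]]
groupEntries []       = []
groupEntries (l : ls) = (l : continuations) : groupEntries rest
  where
    (continuations, rest) = break startsWithTimestamp ls
```

Applied to the lines of the sample log, groupEntries puts entry2b in the same group as the entry2a line and leaves the other two entries as single-line groups.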
Nope - still can't see it. I've stripped back what I've got. Parsing a single line is easy. I can't figure out how to parse "up to" another parsing pattern - I can see a lookAhead function I can use, but I don't see how that fits in with applying a "not" condition.
I can't see how I can match either. Entirely possible my brain has seized up.
{-# LANGUAGE OverloadedStrings #-}
module DummyParser (
    LogStatement (..), parseLogLine
    -- and, so we can test it...
    , LogTimestamp , parseTimestamp
    , parseSqlStmt
    , newLineAndTimestamp
) where

{- we want to parse...
TIME001 statement: SELECT true;
TIME002 statement: SELECT 'b',
'c';
TIME003 statement: SELECT 3;
-}

import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B

type LogTimestamp = Int

data LogStatement = LogStatement {
     l_ts  :: LogTimestamp
    ,l_sql :: String
} deriving (Eq, Show)

restOfLine :: Parser B.ByteString
restOfLine = do
    rest <- takeTill (== '\n')
    isEOF <- atEnd
    if isEOF
        then return rest
        else (char '\n') >> return rest

-- e.g. TIME001
parseTimestamp :: Parser LogTimestamp
parseTimestamp = do
    string "TIME"
    digits <- count 3 digit
    return (read digits)

-- e.g. statement: SELECT 1
parseSqlStmt :: Parser String
parseSqlStmt = do
    string "statement: "
    -- How can I match until the next timestamp?
    sql <- restOfLine
    return (B.unpack sql)

newLineAndTimestamp :: Parser LogTimestamp
newLineAndTimestamp = (char '\n') *> parseTimestamp

spaces :: Parser ()
spaces = do
    skipWhile (== ' ')

-- e.g. TIME001 statement: SELECT * FROM schema.table;
parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    let ls = LogStatement log_ts log_sql
    return ls
Edit: this is what I finally ended up with, thanks to arrowd's help:
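-- (<|>) comes from Control.Applicative, which needs to be imported as well
isTimestampNext :: Parser ()
isTimestampNext = lookAhead parseTimestamp *> pure ()

parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    -- keep taking whole lines until the input ends or a timestamp comes next
    extraLines <- manyTill restOfLine (endOfInput <|> isTimestampNext)
    let ls = LogStatement log_ts (log_sql ++ (B.unpack $ B.concat extraLines))
    return ls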
Recommended answer
The combinator I have shared on many attoparsec questions:
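-- Succeeds, consuming nothing, only when p does NOT match at this point.
-- Needs (<|>) from Control.Applicative.
notFollowedBy :: Parser a -> Parser ()
notFollowedBy p = do
    matched <- (True <$ p) <|> pure False
    if matched then fail "not followed by" else pure ()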
Your solution would be something like
parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    newlineLeftover <- ((notFollowedBy parseTimestamp) *> parseSqlStmt) <|> pure ""
    let ls = LogStatement log_ts (log_sql ++ newlineLeftover)
    return ls
The right-hand side of *> in the newlineLeftover expression would need some more work, I guess, but the overall idea is like that.
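To make the idea concrete, here is a minimal, self-contained sketch of one way the "some more work" could look. It is not the asker's final code: timestamp, continuation and logEntry are hypothetical names, notFollowedBy is written here so that it succeeds exactly when its argument fails to match, and continuation lines are assumed to be joined to the first line with a newline.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B

data LogStatement = LogStatement { l_ts :: Int, l_sql :: String }
    deriving (Eq, Show)

-- Succeeds without consuming input only when p does not match here.
-- attoparsec backtracks on failure, so nothing is consumed either way.
notFollowedBy :: Parser a -> Parser ()
notFollowedBy p = do
    matched <- (True <$ p) <|> pure False
    if matched then fail "not followed by" else pure ()

-- e.g. TIME001
timestamp :: Parser Int
timestamp = string "TIME" *> (read <$> count 3 digit)

-- Everything up to the end of the line; the newline itself is consumed.
restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* (endOfLine <|> endOfInput)

-- A continuation line: there is input left and it does not start a new entry.
-- The endOfInput check stops `many` from looping on an empty match at EOF.
continuation :: Parser B.ByteString
continuation = notFollowedBy endOfInput *> notFollowedBy timestamp *> restOfLine

logEntry :: Parser LogStatement
logEntry = do
    ts <- timestamp
    skipWhile (== ' ')
    _ <- string "statement: "
    firstLine <- restOfLine
    extra <- many continuation
    pure (LogStatement ts (B.unpack (B.intercalate "\n" (firstLine : extra))))
```

Running parseOnly (many logEntry <* endOfInput) over the three-entry TIME001..TIME003 sample from the question should give three LogStatement values, with the second one carrying both lines of its SELECT.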