Multi-line *non* match with attoparsec

Question

I was playing around with parsing (PostgreSQL) logs which can have entries that are multi-line.

2016-01-01 01:01:01 entry1
2016-01-01 01:01:02 entry2a
entry2b
2016-01-01 01:01:03 entry3

So, with a Perl or Python script I'd just grab the next line and, if it didn't start with a timestamp, append it to the previous log entry. What is a sensible way to approach this with attoparsec hooked up to io-streams? I clearly want to do something with lookAhead and failing to match a timestamp, but my brain is just missing something.

Nope - still can't see it. I've stripped back what I've got. Parsing a single line is easy. I can't figure out how to parse "up to" another parsing pattern - I can see a lookAhead function I can use, but I don't see how that fits in with applying a "not" condition.

I can't see how I can match either. Entirely possible my brain has seized up.

{-# LANGUAGE OverloadedStrings #-}

module DummyParser (
    LogStatement (..), parseLogLine
    -- and, so we can test it...
    , LogTimestamp , parseTimestamp
    , parseSqlStmt
    , newLineAndTimestamp
) where

{-  we want to parse...
TIME001 statement: SELECT true;
TIME002 statement: SELECT 'b',
  'c';
TIME003 statement: SELECT 3;
-}

import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B

type LogTimestamp = Int

data LogStatement = LogStatement {
     l_ts  :: LogTimestamp
    ,l_sql :: String
} deriving (Eq, Show)


restOfLine :: Parser B.ByteString
restOfLine = do
    rest <- takeTill (== '\n')
    isEOF <- atEnd
    if isEOF then
        return rest
    else
        (char '\n') >> return rest


-- e.g. TIME001
parseTimestamp :: Parser LogTimestamp
parseTimestamp  = do
  string "TIME"
  digits  <- count 3 digit
  return (read digits)


-- e.g. statement: SELECT 1
parseSqlStmt :: Parser String
parseSqlStmt = do
    string "statement: "
    -- How can I match until the next timestamp?
    sql <- restOfLine
    return (B.unpack sql)


newLineAndTimestamp :: Parser LogTimestamp
newLineAndTimestamp = (char '\n') *> parseTimestamp


spaces :: Parser ()
spaces = do
    skipWhile (== ' ')


-- e.g. TIME001 statement: SELECT * FROM schema.table;
parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    let ls = LogStatement log_ts log_sql
    return ls






Edit: this is what I finally ended up with, thanks to arrowd's help:

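-- (<|>) is from Control.Applicative; import it if it is not already in scope.

-- Succeeds without consuming input when the next thing is a timestamp.
isTimestampNext :: Parser ()
isTimestampNext = lookAhead parseTimestamp *> pure ()

parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    -- Collect continuation lines until end of input or the next timestamp.
    extraLines <- manyTill restOfLine (endOfInput <|> isTimestampNext)
    let ls = LogStatement log_ts (log_sql ++ (B.unpack $ B.concat extraLines))
    return ls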
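
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same lookAhead/manyTill idea. It makes a few assumptions that are not in the code above: the names parseLogEntry, parseLogFile and sample are made up for illustration, the "statement: " prefix is parsed inline, and continuation lines are joined with "\n" (the version above concatenates them with no separator, since restOfLine strips the newline).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative              ((<|>))
import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B

data LogStatement = LogStatement
    { l_ts  :: Int
    , l_sql :: String
    } deriving (Eq, Show)

-- e.g. TIME001
parseTimestamp :: Parser Int
parseTimestamp = string "TIME" *> (read <$> count 3 digit)

-- Text up to the next newline; consumes the newline if there is one.
restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* ((() <$ char '\n') <|> endOfInput)

-- Succeeds, consuming nothing, when a timestamp starts here.
isTimestampNext :: Parser ()
isTimestampNext = () <$ lookAhead parseTimestamp

-- Hypothetical name: one timestamped entry plus its continuation lines.
parseLogEntry :: Parser LogStatement
parseLogEntry = do
    ts <- parseTimestamp
    skipWhile (== ' ')
    _ <- string "statement: "
    firstLine <- restOfLine
    -- manyTill tries the end condition before each line,
    -- so it also stops cleanly at end of input.
    extraLines <- manyTill restOfLine (endOfInput <|> isTimestampNext)
    -- Assumption: keep the line breaks between continuation lines.
    pure (LogStatement ts (B.unpack (B.intercalate "\n" (firstLine : extraLines))))

-- Hypothetical top-level parser: one or more entries, then end of input.
parseLogFile :: Parser [LogStatement]
parseLogFile = many1 parseLogEntry <* endOfInput

sample :: B.ByteString
sample = "TIME001 statement: SELECT true;\nTIME002 statement: SELECT 'b',\n  'c';\nTIME003 statement: SELECT 3;\n"
```

Running parseOnly parseLogFile sample should give three statements, with the second one holding both of its lines ("SELECT 'b',\n  'c';"). Putting the end condition first matters: restOfLine can succeed on empty input, so it is manyTill checking endOfInput before each line that keeps the loop from spinning at the end of the file.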


Recommended answer

The combinator I have shared on many attoparsec questions:

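notFollowedBy :: Parser a -> Parser ()  -- uses optional from Control.Applicative
notFollowedBy p = optional p >>= maybe (return ()) (const (fail "not followed by"))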

Your solution would be something like

parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    newlineLeftover <- ((notFollowedBy parseTimestamp) *> parseSqlStmt) <|> pure ""
    let ls = LogStatement log_ts (log_sql ++ newlineLeftover)
    return ls

The right-hand side of *> in the newlineLeftover expression would need some more work, I guess, but the overall idea is like that.
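
One way that extra work might look, as a rough sketch rather than a definitive version: a continuation line cannot go through parseSqlStmt, because that parser insists on the "statement: " prefix, so the sketch reads the raw rest of the line instead and repeats that with many. The names timestamp, continuationLine and entry are made up for illustration, and notFollowedBy here is a variant built on optional so that it succeeds exactly when p fails (attoparsec backtracks on failure, so nothing is consumed in that case).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative              (many, optional, (<|>))
import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B

-- Succeeds (consuming nothing) only when p does NOT match here.
notFollowedBy :: Parser a -> Parser ()
notFollowedBy p = optional p >>= maybe (pure ()) (const (fail "not followed by"))

-- e.g. TIME001
timestamp :: Parser Int
timestamp = string "TIME" *> (read <$> count 3 digit)

-- Text up to the next newline; consumes the newline if there is one.
restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* ((() <$ char '\n') <|> endOfInput)

-- A line that belongs to the previous entry: it does not start with a
-- timestamp, and we are not at end of input (otherwise many would loop).
continuationLine :: Parser B.ByteString
continuationLine =
    notFollowedBy timestamp *> notFollowedBy endOfInput *> restOfLine

-- Hypothetical entry parser returning (timestamp, full statement text).
entry :: Parser (Int, B.ByteString)
entry = do
    ts <- timestamp
    skipWhile (== ' ')
    _ <- string "statement: "
    firstLine <- restOfLine
    more <- many continuationLine
    pure (ts, B.intercalate "\n" (firstLine : more))
```

With this, parseOnly (many1 entry <* endOfInput) over the sample log from the question should return one pair per timestamped entry, with indented follow-on lines folded into the entry above them. The notFollowedBy endOfInput guard is the part that is easy to forget: restOfLine happily succeeds on empty input, and many never terminates on a parser that succeeds without consuming anything.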
