Multi-line *non* match with attoparsec

Views: 179 Posted: 2018/6/5 11:54:38 Tags: haskell attoparsec


Question

I was playing around with parsing (PostgreSQL) logs which can have entries that are multi-line.

2016-01-01 01:01:01 entry1
2016-01-01 01:01:02 entry2a
entry2b
2016-01-01 01:01:03 entry3


So - with a Perl or Python script I'd just grab the next line and if it wasn't starting with a timestamp append it to the previous log entry. What is a sensible way to approach this with attoparsec hooked up to io-streams? I clearly want to do something with lookAhead and failing to match a timestamp but my brain is just missing something.
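For comparison, the line-oriented approach described above (append a line to the previous entry unless it starts with a timestamp) could be sketched in plain Haskell, without attoparsec. The names `startsWithTimestamp` and `groupEntries` are made up for illustration, and the timestamp check assumes the exact `YYYY-MM-DD HH:MM:SS` layout of the sample above.

```haskell
import Data.Char (isDigit)

-- Assumption: a timestamp has the 19-character layout "2016-01-01 01:01:01".
-- Only the digit/separator pattern is checked, not whether the date is valid.
startsWithTimestamp :: String -> Bool
startsWithTimestamp line =
  length prefix == 19 && and (zipWith matches "dddd-dd-dd dd:dd:dd" prefix)
  where
    prefix = take 19 line
    matches 'd' c = isDigit c   -- 'd' in the pattern stands for any digit
    matches p   c = p == c      -- every other pattern character must match literally

-- Group physical lines into logical entries: each entry is its first line
-- plus every following line that does not start with a timestamp.
groupEntries :: [String] -> [[String]]
groupEntries []       = []
groupEntries (l : ls) = (l : continuation) : groupEntries rest
  where
    (continuation, rest) = break startsWithTimestamp ls

main :: IO ()
main = mapM_ (putStrLn . unwords) (groupEntries sample)
  where
    sample =
      [ "2016-01-01 01:01:01 entry1"
      , "2016-01-01 01:01:02 entry2a"
      , "entry2b"
      , "2016-01-01 01:01:03 entry3"
      ]
```

Because `break` stops at the first line that looks like a new entry, the "not a timestamp" condition never has to be written as a parser; the attoparsec version needs a lookahead to get the same effect.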


Nope - still can't see it. I've stripped back what I've got. Parsing a single line is easy. I can't figure out how to parse "up to" another parsing pattern - I can see a lookAhead function I can use, but I don't see how that fits in with applying a "not" condition.


I can't see how I can match either. Entirely possible my brain has seized up.

{-# LANGUAGE OverloadedStrings #-}

module DummyParser (
    LogStatement (..), parseLogLine
    -- and, so we can test it...
    , LogTimestamp , parseTimestamp
    , parseSqlStmt
    , newLineAndTimestamp
) where

{-  we want to parse...
TIME001 statement: SELECT true;
TIME002 statement: SELECT 'b',
  'c';
TIME003 statement: SELECT 3;
-}

import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B

type LogTimestamp = Int

data LogStatement = LogStatement {
     l_ts  :: LogTimestamp
    ,l_sql :: String
} deriving (Eq, Show)


restOfLine :: Parser B.ByteString
restOfLine = do
    rest <- takeTill (== '\n')
    isEOF <- atEnd
    if isEOF then
        return rest
    else
        (char '\n') >> return rest


-- e.g. TIME001
parseTimestamp :: Parser LogTimestamp
parseTimestamp  = do
  string "TIME"
  digits  <- count 3 digit
  return (read digits)


-- e.g. statement: SELECT 1
parseSqlStmt :: Parser String
parseSqlStmt = do
    string "statement: "
    -- How can I match until the next timestamp?
    sql <- restOfLine
    return (B.unpack sql)


newLineAndTimestamp :: Parser LogTimestamp
newLineAndTimestamp = (char '\n') *> parseTimestamp


spaces :: Parser ()
spaces = do
    skipWhile (== ' ')


-- e.g. TIME001 statement: SELECT * FROM schema.table;
parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    let ls = LogStatement log_ts log_sql
    return ls

Edit: this is what I finally ended up with, thanks to arrowd's help:

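-- (<|>) used below comes from Control.Applicative and must be imported.
-- Succeeds, without consuming input, when a timestamp comes next.
isTimestampNext :: Parser ()
isTimestampNext = lookAhead parseTimestamp *> pure ()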

parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    extraLines <- manyTill restOfLine (endOfInput <|> isTimestampNext)
    let ls = LogStatement log_ts (log_sql ++ (B.unpack $ B.concat extraLines))
    return ls
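Put together as a self-contained program, the same idea might look like the sketch below. It is not the original module: the names `timestamp`, `entryBoundary`, `logStatement`, `lTs` and `lSql` are renamed for illustration, the SQL text is kept as a `ByteString`, and continuation lines are joined with `"\n"` (the snippet above concatenates them with no separator). The `TIMEnnn` timestamp format is the dummy one from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative              (many, (<|>))
import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B

data LogStatement = LogStatement
  { lTs  :: Int
  , lSql :: B.ByteString
  } deriving (Eq, Show)

-- e.g. TIME001 (the dummy timestamp format from the question)
timestamp :: Parser Int
timestamp = string "TIME" *> (read <$> count 3 digit)

-- Everything up to the next newline or the end of input.
-- A trailing newline, if present, is consumed but not returned.
restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* ((char '\n' *> pure ()) <|> endOfInput)

-- Succeeds without consuming input when the current entry is over:
-- either nothing is left, or the next line starts with a timestamp.
entryBoundary :: Parser ()
entryBoundary = endOfInput <|> (lookAhead timestamp *> pure ())

-- One logical entry: the first line plus any continuation lines.
logStatement :: Parser LogStatement
logStatement = do
  ts    <- timestamp
  skipWhile (== ' ')
  _     <- string "statement: "
  first <- restOfLine
  more  <- manyTill restOfLine entryBoundary
  pure (LogStatement ts (B.intercalate "\n" (first : more)))

main :: IO ()
main = do
  let input = B.unlines
        [ "TIME001 statement: SELECT true;"
        , "TIME002 statement: SELECT 'b',"
        , "  'c';"
        , "TIME003 statement: SELECT 3;"
        ]
  print (parseOnly (many logStatement <* endOfInput) input)
```

`manyTill` tries `entryBoundary` before each `restOfLine`, so the "not a timestamp" condition is expressed as "keep taking lines until a timestamp is next" rather than as an explicit negation. One caveat of this sketch: a continuation line that itself begins with `TIME` followed by three digits would be read as the start of a new entry. Hooking the parser up to io-streams is not shown here.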
