How do you use parsec in a greedy fashion?

Problem description

In my work I come across a lot of gnarly SQL, and I had the bright idea of writing a program to parse the SQL and print it out neatly. Most of it came together pretty quickly, but I ran into a problem that I don't know how to solve.

So let's pretend the SQL is "select foo from bar where 1". My thought was that there is always a keyword followed by its data, so all I have to do is parse a keyword, then capture all the gibberish before the next keyword and store it for later cleanup, if that is worthwhile. Here's the code:

import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Char
import Data.Text (strip)

newtype Statement = Statement [Atom]
data Atom = Branch String [Atom] | Leaf String deriving Show

trim str = reverse $ trim' (reverse $ trim' str)
  where
    trim' (' ':xs) = trim' xs
    trim' str = str

printStatement atoms = mapM_ printAtom atoms
printAtom atom = loop 0 atom 
  where
    loop depth (Leaf str) = putStrLn $ (replicate depth ' ') ++ str
    loop depth (Branch str atoms) = do 
      putStrLn $ (replicate depth ' ') ++ str
      mapM_ (loop (depth + 2)) atoms

keywords :: [String]
keywords = [
  "select",
  "update",
  "delete",
  "from",
  "where"]

keywordparser :: Parsec String u String
keywordparser = try ((choice $ map string keywords) <?> "keywordparser")

stuffparser :: Parsec String u String
stuffparser = manyTill anyChar (eof <|> (lookAhead keywordparser >> return ()))

statementparser = do
  key <- keywordparser
  stuff <- stuffparser
  return $ Branch key [Leaf (trim stuff)]
  <?> "statementparser"

tp = parse (many statementparser) ""

The key here is the stuffparser. It handles the stuff in between the keywords, which could be anything from column lists to where criteria. This function catches all characters leading up to a keyword. But it needs something else before it is finished. What if there is a subselect, as in "select id,(select product from products) from bar"? In that case, when it hits the inner keyword, it parses everything wrong and messes up my indenting. Where clauses can contain parentheses as well.

So I need to change that anyChar into another combinator that slurps up characters one at a time but also looks for parentheses. If it finds them, it should traverse and capture everything inside, including any nested parentheses, until the parentheses are fully closed, then concatenate it all and return it. Here's what I've tried, but I can't quite get it to work.

stuffparser :: Parsec String u String
stuffparser = fmap concat $ manyTill somechars (eof <|> (lookAhead keywordparser >> return ()))
  where
    somechars = parens <|> fmap (\c -> [c]) anyChar
    parens= between (char '(') (char ')') somechars

This fails with an error:

> tp "select asdf(qwerty) from foo where 1"
Left (line 1, column 14):
unexpected "w"
expecting ")"

But I can't think of any way to rewrite this so that it works. I've tried to use manyTill on the parenthesis part, but I have trouble getting it to typecheck when the alternatives are a string-producing parens parser and a single-char parser. Does anyone have any suggestions on how to go about this?

Recommended answer

Yeah, between might not work for what you're looking for. Of course, for your use case, I'd follow hammar's suggestion and grab an off-the-shelf SQL parser. (Personal opinion: or try not to use SQL unless you really have to; the idea of using strings for database queries was, IMHO, a historical mistake.)
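
For what it's worth, the error in the question can be reproduced in isolation. The sketch below is only an illustration and not part of the original answer; oneItem and singleItemParens are hypothetical names, with oneItem standing in for the single-character branch of the question's somechars. between runs its body parser exactly once, so after one character it already expects the closing parenthesis:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for the question's `somechars`: one character, wrapped as a String.
oneItem :: Parser String
oneItem = fmap (\c -> [c]) anyChar

-- `between` applies its body parser exactly once, so this accepts "(q)"
-- but rejects "(qwerty)": after 'q' it wants ')' and finds 'w'.
singleItemParens :: Parser String
singleItemParens = between (char '(') (char ')') oneItem
```

Running parse singleItemParens "" "(q)" succeeds, while parse singleItemParens "" "(qwerty)" fails expecting ")". Wrapping the body in many would not be enough while it uses anyChar, because anyChar also consumes the closing parenthesis; that is why the parsers in this answer use noneOf "()" instead.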

Note: I add an operator called <++> which concatenates the results of two parsers, whether they are strings or characters. (Code at the bottom.)
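
To make the operator concrete before it is used, here is a small sketch. The class and operator repeat the answer's own definitions; the type signature, the FlexibleInstances pragma, and the bracketedWord parser are additions for illustration only.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Parsec
import Text.Parsec.String (Parser)

-- The answer's helper class: view either a Char or a String as a String.
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id

-- Run two parsers in sequence and concatenate their results.
infixl 4 <++>
(<++>) :: (Applicative f, CharOrStr a, CharOrStr b) => f a -> f b -> f String
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

-- Hypothetical example: a Char result, then a String result, then a Char.
bracketedWord :: Parser String
bracketedWord = char '[' <++> many1 letter <++> char ']'
```

With these definitions, parse bracketedWord "" "[abc]" returns the whole matched text, brackets included, without any manual conversion between Char and String.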

First, for the task of parsing parentheses: the top level will parse some stuff between the relevant characters, which is exactly what the code says:

parseParen = char '(' <++> inner <++> char ')'

Then the inner function should parse anything else: non-parens, possibly followed by another set of parentheses and the non-paren junk that follows it.

parseParen = char '(' <++> inner <++> char ')' where
    inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
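
As a quick check of the recursion, the sketch below puts parseParen together with the <++> helper from the bottom of this answer so that it compiles on its own. The type signatures and the pragma are additions; the parser bodies are the answer's.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Parsec
import Text.Parsec.String (Parser)

class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id

infixl 4 <++>
(<++>) :: (Applicative f, CharOrStr a, CharOrStr b) => f a -> f b -> f String
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

-- One balanced group, returned verbatim including the outer parentheses.
parseParen :: Parser String
parseParen = char '(' <++> inner <++> char ')' where
    -- non-paren text, optionally followed by a nested group and more text
    inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
```

On the input "(a(b)c) rest" this should return "(a(b)c)" and leave " rest" unconsumed, while an unbalanced input such as "(a(b)" is rejected because the final char ')' never matches.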

For the rest of the solution, I'll assume that what you want to do is analogous to splitting things up by top-level SQL keywords (i.e. ignoring those inside parentheses). Namely, we'll have a parser that behaves like so:

Main> parseTest parseSqlToplevel "select asdf(select m( 2) fr(o)m w where n) from b where delete 4"
[(Select," asdf(select m( 2) fr(o)m w where n) "),(From," b "),(Where," "),(Delete," 4")]

Suppose we have a parseKw parser that recognizes the likes of select, etc. After we consume a keyword, we need to read until the next [top-level] keyword. The last trick in my solution is using the lookAhead combinator to determine whether the next word is a keyword, and putting it back if so. If it's not, then we consume a parenthesized group or another word, and then recurse on the rest.

-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
    (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
     option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
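
The "put it back" behaviour of lookAhead can be seen in a stripped-down sketch. This is an illustration with a single hard-coded keyword, not the answer's code; keywordOrWord and withRest are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Succeeds with "" when the input starts with "from", consuming nothing;
-- otherwise falls through and reads one space-delimited word.
keywordOrWord :: Parser String
keywordOrWord = ("" <$ lookAhead (try (string "from"))) <|> many1 (noneOf " ")

-- Pairs that result with whatever input is still unread afterwards.
withRest :: Parser (String, String)
withRest = (,) <$> keywordOrWord <*> many anyChar
```

On "from bar" the first component is empty and the second is still the full "from bar", so the keyword was seen but not consumed. On "frog x" the word "frog" is read normally. The try matters here: string can fail after matching a prefix such as "fro", and without try that partial match would count as consumed input, so <|> would not attempt the word alternative.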

My whole solution is below:

{-# LANGUAGE FlexibleInstances #-}
import Text.Parsec
import Text.Parsec.String (Parser)

-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
(<++>) :: (Applicative f, CharOrStr a, CharOrStr b) => f a -> f b -> f String
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)

parseKw :: Parser Keyword
parseKw =
    (Select <$ string "select") <|>
    (Update <$ string "update") <|>
    (Delete <$ string "delete") <|>
    (From <$ string "from") <|>
    (Where <$ string "where") <?>
    "keyword (select, update, delete, from, where)"

-- consume spaces, then eat a word or parenthesis
parseOther :: Parser String
parseOther = many space <++>
    (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
     option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))

-- a list of (keyword, raw text up to the next top-level keyword) pairs
parseSqlToplevel :: Parser [(Keyword, String)]
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof

-- one balanced parenthesized group, returned verbatim
parseParen :: Parser String
parseParen = char '(' <++> inner <++> char ')' where
    inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
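
One limitation worth keeping in mind: parseKw succeeds on any input that merely starts with a keyword, so an identifier such as fromage is taken for from. If that matters for your SQL, a possible variant is sketched below. It is not part of the original answer; wholeWord and parseKwStrict are hypothetical names, and treating letters, digits and underscore as identifier characters is an assumption.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)

-- Match a word only when it is not followed by an identifier character.
-- `try` rewinds the input if the boundary check fails.
wholeWord :: String -> Parser String
wholeWord w = try (string w <* notFollowedBy (alphaNum <|> char '_'))

parseKwStrict :: Parser Keyword
parseKwStrict = choice
    [ Select <$ wholeWord "select"
    , Update <$ wholeWord "update"
    , Delete <$ wholeWord "delete"
    , From   <$ wholeWord "from"
    , Where  <$ wholeWord "where"
    ] <?> "keyword (select, update, delete, from, where)"
```

With this variant, "from bar" still yields From, while "fromage" is rejected as a keyword and can be consumed as an ordinary word by a parser like parseOther.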

Edit: version with quote support

You can do the same thing as with the parens to support quotes:

{-# LANGUAGE FlexibleInstances #-}
import Control.Applicative hiding (many, (<|>))
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.String (Parser)

-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
(<++>) :: (Applicative f, CharOrStr a, CharOrStr b) => f a -> f b -> f String
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)

parseKw :: Parser Keyword
parseKw =
    (Select <$ string "select") <|>
    (Update <$ string "update") <|>
    (Delete <$ string "delete") <|>
    (From <$ string "from") <|>
    (Where <$ string "where") <?>
    "keyword (select, update, delete, from, where)"

-- consume spaces, then eat a word, a quoted string, or a parenthesized group
parseOther :: Parser String
parseOther = many space <++>
    (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
     option "" ((parseParen <|> parseQuote <|> many1 (noneOf "'() \t")) <++> parseOther))

parseSqlToplevel :: Parser [(Keyword, String)]
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof

-- a single-quoted string; a backslash escapes the character after it
parseQuote :: Parser String
parseQuote = char '\'' <++> inner <++> char '\'' where
    inner = many (noneOf "'\\") <++>
        option "" (char '\\' <++> anyChar <++> inner)

-- a balanced parenthesized group; parentheses inside quotes are not counted
parseParen :: Parser String
parseParen = char '(' <++> inner <++> char ')' where
    inner = many (noneOf "'()") <++>
        (parseQuote <++> inner <|> option "" (parseParen <++> inner))

I tried it with parseTest parseSqlToplevel "select ('a(sdf'())b". Cheers.
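
To see the escape handling on its own, the sketch below isolates parseQuote together with the <++> helper. The parser body is the answer's; the signatures and pragma are additions. Note that it assumes backslash escapes, as the answer does, whereas standard SQL escapes a quote by doubling it.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Parsec
import Text.Parsec.String (Parser)

class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id

infixl 4 <++>
(<++>) :: (Applicative f, CharOrStr a, CharOrStr b) => f a -> f b -> f String
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

-- A single-quoted string, returned verbatim including both quotes.
parseQuote :: Parser String
parseQuote = char '\'' <++> inner <++> char '\'' where
    -- plain text, then optionally a backslash, the escaped char, and more text
    inner = many (noneOf "'\\") <++>
        option "" (char '\\' <++> anyChar <++> inner)
```

Given the Haskell string "'it\\'s' rest" (a quoted literal containing an escaped quote), the parser returns the literal up to and including its closing quote and leaves " rest" unread. An unterminated literal such as "'abc" is rejected.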
