Best way to parse custom filter syntax

Views: 62, posted 2021/6/14 19:37:04, tags: c# parsing filter tokenize

Question

I have a program that allows the user to enter a filter in a text box in the column header of a DataGridView. This text is then parsed into a list of FilterOperations.

Currently I tokenize the string and then build the list in a huge for loop.

Which design patterns could I use to get rid of the huge for construct?

Are there any other steps I can take to improve the design?

In its current state it's hard to add support for another operator or data type, or to build something other than the filter list. Let's say I need to replace the filter list by building an Expression (which will be the case soon) or an SQL WHERE clause.

The filter follows this syntax and is valid for strings, numbers and DateTimes:

Range operator

lowerLimit .. upperLimit

29..52 would be parsed into two elements in the filter list: "x >= 29" and "x <= 52"

Less than

.. upperLimit

..52 would be parsed to "x < 52"

Greater than

lowerLimit ..

29.. would be parsed to "x > 29"
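
The three range forms above boil down to a small mapping from an optional lower limit and an optional upper limit to filter operations. A minimal sketch of that mapping, assuming a FilterOperator enum with the members the question's code uses and a hypothetical Condition record standing in for one FilterList<T> entry; the question's parseDateTime helper is not shown, so DateTime.Parse with the invariant culture stands in for it here:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Assumed to mirror the operators used in the question's code.
public enum FilterOperator { Equal, Less, Greater, LessOrEqual, GreaterOrEqual, Like }

// Hypothetical stand-in for one FilterList<T> entry (column name omitted).
public sealed record Condition(FilterOperator Operator, object Value);

public static class RangeFilter
{
    // One converter per supported column type replaces the repeated
    // if (dataType == typeof(...)) branches.
    private static readonly Dictionary<Type, Func<string, object>> Converters =
        new Dictionary<Type, Func<string, object>>
        {
            [typeof(string)] = s => s,
            [typeof(float)] = s => float.Parse(s, NumberStyles.Float, CultureInfo.InvariantCulture),
            [typeof(DateTime)] = s => DateTime.Parse(s, CultureInfo.InvariantCulture),
        };

    public static object ConvertValue(Type dataType, string? text)
    {
        if (!Converters.TryGetValue(dataType, out var convert))
            throw new NotSupportedException($"Data type is not supported '{dataType}'");
        if (string.IsNullOrEmpty(text))
            throw new FormatException("Argument must have a value");
        return convert(text); // throws FormatException if the text does not parse
    }

    // lower/upper are the Text tokens on either side of ".."; null means that side is empty.
    public static List<Condition> Build(Type dataType, string? lower, string? upper)
    {
        return (lower, upper) switch
        {
            (null, null) => throw new FormatException("Missing argument for range operator"),
            (null, _) => new List<Condition> { new Condition(FilterOperator.Less, ConvertValue(dataType, upper)) },
            (_, null) => new List<Condition> { new Condition(FilterOperator.Greater, ConvertValue(dataType, lower)) },
            _ => new List<Condition>
            {
                new Condition(FilterOperator.GreaterOrEqual, ConvertValue(dataType, lower)),
                new Condition(FilterOperator.LessOrEqual, ConvertValue(dataType, upper)),
            },
        };
    }
}
```

With the converters in a dictionary, supporting another data type means adding one entry rather than another branch in every range case.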

Wildcard

*someText* would be equivalent to x LIKE '%someText%' in SQL
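
A minimal sketch of turning such a filter into a LIKE pattern, assuming the SQL is emitted with ESCAPE '\' as in the question's commented-out code. The class and method names are hypothetical, and quoted literals are not handled here; that would happen at the tokenizing step:

```csharp
using System.Text;

public static class LikePattern
{
    // Converts a filter such as "*someText*" into a LIKE pattern such as "%someText%".
    // Assumes the pattern is used with ESCAPE '\', so the characters LIKE treats
    // specially (%, _ and the escape character itself) are prefixed with '\'.
    public static string FromWildcard(string filter)
    {
        var sb = new StringBuilder(filter.Length + 2);
        foreach (char c in filter)
        {
            switch (c)
            {
                case '*':
                    sb.Append('%'); // filter wildcard -> SQL wildcard
                    break;
                case '%':
                case '_':
                case '\\':
                    sb.Append('\\').Append(c); // keep LIKE metacharacters literal
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

The resulting pattern should still be passed to the database as a parameter rather than concatenated into the SQL text. Some databases (SQL Server, for example) also treat [ as special inside LIKE, which this sketch does not escape.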

String literal

' operators like .. or * are ignored in between the single quotes '

So I defined three tokens:

RangeOperator for ..

Wildcard for *

Text for plain values and for values in single quotes
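
The question's tokenize and cleanUp helpers are not shown. A minimal sketch of a tokenizer for these three token types, assuming Token exposes TokenType and Text as the question's code suggests (the record shape itself is an assumption) and leaving whitespace handling to a later step such as cleanUp:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public enum TokenType { RangeOperator, Wildcard, Text }

// Assumed shape: the question's code reads Token.TokenType and Token.Text.
public sealed record Token(TokenType TokenType, string Text);

public static class FilterTokenizer
{
    // Splits a filter into RangeOperator (".."), Wildcard ("*") and Text tokens.
    // Text inside single quotes is copied verbatim, so ".." and "*" lose their meaning there.
    public static List<Token> Tokenize(string filter)
    {
        var tokens = new List<Token>();
        var text = new StringBuilder();

        // Emits the pending text (if any) as a single Text token.
        void FlushText()
        {
            if (text.Length > 0)
            {
                tokens.Add(new Token(TokenType.Text, text.ToString()));
                text.Clear();
            }
        }

        for (int i = 0; i < filter.Length; i++)
        {
            char c = filter[i];
            if (c == '\'')
            {
                int end = filter.IndexOf('\'', i + 1);
                if (end < 0)
                    throw new FormatException("Unterminated string literal");
                text.Append(filter, i + 1, end - i - 1); // contents between the quotes
                i = end;
            }
            else if (c == '.' && i + 1 < filter.Length && filter[i + 1] == '.')
            {
                FlushText();
                tokens.Add(new Token(TokenType.RangeOperator, ".."));
                i++; // skip the second dot
            }
            else if (c == '*')
            {
                FlushText();
                tokens.Add(new Token(TokenType.Wildcard, "*"));
            }
            else
            {
                text.Append(c);
            }
        }

        FlushText();
        return tokens;
    }
}
```

A single dot stays part of the text, so a value such as 3.5..7 still yields Text "3.5", RangeOperator, Text "7".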

public static FilterList<T> Parse<T>(string filter, string columnname, Type dataType) where T : class
        {
            if (dataType != typeof(float) && dataType != typeof(DateTime) && dataType != typeof(string))
                throw new NotSupportedException(String.Format("Data Type is not supported '{0}'", dataType));

            Token[] filterParts = tokenize(filter);
            filterParts = cleanUp(filterParts);

            StringBuilder sb = new StringBuilder();

            for (int i = 0; i < filterParts.Length; i++)
            {
                Token currentToken = filterParts[i];
                // Check and build the range filter
                if (currentToken.TokenType == TokenType.RangeOperator)
                {
                    if (filterParts.Length < 2)
                    {
                        throw new FilterException("Missing argument for RangeOperator");
                    }
                    if (filterParts.Length > 3)
                    {
                        throw new FilterException("RangeOperator can't be mixed with other operators");
                    }

                    if (i == 0)
                    {
                        if (filterParts.Length == 2)
                        {
                            // "Up to" operator: ..upperLimit
                            Token right = filterParts[1];
                            if (right.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(right.Text))
                                throw new FilterException("Text must have value");
                            if (right.Text.StartsWith("."))
                                throw new FilterException("Text starting with a dot is not valid");

                            if (dataType == typeof(string))
                                return new FilterList<T> { { columnname, FilterOperator.Less, right.Text } };
                            //filterString = String.Format("({0} < '{1}' OR {0} IS NULL)", columnname, right.Text);
                            if (dataType == typeof(float))
                            {
                                float rightF;
                                if (!float.TryParse(right.Text, out rightF))
                                    throw new FilterException(
                                        String.Format("right parameter has wrong format '{0}'", right.Text));
                                return new FilterList<T> { { columnname, FilterOperator.Less, rightF } };
                                //filterString = String.Format("({0} < {1} OR {0} IS NULL)", columnname, rightF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime rightDt = parseDateTime(right.Text);
                                return new FilterList<T> { { columnname, FilterOperator.Less, rightDt } };
                                //filterString = String.Format("({0} < '{1}' OR {0} IS NULL)", columnname, rightDt.ToString(CultureInfo.InvariantCulture));
                            }

                            break;
                        }
                        throw new FilterException("too many arguments");
                    }
                    if (i == 1)
                    {
                        if (filterParts.Length == 2)
                        {
                            // "From" operator: lowerLimit..
                            Token left = filterParts[0];
                            if (left.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(left.Text))
                                throw new FilterException("Argument must have value");

                            if (dataType == typeof(string))
                                return new FilterList<T> { { columnname, FilterOperator.Greater, left.Text } };
                            //filterString = String.Format("({0} > '{1}')", columnname, left.Text);
                            if (dataType == typeof(float))
                            {
                                float leftF;
                                if (!float.TryParse(left.Text, out leftF))
                                    throw new FilterException(String.Format(
                                        "left parameter has wrong format '{0}'", left.Text));
                                return new FilterList<T> { { columnname, FilterOperator.Greater, leftF } };
                                //filterString = String.Format("({0} > {1})", columnname, leftF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime leftDt = parseDateTime(left.Text);
                                return new FilterList<T> { { columnname, FilterOperator.Greater, leftDt } };
                                //filterString = String.Format("({0} > '{1}')", columnname, leftDt.ToString(CultureInfo.InvariantCulture));
                            }
                            break;
                        }
                        else
                        {
                            // Range operator: lowerLimit..upperLimit
                            Token left = filterParts[0];
                            if (left.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(left.Text))
                                throw new FilterException("parameter must have value");

                            Token right = filterParts[2];
                            if (right.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(right.Text))
                                throw new FilterException("parameter must have value");

                            if (dataType == typeof(string))
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, left.Text},
                                    {columnname, FilterOperator.LessOrEqual, right.Text}
                                };
                            //filterString = String.Format("{0} >= '{1}' AND {0} <= '{2}'", columnname, left.Text, right.Text);
                            if (dataType == typeof(float))
                            {
                                float rightF;
                                if (!float.TryParse(right.Text, out rightF))
                                    throw new FilterException(
                                        String.Format("right parameter has wrong format '{0}'", right.Text));
                                float leftF;
                                if (!float.TryParse(left.Text, out leftF))
                                    throw new FilterException(String.Format(
                                        "left parameter has wrong format '{0}'", left.Text));
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, leftF},
                                    {columnname, FilterOperator.LessOrEqual, rightF}
                                };
                                //filterString = String.Format("{0} >= {1} AND {0} <= {2}", columnname, leftF.ToString(CultureInfo.InvariantCulture), rightF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime rightDt = parseDateTime(right.Text);
                                DateTime leftDt = parseDateTime(left.Text); 
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, leftDt},
                                    {columnname, FilterOperator.LessOrEqual, rightDt}
                                };
                                //filterString = String.Format("{0} >= '{1}' AND {0} <= '{2}'", columnname, leftDt.ToString(CultureInfo.InvariantCulture), rightDt.ToString(CultureInfo.InvariantCulture));
                            }

                            break;
                        }
                    }
                    throw new FilterException("unexpected parameter");
                }
                // Build the string search (wildcards and text)
                if (currentToken.TokenType == TokenType.Wildcard)
                {
                    // Error if the data type is not string
                    if (dataType != typeof(string))
                        throw new FilterException("Operator not allowed with this Data Type");
                    sb.Append("%");
                }
                else if (currentToken.TokenType == TokenType.Text)
                    sb.Append(escape(currentToken.Text));
            }

            // No range operator: LIKE for strings, equality for DateTime and float
            string text = sb.ToString();
            if (dataType == typeof(string))
                return new FilterList<T> { { columnname, FilterOperator.Like, text } };
            //filterString = String.Format("{0} LIKE '{1}' ESCAPE '\\'", columnname, text);
            if (dataType == typeof(DateTime))
            {
                DateTime dt = parseDateTime(text);
                return new FilterList<T> { { columnname, FilterOperator.Equal, dt } };
                //filterString = String.Format("{0} = '{1}'", columnname, dt.ToString(CultureInfo.InvariantCulture));
            }
            if (dataType == typeof(float))
            {
                float f;
                if (!float.TryParse(text, out f))
                    throw new FilterException(String.Format("parameter has wrong format '{0}'", text));
                return new FilterList<T> { { columnname, FilterOperator.Equal, f } };
                //filterString = String.Format("{0} = {1}", columnname, f.ToString(CultureInfo.InvariantCulture));
            }

            return null;
        }
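
Much of the index bookkeeping in the loop above (i == 0, i == 1, filterParts.Length == 2 or 3) is really recognising a handful of token shapes. One way to flatten it, sketched here under assumptions: classify the token sequence into a small intermediate model first, and let each back end (FilterList, Expression, SQL) interpret that model afterwards. FilterNode, RangeFilterNode, LikeFilterNode, EqualsFilterNode and FilterClassifier are hypothetical names, and Token mirrors the shape the question's code implies:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum TokenType { RangeOperator, Wildcard, Text }
public sealed record Token(TokenType TokenType, string Text);

// Hypothetical intermediate model: one node type per syntactic form of the filter.
public abstract record FilterNode;
public sealed record RangeFilterNode(string? Lower, string? Upper) : FilterNode; // null = open end
public sealed record LikeFilterNode(IReadOnlyList<Token> Parts) : FilterNode;    // contains '*'
public sealed record EqualsFilterNode(string Value) : FilterNode;                // plain value

public static class FilterClassifier
{
    // Recognises the token shape up front instead of branching on loop indices.
    public static FilterNode Classify(IReadOnlyList<Token> t)
    {
        bool IsText(int i) => t[i].TokenType == TokenType.Text;
        bool IsRange(int i) => t[i].TokenType == TokenType.RangeOperator;

        switch (t.Count)
        {
            case 2 when IsRange(0) && IsText(1): // ..upper
                return new RangeFilterNode(null, t[1].Text);
            case 2 when IsText(0) && IsRange(1): // lower..
                return new RangeFilterNode(t[0].Text, null);
            case 3 when IsText(0) && IsRange(1) && IsText(2): // lower..upper
                return new RangeFilterNode(t[0].Text, t[2].Text);
        }

        if (t.Any(x => x.TokenType == TokenType.RangeOperator))
            throw new FormatException("RangeOperator can't be mixed with other operators");
        if (t.Any(x => x.TokenType == TokenType.Wildcard))
            return new LikeFilterNode(t);
        return new EqualsFilterNode(string.Concat(t.Select(x => x.Text)));
    }
}
```

Each back end then only has to handle three node types, and a new operator means one new node type and one new case instead of another branch inside the loop.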

Recommended answer

You need to find a code generator for C# that is based on Parsing Expression Grammars (PEG). It lets you define a grammar, which the generator then turns into code. That code will then be able to parse any text that follows the grammar you expect.

A very quick bit of google-fu suggests that peg-sharp could work.

To learn PEG, you can try the online version of PEG.js, which follows almost the same workflow you would ultimately be using:

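Enter the PEG declaration (left window)
The JavaScript parser is updated on the fly (top-right window)
The parser parses your input and shows the result (bottom-right window)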

As a proof of concept, here is a tentative implementation of your grammar that you can copy and paste into PEG.js (I guess one could manage to embed it in a Stack Overflow snippet).

The grammar is as follows:

start
  = filters

filters
  = left:filter " " right:filters { return {filter: left, operation: "AND", filters: right};}
  / filter

filter
  = applicableRange:range {return {type: "range", range: applicableRange};}
 / openWord:wildcard  {return {type: "wildcard", word: openWord};}
 / simpleWord:word {return simpleWord;}
 / sentence:sentence {return sentence;}

sentence
 = "'" letters:[0-9a-zA-Z *.]* "'" {return {type: "sentence", value: letters.join("")};}

word "aword"
  = letters:[0-9a-zA-Z]+ { return {type: "word", value: letters.join("")}; }

wildcard
  = 
 "*" word:word "*" {return {type: "wildcardBoth", value: word};}
/ "*" word:word {return {type: "wildcardStart", value: word};}
/ word:word "*" {return {type: "wildcardEnd", value: word};}

range "range"
  = left:word? ".." right:word? {return {from: left, to: right};}

Basically, the grammar lets you define the building blocks of your language and how they relate to one another. For example, a filter can be a range, a wildcard, a word or a sentence, and filters is either a filter followed by a space and more filters, or a single filter on its own (that last alternative is what ends the recursion in filters).

Along with those blocks, you define what the output should be when each of them is matched. In this case I output a JSON object that describes what kind of filtering should occur and which parameters the filter will have.

If you test the grammar with the following input:

'testing range' 123..456 123.. ..abc 'and testing wildcards' word1* *word2 *word3* cool heh

you will get back a structure that describes the filters that should be built according to the grammar:

{
   "filter": {
      "type": "sentence",
      "value": "testing range"
   },
   "operation": "AND",
   "filters": {
      "filter": {
         "type": "range",
         "range": {
            "from": {
               "type": "word",
               "value": "123"
            },
            "to": {
               "type": "word",
               "value": "456"
            }
         }
      },
      "operation": "AND",
      "filters": {
         "filter": {
            "type": "range",
            "range": {
               "from": {
                  "type": "word",
                  "value": "123"
               },
               "to": null
            }
         },
         "operation": "AND",
         "filters": {
            "filter": {
               "type": "range",
               "range": {
                  "from": null,
                  "to": {
                     "type": "word",
                     "value": "abc"
                  }
               }
            },
            "operation": "AND",
            "filters": {
               "filter": {
                  "type": "sentence",
                  "value": "and testing wildcards"
               },
               "operation": "AND",
               "filters": {
                  "filter": {
                     "type": "wildcard",
                     "word": {
                        "type": "wildcardEnd",
                        "value": {
                           "type": "word",
                           "value": "word1"
                        }
                     }
                  },
                  "operation": "AND",
                  "filters": {
                     "filter": {
                        "type": "wildcard",
                        "word": {
                           "type": "wildcardStart",
                           "value": {
                              "type": "word",
                              "value": "word2"
                           }
                        }
                     },
                     "operation": "AND",
                     "filters": {
                        "filter": {
                           "type": "wildcard",
                           "word": {
                              "type": "wildcardBoth",
                              "value": {
                                 "type": "word",
                                 "value": "word3"
                              }
                           }
                        },
                        "operation": "AND",
                        "filters": {
                           "filter": {
                              "type": "word",
                              "value": "cool"
                           },
                           "operation": "AND",
                           "filters": {
                              "type": "word",
                              "value": "heh"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

The principle is the same for a C# generator: compile the grammar into C# code that can parse your input, and define what should happen when the parser hits a given block.
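
To make that less abstract, here is a hand-written recursive-descent parser in C# that follows the same grammar, one method per rule, with PEG-style ordered choice (try each alternative in order and rewind the input when it fails). It is a sketch of the technique, not peg-sharp output or its API; the node record names are hypothetical, and the right-recursive filters rule is flattened into a loop that returns a list whose elements are implicitly combined with AND:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node types mirroring the JSON objects built by the PEG.js actions.
public abstract record Node;
public sealed record WordNode(string Value) : Node;
public sealed record SentenceNode(string Value) : Node;
public sealed record RangeNode(WordNode? From, WordNode? To) : Node;
public sealed record WildcardNode(string Kind, WordNode Word) : Node;

public sealed class FilterParser
{
    private readonly string _s;
    private int _pos;

    private FilterParser(string input) => _s = input;

    // start = filters; filters = filter " " filters / filter  (flattened into a loop)
    public static List<Node> Parse(string input)
    {
        var p = new FilterParser(input);
        var filters = new List<Node>();
        do
        {
            filters.Add(p.ParseFilter() ?? throw p.Error());
        } while (p.Match(" "));
        if (p._pos != input.Length) throw p.Error();
        return filters;
    }

    // filter = range / wildcard / word / sentence  (first alternative that matches wins)
    private Node? ParseFilter() =>
        Try(ParseRange) ?? Try(ParseWildcard) ?? Try(ParseWord) ?? Try(ParseSentence);

    // Runs one alternative and rewinds the input position if it fails.
    private Node? Try(Func<Node?> rule)
    {
        int start = _pos;
        Node? result = rule();
        if (result is null) _pos = start;
        return result;
    }

    // range = word? ".." word?
    private Node? ParseRange()
    {
        WordNode? from = ParseWord();
        if (!Match("..")) return null;
        return new RangeNode(from, ParseWord());
    }

    // wildcard = "*" word "*" / "*" word / word "*"
    private Node? ParseWildcard()
    {
        if (Match("*"))
        {
            WordNode? inner = ParseWord();
            if (inner is null) return null;
            return new WildcardNode(Match("*") ? "wildcardBoth" : "wildcardStart", inner);
        }
        WordNode? word = ParseWord();
        return word is not null && Match("*") ? new WildcardNode("wildcardEnd", word) : null;
    }

    // word = [0-9a-zA-Z]+
    private WordNode? ParseWord()
    {
        int start = _pos;
        while (_pos < _s.Length && IsAlphanumeric(_s[_pos])) _pos++;
        return _pos > start ? new WordNode(_s.Substring(start, _pos - start)) : null;
    }

    // sentence = "'" [0-9a-zA-Z *.]* "'"
    private Node? ParseSentence()
    {
        if (!Match("'")) return null;
        int start = _pos;
        while (_pos < _s.Length && (IsAlphanumeric(_s[_pos]) || _s[_pos] is ' ' or '*' or '.')) _pos++;
        int end = _pos;
        return Match("'") ? new SentenceNode(_s.Substring(start, end - start)) : null;
    }

    private static bool IsAlphanumeric(char c) =>
        c is (>= '0' and <= '9') or (>= 'a' and <= 'z') or (>= 'A' and <= 'Z');

    // Consumes the literal if the input continues with it.
    private bool Match(string literal)
    {
        if (_s.Length - _pos < literal.Length || _s.Substring(_pos, literal.Length) != literal)
            return false;
        _pos += literal.Length;
        return true;
    }

    private FormatException Error() => new FormatException($"Unexpected input at position {_pos}");
}
```

A generator automates exactly this kind of code from the grammar file, which is why changing the grammar means regenerating rather than editing the parser by hand.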

You will need to recompile the grammar whenever it changes (though that can easily be made part of your build step), but in return you get a structure representing the parsed filters, which you can use to filter your search results.
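
Turning such a structure into an Expression or an SQL WHERE clause (the two targets mentioned in the question) is then a separate translation step. A minimal sketch, assuming the parsed tree has first been flattened into AND-ed column/operator/value conditions; Condition and FilterTranslators are hypothetical names, LIKE is left out for brevity, and string comparisons in the expression tree go through string.CompareOrdinal because string has no < or > operator:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

public enum FilterOperator { Equal, Less, Greater, LessOrEqual, GreaterOrEqual }

// Hypothetical back-end-neutral model; the conditions are combined with AND.
public sealed record Condition(string Column, FilterOperator Operator, object Value);

public static class FilterTranslators
{
    // Back end 1: a parameterized WHERE clause; values go into 'parameters', never into the SQL text.
    public static string ToSqlWhere(IReadOnlyList<Condition> conditions, List<object> parameters)
    {
        var parts = new List<string>();
        foreach (var c in conditions)
        {
            parameters.Add(c.Value);
            parts.Add($"{c.Column} {SqlOperator(c.Operator)} @p{parameters.Count - 1}");
        }
        return string.Join(" AND ", parts);
    }

    private static string SqlOperator(FilterOperator op) => op switch
    {
        FilterOperator.Equal => "=",
        FilterOperator.Less => "<",
        FilterOperator.Greater => ">",
        FilterOperator.LessOrEqual => "<=",
        FilterOperator.GreaterOrEqual => ">=",
        _ => throw new NotSupportedException(op.ToString()),
    };

    // Back end 2: an expression tree, e.g. for IQueryable<T>.Where or Compile().
    public static Expression<Func<T, bool>> ToPredicate<T>(IReadOnlyList<Condition> conditions)
    {
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        Expression body = Expression.Constant(true);
        foreach (var c in conditions)
        {
            Expression left = Expression.PropertyOrField(x, c.Column);
            Expression right = Expression.Constant(c.Value, left.Type);
            if (left.Type == typeof(string) && c.Operator != FilterOperator.Equal)
            {
                // string has no < or > operator: compare string.CompareOrdinal(left, right) with 0.
                left = Expression.Call(CompareOrdinal, left, right);
                right = Expression.Constant(0);
            }
            body = Expression.AndAlso(body, Compare(c.Operator, left, right));
        }
        return Expression.Lambda<Func<T, bool>>(body, x);
    }

    private static readonly MethodInfo CompareOrdinal =
        typeof(string).GetMethod(nameof(string.CompareOrdinal), new[] { typeof(string), typeof(string) })!;

    private static Expression Compare(FilterOperator op, Expression l, Expression r) => op switch
    {
        FilterOperator.Equal => Expression.Equal(l, r),
        FilterOperator.Less => Expression.LessThan(l, r),
        FilterOperator.Greater => Expression.GreaterThan(l, r),
        FilterOperator.LessOrEqual => Expression.LessThanOrEqual(l, r),
        FilterOperator.GreaterOrEqual => Expression.GreaterThanOrEqual(l, r),
        _ => throw new NotSupportedException(op.ToString()),
    };
}
```

Column names are interpolated into the SQL text, so they should come from the grid's own column list rather than from user input; only the values travel as parameters.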

One huge advantage of PEG is that the format is well known and there are plenty of resources for learning about it online, so the knowledge will transfer to other languages and uses.
