从数百个存储过程中解析出完整的动态 SQL 表达式 [英] Parsing out complete dynamic SQL expressions from hundreds of stored procedures

查看:52
本文介绍了从数百个存储过程中解析出完整的动态 SQL 表达式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我继承了一个包含数百个存储过程的大型应用程序,其中许多使用动态 SQL.为了更好地处理我正在处理的 SQL 类型,如果我有办法解析所有这些存储过程的查询文本并提取完整的表达式,那将非常有用包含在其中的任何动态 SQL.

I've inherited a large application that has many hundreds of stored procedures, many of which use dynamic SQL. In order to get a better handle of the types of SQL I am dealing with, it would be extremely useful if I had a way to parse the query text of all these stored procedures and extract the complete expression for any dynamic SQL contained within.

一个简化的表达式可能是:

A simplified expression might be:

declare @query nvarchar(max)
set @query = 'SELECT col1,col2,col3 from ' + @DatabaseName + '.dbo.' + @TableName + ' WHERE {some criteria expression that also contains inline quotes}'

我正在寻找的上述输出(最终将在解析所有存储过程的单个查询中调用)是:

The output I am seeking for the above (that would ultimately be called in a single query that parses all stored procedures) is:

SELECT col1, col2, col3 
FROM ' + @DatabaseName + '.dbo.' + @TableName + ' 
WHERE {some criteria expression that also contains inline quotes}

因此,不是传入参数值之后的表达式,而是存储过程文本中的表达式文本,包括参数名称.

So, not the expression after the parameter values have been passed in, but the expression text as in the stored procedure text, including the parameter names.

我同意动态 SQL 参数名称为 @query 的不完全安全的假设,因此在 SQL 表达式中搜索它以用作提取的起始位置文本是可以接受的,但由于有单引号内联,我没有简单的方法知道变量的赋值在哪里完成.

I'm ok with the not-at-all-safe assumption that the dynamic SQL parameter name is @query, so searching for this within the SQL expression to use as a starting position to extract text would be tolerable, but since there are single quotes inline, I have no easy way of knowing where the assignment to the variable is complete.

我在这个问题中包含了 [antlr] 和 [parsing] 标签,因为我觉得这超出了 T-SQL 的能力.

I'm including the [antlr] and [parsing] tags in this question because I have a feeling this is beyond what's capable in T-SQL.

PS:是的,我很清楚我不应该这样做".

PS: Yes, I'm well aware "I shouldn't be doing this".

编辑

根据下面的建议,尝试了以下查询,但在此上下文中并没有真正有用:

From a suggestion below, tried the following query but not really useful in this context:

SELECT 
db_name(dbid) DB_NAME
,cacheobjtype, objtype, object_name(objectid) ObjectName
,objectid 
,x.text
,usecounts 
--  , x.*,z.* ,db_name(dbid)
FROM 
sys.dm_exec_cached_plans z
CROSS APPLY sys.dm_exec_sql_text(plan_handle)  x
WHERE 
    --usecounts > 1 
    --objType = 'Proc' and  -- include if you only want to see stored procedures 
    db_name(dbid) not like 'ReportServer%' and db_name(dbid) <> 'msdb' and db_name(dbid) not like 'DBADB%' and db_name(dbid) <> 'master'
--ORDER BY usecounts DESC
ORDER BY objtype

推荐答案

粗略地讲,以下是您在 C# 中使用 ScriptDom 的方法.

To a first approximation, here's how you'd do it in C# using ScriptDom.

获取所有存储过程定义的列表很容易.这可以在 T-SQL 中完成,甚至:

Getting a list of all stored procedure definitions is easy. That can be done in T-SQL, even:

sp_msforeachdb 'select definition from [?].sys.sql_modules'

或者用通常的方式编写脚本数据库,或者使用 SMO.无论如何,我假设您可以以某种方式将这些放入 List 中,以供代码使用.

Or script databases the usual way, or use SMO. In any case, I'm assuming you can get these into a List<string> somehow, for consumption by code.

Microsoft.SqlServer.TransactSql.ScriptDom 可作为 NuGet 包使用,因此请将其添加到全新的应用程序中.我们问题的核心是编写一个访问者,该访问者将从 T-SQL 脚本中提取我们感兴趣的节点:

Microsoft.SqlServer.TransactSql.ScriptDom is available as a NuGet package, so add that to a brand new application. The core of our problem is writing a visitor that will pluck the nodes we're interested in from a T-SQL script:

class DynamicQueryFinder : TSqlFragmentVisitor {
  public List<ScalarExpression> QueryAssignments { get; } = new List<ScalarExpression>();
  public string ProcedureName { get; private set; }

  // Grab "CREATE PROCEDURE ..." nodes
  public override void Visit(CreateProcedureStatement node) {
    ProcedureName = node.ProcedureReference.Name.BaseIdentifier.Value;
  }

  // Grab "SELECT @Query = ..." nodes
  public override void Visit(SelectSetVariable node) {
    if ("@Query".Equals(node.Variable.Name, StringComparison.OrdinalIgnoreCase)) {
      QueryAssignments.Add(node.Expression);
    }
  }

  // Grab "SET @Query = ..." nodes
  public override void Visit(SetVariableStatement node) {
    if ("@Query".Equals(node.Variable.Name, StringComparison.OrdinalIgnoreCase)) {
      QueryAssignments.Add(node.Expression);
    }
  }

  // Grab "DECLARE @Query = ..." nodes
  public override void Visit(DeclareVariableElement node) {
    if (
      "@Query".Equals(node.VariableName.Value, StringComparison.OrdinalIgnoreCase) && 
      node.Value != null
    ) {
      QueryAssignments.Add(node.Value);
    }
  }
}

假设 procedures 是一个具有存储过程定义的 List,然后我们像这样应用访问者:

Let's say procedures is a List<string> that has the stored procedure definitions, then we apply the visitor like so:

foreach (string procedure in procedures) {
  TSqlFragment fragment;
  using (var reader = new StringReader(procedure)) {
    IList<ParseError> parseErrors;
    var parser = new TSql130Parser(true);  // or a lower version, I suppose
    fragment = parser.Parse(reader, out parseErrors);
    if (parseErrors.Any()) {
      // handle errors
      continue;
    }
  }
  var dynamicQueryFinder = new DynamicQueryFinder();
  fragment.Accept(dynamicQueryFinder);
  if (dynamicQueryFinder.QueryAssignments.Any()) {
    Console.WriteLine($"===== {dynamicQueryFinder.ProcedureName} =====");
    foreach (ScalarExpression assignment in dynamicQueryFinder.QueryAssignments) {
      Console.WriteLine(assignment.Script());
    }
  }
}

.Script() 是我拼凑起来的一个方便的方法,因此我们可以将片段转回纯文本:

.Script() is a little convenience method I cobbled up so we can turn fragments back into plain text:

public static class TSqlFragmentExtensions {
  public static string Script(this TSqlFragment fragment) {
    return String.Join("", fragment.ScriptTokenStream
      .Skip(fragment.FirstTokenIndex)
      .Take(fragment.LastTokenIndex - fragment.FirstTokenIndex + 1)
      .Select(t => t.Text)
    );
  }
}

这将打印分配给名为 @Query 的变量的所有存储过程中的所有表达式.

This will print all expressions in all stored procedures that are assigned to a variable named @Query.

这种方法的好处在于,您可以轻松地解析语句,因此处理更复杂,例如将字符串表达式转回其未转义形式或寻找 EXEC(...)sp_executesql(不管涉及到什么变量名),也是可能的.

The nice thing about this approach is that you will have the statements parsed at your fingertips, so more complicated processing, like turning the string expressions back into their unescaped forms or hunting for all instances of EXEC(...) and sp_executesql (regardless of variable names involved), is also possible.

当然,缺点是这不是纯 T-SQL.您可以使用任何您喜欢的 .NET 语言(我使用过 C#,因为我最熟悉它),但它仍然涉及编写外部代码.更原始的解决方案,例如CHARINDEX以您的方式处理字符串可能会起作用,如果您知道所有代码都遵循一个特定模式,该模式足够简单以供 T-SQL 字符串操作分析.

The drawback, of course, is that this isn't pure T-SQL. You can use any .NET language you like for it (I've used C# since I'm most comfortable with that), but it still involves writing external code. More primitive solutions like just CHARINDEXing your way over strings may work, if you know that all code follows a particular pattern that is simple enough for T-SQL string operations to analyze.

这篇关于从数百个存储过程中解析出完整的动态 SQL 表达式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆