使用 python 在多个字符串中查找和计算子字符串的实例 [英] Find and count instance of substring within multiple strings using python

查看:30
本文介绍了使用 python 在多个字符串中查找和计算子字符串的实例的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 python 中有一个带有多个子查询的 sql 查询.所以设置是一个更大的字符串中的多个子字符串.我想检查子字符串中字符串的实例数.比我看到的更多参与并感谢您的帮助.

I have a sql query within python with multiple sub-queries. So the set up is multiple substrings within a larger string. I would like to check for the number of instances of a string within the substrings. A little more involved than what I have seen posted and appreciate the help.

这样设置-

qry = ''' 
with 
qry_1 as ( 
   SELECT ID, 
          NAME
   FROM   ( ... other code...
),
qry_2 as ( 
    SELECT coalesce (table1.ID, table2.ID) as ID,
           NAME
   FROM (...other code...
),
qry_3 as (
     SELECT id.WEATHER AS WEATHER_MORN,
            ROW_NUMBER() OVER(PARTITION BY id.SUN
                ORDER BY id.TIME) AS SUN_TIME,
            id.RAIN,
            id.MIST
   FROM (...other code..
)
'''

我想计算 qry_1、qry_2、qry_3ID 的实例.

And I'd like to count the instances of ID for within qry_1, qry_2, qry_3.

我认为可以利用 re.findall 然后进行子字符串搜索?

Something I think would leverage re.findall and then a substring search?

re.findall(r'as \( select (.+?) from \(',qry)

然后在其中查找和计算 ID 的实例?输出为 2.但我不确定如何...

And then finding and counting instances of ID within this? Where the output is 2. But I'm not sure how...

推荐答案

您可以拆分 CTE 查询,然后在子查询的截断版本上使用 re.findall:

You can split the CTE queries, and then use re.findall on a truncated version of the subquery:

qry = ''' 
with 
qry_1 as ( 
  SELECT ID, 
      NAME
  FROM   ( ... other code...
),
qry_2 as ( 
  SELECT coalesce (table1.ID, table2.ID) as ID,
       NAME
FROM (...other code...
),
qry_3 as (
  SELECT WEATHER
FROM (...other code..
)
'''

def get_cols(s):
   [cte_name] = re.findall('^\w+(?=\sas)|(?<=with\s)\w+(?=\sas)', s)
   cols = re.findall('(?<=as\s)[\w\.]+|(?<=SELECT\s)[\w\.]+|(?<=,\s)[\w\.]+', s)
   return [cte_name, cols]

#dictionary with the cte name as the key, and the columns as the values
v = dict(get_cols(re.sub('coalesce\s\(.+\)|[\s\n]+', ' ', i)) for i in re.split('(?<=\)),(?:\s+)*\n', qry))
#filter the dictionary above to only include desired column names
r = {a:k if (k:=[i for i in b if i in {'NAME', 'ID'}]) else None for a, b in v.items()}

输出:

{'qry_1': ['ID', 'NAME'], 'qry_2': ['ID', 'NAME'], 'qry_3': None}

这篇关于使用 python 在多个字符串中查找和计算子字符串的实例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆