我将参数作为python Pandas中的dict传递参数后,read_sql查询返回空数据帧 [英] read_sql query returns an empty dataframe after I pass parameters as a dict in python pandas

查看:378
本文介绍了我将参数作为python Pandas中的dict传递参数后,read_sql查询返回空数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用以下字典来参数化SQL查询的某些部分:

I am trying to parameterize some parts of a SQL Query using the below dictionary:

query_params = dict(
        {'target':'status',
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and %(target)s in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *,
                                 from table_name
                                     where dt  = %(date_from)s
                                     and %(target)s in ('ACT')
                                     order by random() limit 50000);""")

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

但是,这将返回一个完全没有记录的数据框.我不确定是什么错误,因为没有引发任何错误.

However this returns a dataframe with no records at all. I am not sure what the error is since no error is being thrown.

df_data_sample.shape
Out[7]: (0, 1211)

最终的PostgreSql查询将是:

The final PostgreSql query would be:

select *
        from table_name
            where dt = '201805'
            and status in ('NPA')

----------------------------------------------------
union all
----------------------------------------------------
(select *
        from table_name
            where dt  = '201712'
            and status in ('ACT')
            order by random() limit 50000);-- This part of random() is only for running it on my local and not on server.

下面是用于复制的一小部分数据示例.原始数据有超过一百万条记录和1211列

Below is a small sample of data for replication. The original data has more than a million records and 1211 columns

service_change_3m   service_change_6m   dt  grp_m2          status
0                   -2                  201805  $50-$75     NPA
0                    0                  201805  < $25       NPA
0                   -1                  201805  $175-$200   ACT
0                    0                  201712  $150-$175   ACT
0                    0                  201712  $125-$150   ACT
-1                   1                  201805  $50-$75     NPA

有人可以帮我吗?

更新: 根据@shmee的建议.我终于使用了:

UPDATE: Based on suggestion by @shmee.. I am finally using :

target = 'status'
query_params = dict(
        {
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and {0} in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *,
                                 from table_name
                                     where dt  = %(date_from)s
                                     and {0} in ('ACT')
                                     order by random() limit 50000);""").format(target)

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

推荐答案

是的,我非常有信心,您的问题是由于试图通过注释中提到的参数绑定(and %(target)s in ('ACT'))设置查询中的列名而导致的.

Yes, I am quite confident that your issue results from trying to set column names in your query via parameter binding (and %(target)s in ('ACT')) as mentioned in the comments.

这会导致您的查询将结果集限制为'status' in ('ACT')的记录(即,字符串"status"是仅包含字符串"ACT"的列表的元素吗?).当然,这是错误的,因此不会选择任何记录,并且结果为空.

This results in your query restricting the result set to records where 'status' in ('ACT') (i.e. Is the string 'status' an element of a list containing only the string 'ACT'?). This is, of course, false, hence no record gets selected and you get an empty result.

这应该可以正常工作:

import psycopg2.sql

col_name = 'status'
table_name = 'public.churn_data'
query_params = {'date_from':'201712',
                'date_to':'201805',
                'drform_target':'NPA'
               }

sql_data_sample = """select * 
                     from {0} 
                     where dt = %(date_to)s 
                     and {1} in (%(drform_target)s)
                     ----------------------------------------------------
                     union all
                     ----------------------------------------------------
                     (select * 
                      from {0} 
                      where dt  = %(date_from)s 
                      and {1} in ('ACT') 
                      order by random() limit 50000);"""

sql_data_sample = sql.SQL(sql_data_sample).format(sql.Identifier(table_name), 
                                                  sql.Identifier(col_name))

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

这篇关于我将参数作为python Pandas中的dict传递参数后,read_sql查询返回空数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆