Python SQL loop variables through multiple queries


Question

I'm having trouble with a Python Teradata (tdodbc) query: I want to loop through the same query with different variables and merge the results. I received good direction in another post and ended up here. My issue now is that the dataframe only ends up with the query results for the final variable in the loop, "state5". Unfortunately, we have 5 states, each in its own database with the same schema. I can run the same query for one state, but I want to loop over the variables so I can run it for all 5 states and return the appended results. This was easy with SAS macro variables and appending, but I need to bring the data into Python for EDA and data science.

import pandas as pd
import teradata as td
udaExec = td.UdaExec(appConfigFile="udaexec.ini")
with udaExec.connect("${dataSourceName}") as session:


    state_dataframes = []
    STATES = ["state1", "state2", "state3", "state4", "state5"]

    for state in STATES:

    query1 = """database my_db_{};"""

    query2 = """      
        select top 10
        '{}' as state
        ,a.*
        from table_a
        """

    session.execute(query1.format(state))
    session.execute(query2.format(state))

    state_dataframes.append(pd.read_sql(query2, session))
    all_states_df = pd.concat(state_dataframes)

Recommended answer

I finally got this to work, although it may not be the most elegant way to do it. I tried to run the drop-table statements as a single variable, "query5", but received a DDL error. Once I separated each drop table into its own session.execute, it worked.

import pandas as pd
import teradata as td

udaExec = td.UdaExec(appConfigFile="udaexec.ini")

with udaExec.connect("${dataSourceName}") as session:

    state_dataframes = []
    STATES = ["state1", "state2", "state3", "state4", "state5"]

    for state in STATES:

        # Switch the default database to this state's database.
        query1 = """database my_db_{};"""

        # Stage the first 10 rows, tagged with the state name, in a volatile table.
        query2 = """
        create set volatile table v_table
        ,no fallback, no before journal, no after journal as
        (
        select top 10
        '{}' as state
        ,t.*
        from table t
        )
        with data
        primary index (dw_key)
        on commit preserve rows;
        """

        # Build a second volatile table from the first one.
        query3 = """
        create set volatile table v_table_2
        ,no fallback, no before journal, no after journal as
        (
        select t.*
        from v_table t
        )
        with data
        primary index (dw_key)
        on commit preserve rows;
        """

        # Final select that is read into a dataframe.
        query4 = """
        select t.*
        from v_table_2 t
        """

        session.execute(query1.format(state))
        session.execute(query2.format(state))
        session.execute(query3)
        state_dataframes.append(pd.read_sql(query4, session))

        # Drop each volatile table in its own execute() call;
        # issuing both drops as a single query raised a DDL error.
        session.execute("DROP TABLE v_table")
        session.execute("DROP TABLE v_table_2")

    # Combine the per-state results once, after the loop has finished.
    all_states_df = pd.concat(state_dataframes)
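
The one-statement-per-execute pattern used for the drops above can be wrapped in a small helper. The following is only a sketch: `execute_each` is a hypothetical name, and an in-memory SQLite connection stands in for the Teradata session so the example runs without a database server. It assumes only that the session object exposes an `execute(sql)` method, as the session in the answer does.

```python
import sqlite3


def execute_each(session, statements):
    """Run each SQL statement in its own execute() call, in order."""
    for statement in statements:
        session.execute(statement)


# Stand-in for the Teradata session (assumption: any object with .execute()).
session = sqlite3.connect(":memory:")

# Create two scratch tables, mirroring v_table and v_table_2 from the answer.
execute_each(session, [
    "CREATE TABLE v_table (dw_key INTEGER)",
    "CREATE TABLE v_table_2 (dw_key INTEGER)",
])

# Drop them one statement at a time instead of sending both in one string.
execute_each(session, ["DROP TABLE v_table", "DROP TABLE v_table_2"])

# List whatever tables are left in the scratch database.
remaining = session.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

With the real session, the same helper would be called inside the loop after the dataframe has been appended.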

Edit for clarity: fixing the code in the question only required proper indentation. My Teradata environment has limited spool space, so I have to build many volatile tables to break queries apart. Since I spent a good amount of time solving this, I added the volatile-table version to the answer to help others who run into the same scenario.
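
To illustrate the indentation point, here is a minimal, runnable sketch of the loop shape: the query and the append sit inside the loop body, and the results are combined once after the loop. Several things here are assumptions made for the sake of a self-contained example and are not from the original post: SQLite replaces Teradata, one table per state (`table_a_state1`, ...) replaces one database per state, `LIMIT` replaces `top`, and plain lists of rows replace dataframes. With pandas, the per-state step would be `pd.read_sql` and the final step `pd.concat`, as in the answer.

```python
import sqlite3

STATES = ["state1", "state2", "state3"]

# Hypothetical test data: one small table per state in an in-memory database.
conn = sqlite3.connect(":memory:")
for state in STATES:
    conn.execute(f"CREATE TABLE table_a_{state} (dw_key INTEGER)")
    conn.execute(f"INSERT INTO table_a_{state} VALUES (1), (2)")

# {0} is filled in twice: once as the state label, once in the table name.
query = "SELECT '{0}' AS state, a.* FROM table_a_{0} a LIMIT 10"

state_rows = []
for state in STATES:
    # Everything that must happen once per state is indented under the loop,
    # including the append. If the append sat outside the loop, only the
    # last state's rows would be kept.
    rows = conn.execute(query.format(state)).fetchall()
    state_rows.append(rows)

# Combine once, after the loop has collected every state's rows.
all_rows = [row for rows in state_rows for row in rows]
```

Each row in `all_rows` starts with the state it came from, so the combined result can still be split by state later.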
