SQLAlchemy 中的枢轴 [英] Pivot in SQLAlchemy

查看:54
本文介绍了SQLAlchemy 中的枢轴的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

想象一个由日期和名称主键的表.

Imagine a table primary keyed by date and name.

|Date|Name|Value|
|D1  |   A|    2|
|D1  |   B|    3|
|D2  |   A|    3|
|D2  |   C|    1|

如何编写 SQLAlchemy 查询生成 (date, value) 对,其中值是该日期所有名称的所有值?值元组的长度将是 len(distinct(Name)) 并且缺失值将包含一些标记值(例如 None).

How would one write a SQLAlchemy query yielding (date, value) pairs where the values are all of the values for all of the names on that date? The length of the value tuple would be len(distinct(Name)) and missing values would contain some sentinel value (e.g. None).

i.e.
[
(D1, 2, 3, None),
(D2, 3, None, 1),
...
]

我不是在这里要求一个完整的答案来让我批准.如果你能告诉我在文档中搜索/阅读哪些概念(或者真的,指出我任何有用或相关的东西),我会(尝试)自己弄清楚并发布更新.

I'm not asking for a full answer here in order to get me to approve. If you can tell me what concepts to search for/read in the documentation (or really, point me to anything useful or related), I'll (try to) figure it out myself and post an update.

[请不要使用熊猫解决方案]

[No pandas solutions please]

简单的答案是以长格式加载查询并使用 numpy.searchsorted 在 numpy 中进行旋转,但如果提交的答案使用了这个,我会对一些性能诊断感兴趣,以便与简单的解决方案进行比较.

The easy answer is to load the query in long form and do the pivoting in numpy using numpy.searchsorted, but if the submitted answer uses this, I would be interested in some performance diagnostics to compare with the simple solution.

推荐答案

我建议不要在 SQL Server 端这样做,除非你有大量的数据(甚至在在这种情况下,我不确定它是否具有显着的积极性能影响).

I would recommend not to do it in on the SQL Server side, unless you have a huge amount of data (and even in this case I am not sure it has significant positive performance impact).

相反,在它们擅长的地方使用不同的工具:

Instead, use different tools where they excel:

  • MS-SQL(或其他 RDBMS)来存储数据
  • SQLAlchemy 插入/查询该数据
  • numpy 例程或 pandas.pivot 数据透视
  • MS-SQL (or other RDBMS) to store data
  • SQLAlchemy to insert/query that data
  • numpy routines or pandas.pivot to pivot the data

下面的代码是自包含的,运行(使用 sqlite)并展示了如何获得您想要的最终结果:

The code below is self-contained, running (with sqlite) and shows how to do get to the final result you desire:

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import pandas as pd

_db_uri = 'sqlite:///:memory:'
engine = create_engine(_db_uri, echo=True)
Session = sessionmaker(bind=engine)

Base = declarative_base(engine)

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(String)
    name = Column(String)
    value = Column(Integer)

def _test():
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)
    session = Session()

    def _add_test_data():
        rows = [
            MyTable(date="D1", name="A", value=2),
            MyTable(date="D1", name="B", value=3),
            MyTable(date="D2", name="A", value=3),
            MyTable(date="D2", name="C", value=1),
        ]
        session.add_all(rows)
        session.commit()

    # create test data
    _add_test_data()

    # use `sa` to query data from the database
    # q = session.query(MyTable)  # all columns
    q = session.query(MyTable.date, MyTable.name, MyTable.value)  # explicit

    # read data into pandas directly from the query `q`
    df = pd.read_sql(q.statement, q.session.bind)
    print(df)

    # pivot the results
    df_pivot = df.pivot(index="date", columns="name", values="value")
    print(df_pivot)


if __name__ == '__main__':
    _test()

这篇关于SQLAlchemy 中的枢轴的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆