SQLAlchemy 中的枢轴 [英] Pivot in SQLAlchemy
问题描述
想象一个由日期和名称主键的表.
Imagine a table primary keyed by date and name.
|Date|Name|Value|
|D1 | A| 2|
|D1 | B| 3|
|D2 | A| 3|
|D2 | C| 1|
如何编写 SQLAlchemy 查询生成 (date, value) 对,其中值是该日期所有名称的所有值?值元组的长度将是 len(distinct(Name))
并且缺失值将包含一些标记值(例如 None).
How would one write a SQLAlchemy query yielding (date, value) pairs where the values are all of the values for all of the names on that date? The length of the value tuple would be len(distinct(Name))
and missing values would contain some sentinel value (e.g. None).
i.e.
[
(D1, 2, 3, None),
(D2, 3, None, 1),
...
]
我不是在这里要求一个完整的答案来让我批准.如果你能告诉我在文档中搜索/阅读哪些概念(或者真的,指出我任何有用或相关的东西),我会(尝试)自己弄清楚并发布更新.
I'm not asking for a full answer here in order to get me to approve. If you can tell me what concepts to search for/read in the documentation (or really, point me to anything useful or related), I'll (try to) figure it out myself and post an update.
[请不要使用熊猫解决方案]
[No pandas solutions please]
简单的答案是以长格式加载查询并使用 numpy.searchsorted 在 numpy 中进行旋转,但如果提交的答案使用了这个,我会对一些性能诊断感兴趣,以便与简单的解决方案进行比较.>
The easy answer is to load the query in long form and do the pivoting in numpy using numpy.searchsorted, but if the submitted answer uses this, I would be interested in some performance diagnostics to compare with the simple solution.
推荐答案
我建议不要在 SQL Server
端这样做,除非你有大量的数据(甚至在在这种情况下,我不确定它是否具有显着的积极性能影响).
I would recommend not to do it in on the SQL Server
side, unless you have a huge amount of data (and even in this case I am not sure it has significant positive performance impact).
相反,在它们擅长的地方使用不同的工具:
Instead, use different tools where they excel:
MS-SQL
(或其他 RDBMS)来存储数据SQLAlchemy
插入/查询该数据numpy
例程或pandas.pivot
数据透视
MS-SQL
(or other RDBMS) to store dataSQLAlchemy
to insert/query that datanumpy
routines orpandas.pivot
to pivot the data
下面的代码是自包含的,运行(使用 sqlite
)并展示了如何获得您想要的最终结果:
The code below is self-contained, running (with sqlite
) and shows how to do get to the final result you desire:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import pandas as pd
_db_uri = 'sqlite:///:memory:'
engine = create_engine(_db_uri, echo=True)
Session = sessionmaker(bind=engine)
Base = declarative_base(engine)
class MyTable(Base):
__tablename__ = 'my_table'
id = Column(Integer, primary_key=True)
date = Column(String)
name = Column(String)
value = Column(Integer)
def _test():
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
session = Session()
def _add_test_data():
rows = [
MyTable(date="D1", name="A", value=2),
MyTable(date="D1", name="B", value=3),
MyTable(date="D2", name="A", value=3),
MyTable(date="D2", name="C", value=1),
]
session.add_all(rows)
session.commit()
# create test data
_add_test_data()
# use `sa` to query data from the database
# q = session.query(MyTable) # all columns
q = session.query(MyTable.date, MyTable.name, MyTable.value) # explicit
# read data into pandas directly from the query `q`
df = pd.read_sql(q.statement, q.session.bind)
print(df)
# pivot the results
df_pivot = df.pivot(index="date", columns="name", values="value")
print(df_pivot)
if __name__ == '__main__':
_test()
这篇关于SQLAlchemy 中的枢轴的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!