How to load data into pandas from a large database?


Question

I have a Postgres database which contains time series data. The size of the database is around 1 GB. Currently, to read the data, this is what I do:

import psycopg2
import pandas as pd
import pandas.io.sql as psql

# Connect to the local Postgres instance that holds the metrics database
conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")
cur = conn.cursor()

# Read the whole table into a single DataFrame
df = psql.read_sql("Select * from timeseries", conn)
print(df)

But this loads the entire table into memory. I am aware of techniques where the database can be dumped to a CSV file and the CSV file can then be read in chunks, as suggested here: How to read a 6 GB csv file with pandas

But that is not an option for me, since the database is continuously changing and I need to read it on the fly. Is there any technique to read the database content in chunks, or any third-party library that can do this?

Answer

pd.read_sql() also has a chunksize parameter, so you can read data from a SQL table/query in chunks:

for df in pd.read_sql("Select * from timeseries", conn, chunksize=10**4):
    # process `df` chunk here...
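
Below is a minimal sketch of how such a chunked loop might look end to end, assuming the timeseries table has a numeric "value" column (that column name is only an illustration; adapt it to the actual schema):

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")

total = 0.0
rows = 0
# Each iteration yields a DataFrame of at most 10,000 rows, so memory
# usage stays bounded no matter how large the table grows.
for chunk in pd.read_sql("Select * from timeseries", conn, chunksize=10000):
    total += chunk["value"].sum()
    rows += len(chunk)

print("mean value:", total / rows if rows else float("nan"))
conn.close()

Because the query is executed once and its results are fetched incrementally, re-running the loop simply picks up whatever data is in the table at that moment, which suits a database that keeps changing.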
