Create sql table from dask dataframe using map_partitions and pd.df.to_sql

Views: 315 | Published: 2020/5/24 1:57:30 | Tags: python, postgresql, pandas, dask, pandas-to-sql


Question

Dask doesn't have a df.to_sql() like pandas does, so I am trying to replicate that functionality and create an SQL table using the map_partitions method. Here is my code:

import dask.dataframe as dd
import pandas as pd
import sqlalchemy as sqla

db_url = 'my_db_url_connection'
conn = sqla.create_engine(db_url)

ddf = dd.read_csv('data/prod.csv')
meta=dict(ddf.dtypes)
ddf.map_partitions(lambda df: df.to_sql('table_name', db_url, if_exists='append', index=True), meta=meta)

This returns my dask dataframe object, but when I look at my PostgreSQL server there is no new table. What is going wrong here?

UPDATE: I still can't get it to work, but that is due to an independent issue. Follow-up question: "duplicate key value violates unique constraint - postgres error when trying to create sql table from dask dataframe"

Recommended answer

Simply put, you have created a dataframe that is a prescription of the work to be done, but you have not executed it. To execute it, you need to call .compute() on the result.

Note that the output here is not really a dataframe: each partition evaluates to whatever to_sql returns, which is None (or, in pandas 1.4 and later, a row count) rather than a dataframe. It might therefore be cleaner to express this with ddf.to_delayed, something like:

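import dask
import pandas as pd

# Wrap DataFrame.to_sql so that each call becomes a lazy task
dto_sql = dask.delayed(pd.DataFrame.to_sql)
out = [dto_sql(d, 'table_name', db_url, if_exists='append', index=True)
       for d in ddf.to_delayed()]
dask.compute(*out)  # runs one to_sql call per partition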

Also note that whether you get good parallelism will depend on the database driver and on the database system itself.
