How does Pandas to_sql determine what dataframe column is placed into what database field?


Question

I'm currently using Pandas to_sql in order to place a large dataframe into an SQL database. I'm using sqlalchemy in order to connect with the database and part of that process is defining the columns of the database tables.

My question is, when I'm running to_sql on a dataframe, how does it know what column from the dataframe goes into what field in the database? Is it looking at column names in the dataframe and looking for the same fields in the database? Is it the order that the variables are in?

Here's some example code to facilitate discussion:

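import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Numeric

engine = create_engine('sqlite:///store_data.db')
meta = MetaData()

# Define the target table up front so the column types are explicit
table_pop = Table('xrf_str_geo_ta4_1511', meta,
    Column('TDLINX', Integer, nullable=True, index=True),
    Column('GEO_ID', Integer, nullable=True),
    Column('PERCINCL', Numeric, nullable=True)
)

meta.create_all(engine)

# `file` is the path to the source CSV, defined elsewhere.
# 'table_name' is a placeholder for the name of the target table.
# The `flavor` argument is dropped (recent pandas versions no longer accept it),
# and index=False keeps the dataframe index from being written as an extra column.
for df in pd.read_csv(file, chunksize=50000, iterator=True, encoding='utf-8', sep=','):
    df.to_sql('table_name', engine, if_exists='append', index=False)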

The dataframe in question has 3 columns: TDLINX, GEO_ID, and PERCINCL.

Recommended answer

The answer is indeed what you suggest: it is looking at the column names. So matching column names is what matters; the order of the columns does not.

To be fully correct, pandas does not actually check this itself. Under the hood, to_sql executes an insert statement in which the data to insert is provided as a dict keyed by column name, and it is then up to the database driver to handle it.
This also means that pandas will not check the dtypes or the number of columns (e.g. if some fields of the database table are not present as columns in the dataframe, those fields will be filled with the database's default value for these rows).
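
The behaviour described above can be checked with a small experiment. The following is only a minimal sketch: it assumes an in-memory SQLite database opened through the standard-library sqlite3 module (which to_sql also accepts) rather than the SQLAlchemy engine from the question, and the data values are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database with the three fields from the question, in table order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xrf_str_geo_ta4_1511 "
    "(TDLINX INTEGER, GEO_ID INTEGER, PERCINCL NUMERIC)"
)

# Dataframe columns deliberately listed in a different order than the table.
df = pd.DataFrame({"PERCINCL": [0.5], "GEO_ID": [20], "TDLINX": [1]})
df.to_sql("xrf_str_geo_ta4_1511", conn, if_exists="append", index=False)

# A dataframe that omits PERCINCL: the database supplies the default
# for the missing field (NULL here, since no default was declared).
partial = pd.DataFrame({"GEO_ID": [30], "TDLINX": [2]})
partial.to_sql("xrf_str_geo_ta4_1511", conn, if_exists="append", index=False)

rows = conn.execute(
    "SELECT TDLINX, GEO_ID, PERCINCL FROM xrf_str_geo_ta4_1511 ORDER BY TDLINX"
).fetchall()
print(rows)
```

If the values land in the fields that share their names despite the shuffled column order, and the second row has no PERCINCL value, that matches the name-based matching described in the answer. A dataframe column with no matching field in the table would instead be expected to make the database reject the insert.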
