How can I iterate over a Pandas pivot table? (A multi-index dataframe?)

Question

I have a pivot table that I want to iterate over so that I can store it in a database.

                                           age  weekly_income
category_weekly_income category_age
High income            Middle aged   45.527721   15015.463667
                       Old           70.041456   14998.104486
                       Young         14.995210   15003.750822
Low income             Middle aged   45.548155    1497.228548
                       Old           70.049987    1505.655319
                       Young         15.013538    1501.718198
Middle income          Middle aged   45.516583    6514.830294
                       Old           69.977657    6494.626962
                       Young         15.020688    6487.661554

I've played with reshape, melt, various for loops, syntax stabs in the dark, chains of stacks, unstacks, reset_indexes, etc. The closest I have got is this syntax:

crosstab[1:2].age

With this I can pull individual value cells; however, I then can't get the values of the indexes.

Recommended answer

You don't need to iterate over the dataframe: Pandas already provides a method for writing a dataframe to a SQL database, DataFrame.to_sql(...).
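As a rough illustration of the to_sql route, here is a minimal sketch that writes a small MultiIndex frame to an in-memory SQLite database. The frame contents, the table name income_by_age and the use of SQLite are assumptions made for the example; they are not part of the original question.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the pivot table: a two-level MultiIndex frame.
index = pd.MultiIndex.from_product(
    [["High income", "Low income"], ["Old", "Young"]],
    names=["category_weekly_income", "category_age"],
)
pivot = pd.DataFrame(
    {
        "age": [70.0, 15.0, 70.1, 15.1],
        "weekly_income": [15000.0, 15003.0, 1505.0, 1501.0],
    },
    index=index,
)

# Assumption: an in-memory SQLite database. to_sql also accepts a
# SQLAlchemy engine or connection for other databases.
conn = sqlite3.connect(":memory:")

# index=True (the default) writes each named index level as its own column.
pivot.to_sql("income_by_age", conn, index=True, if_exists="replace")

rows = conn.execute(
    "SELECT category_weekly_income, category_age, age, weekly_income "
    "FROM income_by_age"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Because the index levels are named, they arrive in the table as ordinary columns, so no manual iteration is needed.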

Alternatively, if you want to insert the data into the database manually, you can use Pandas' to_csv(), for example:

I have a df like this:

df
                     A         B
first second                    
bar   one     0.826425 -1.126757
      two     0.682297  0.875014
baz   one    -1.714757 -0.436622
      two    -0.366858  0.341702
foo   one    -1.068390 -1.074582
      two     0.863934  0.043367
qux   one    -0.510881  0.215230
      two     0.760373  0.274389


# set header=False, and index=True to get the MultiIndex from pivot
print(df.to_csv(header=False, index=True))

bar,one,0.8264252111679552,-1.1267570930327846
bar,two,0.6822970851678805,0.8750144682657339
baz,one,-1.7147570530422946,-0.43662238320911956
baz,two,-0.3668584476904599,0.341701643567155
foo,one,-1.068390451744478,-1.0745823278191735
foo,two,0.8639343368644695,0.043366628502542914
qux,one,-0.5108806384876237,0.21522973766619563
qux,two,0.7603733646419842,0.2743886250125428

This gives you a nice comma-delimited format that can easily be used in a SQL execute query, something like:

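# Split the CSV text into rows, then each row into a tuple of fields.
# Note: every field, including the numbers, is a string at this point.
data = []
for line in df.to_csv(header=False, index=True).splitlines():
    if line:
        data.append(tuple(line.split(',')))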

data

[('bar', 'one', '0.8264252111679552', '-1.1267570930327846'),
 ('bar', 'two', '0.6822970851678805', '0.8750144682657339'),
 ('baz', 'one', '-1.7147570530422946', '-0.43662238320911956'),
 ('baz', 'two', '-0.3668584476904599', '0.341701643567155'),
 ('foo', 'one', '-1.068390451744478', '-1.0745823278191735'),
 ('foo', 'two', '0.8639343368644695', '0.043366628502542914'),
 ('qux', 'one', '-0.5108806384876237', '0.21522973766619563'),
 ('qux', 'two', '0.7603733646419842', '0.2743886250125428')]
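If you would rather keep the numbers as floats and avoid parsing CSV text, one possible alternative is to iterate over the frame directly. The sketch below uses a small made-up frame with the same first/second index layout; reset_index() and itertuples() are standard Pandas methods, but this variant is not part of the original answer.

```python
import pandas as pd

# Hypothetical frame with the same two-level index layout as df above.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    {"A": [0.5, -1.5, 0.25, 2.0], "B": [1.0, 0.0, -0.75, 3.5]},
    index=index,
)

# reset_index() turns the index levels into regular columns, so every row
# carries its index values; name=None yields plain tuples.
data = list(df.reset_index().itertuples(index=False, name=None))

# iterrows() also works: for a MultiIndex, the row label is a tuple with
# one entry per index level.
for (first, second), row in df.iterrows():
    print(first, second, row["A"], row["B"])
```

Here data holds tuples such as ('bar', 'one', 0.5, 1.0), with the index values first and the numeric columns still numeric.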

Then it's just a matter of doing an executemany:

...
stmt = "INSERT INTO table (first, second, A, B) VALUES (%s, %s, %s, %s)"
cursor.executemany(stmt, data)
...
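For a version that runs end to end, here is a minimal sketch using the standard-library sqlite3 module. The table name pivot_rows, its column names and the sample rows are hypothetical. Note that sqlite3 uses ? placeholders, whereas the %s style above is the one used by drivers such as psycopg2 or MySQLdb.

```python
import sqlite3

# Hypothetical rows in the (index level 1, index level 2, A, B) shape.
data = [
    ("bar", "one", 0.8264, -1.1268),
    ("bar", "two", 0.6823, 0.8750),
]

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
conn.execute(
    "CREATE TABLE pivot_rows (first_key TEXT, second_key TEXT, a REAL, b REAL)"
)

# executemany runs the parameterised INSERT once per tuple in data.
conn.executemany(
    "INSERT INTO pivot_rows (first_key, second_key, a, b) VALUES (?, ?, ?, ?)",
    data,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM pivot_rows").fetchone()[0]
conn.close()
print(count)
```

Passing the values as parameters rather than formatting them into the SQL string lets the driver handle quoting and type conversion.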

Hope this helps.