Separate dataframe into multiple new dataframes and BULK restructure the new dfs
Question
I have a large dataset with 100+ columns, structured like this:
country_a country_b year variable1 variable2 ...... variable100
The goal is to separate the 100 variables into 100 new dataframes, pivot each one, and save each to its own CSV file.
Below is the code for transforming one variable:
import pandas as pd
df = pd.DataFrame({
    'country_a': ['aa', 'bb', 'cc'],
    'country_b': ['xx', 'yy', 'zz'],
    'year': [2018, 2019, 2020],
    'var_a': [1, 0, 1],
    'var_b': [2, 1, 2],
    'var_c': [0, 1.6, 2.4]
})
print(df)
country_a country_b year var_a var_b var_c
0 aa xx 2018 1 2 0.0
1 bb yy 2019 0 1 1.6
2 cc zz 2020 1 2 2.4
Then I would do a pivot:
table=pd.pivot_table(df, values='var_a', index=['country_a','country_b'],columns=['year']).reset_index()
table.to_csv('var_a.csv')
The table would look like this:
country_a country_b 2018 2019 2020
0 aa xx 1.0 NaN NaN
1 bb yy NaN 0.0 NaN
2 cc zz NaN NaN 1.0
I asked the first part of the question here: pandas: melt 100+ variables into 100+ new dataframes. Now I am having trouble incorporating the pivot step for the separated dfs...
Thanks a lot!
Recommended answer
Here is a way to reshape the original data frame (using melt, unstack and reset_index), followed by exporting each of var_a, var_b, ..., to its own CSV file:
df_new = (
    df.melt(id_vars=['country_a', 'country_b', 'year'],
            var_name='variable',
            value_name='value')
    .set_index(['country_a', 'country_b', 'year', 'variable'])
    .sort_index()
    .squeeze()
    .unstack(level='year')
    .fillna(0)    # for display purposes
    .astype(int)  # also for display purposes
    .reset_index(level=['country_a', 'country_b'])
)
print(df_new)
year country_a country_b 2018 2019 2020
variable
var_a aa xx 1 0 0
var_b aa xx 2 0 0
var_c aa xx 0 0 0
var_a bb yy 0 0 0
var_b bb yy 0 1 0
var_c bb yy 0 1 0
var_a cc zz 0 0 1
var_b cc zz 0 0 2
var_c cc zz 0 0 2
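Note that the fillna(0) and astype(int) steps are only there for display: in the table above, the var_c values 1.6 and 2.4 show up as 1 and 2, and missing cells show up as 0. If the exported CSVs should keep the original values, those two steps can probably just be left out. A minimal sketch under that assumption (the name df_exact is made up for this illustration and is not part of the answer's code):

```python
import pandas as pd

df = pd.DataFrame({
    'country_a': ['aa', 'bb', 'cc'],
    'country_b': ['xx', 'yy', 'zz'],
    'year': [2018, 2019, 2020],
    'var_a': [1, 0, 1],
    'var_b': [2, 1, 2],
    'var_c': [0, 1.6, 2.4],
})

# Same reshaping as above, but without fillna(0) / astype(int),
# so floats are not truncated and missing cells stay NaN.
df_exact = (
    df.melt(id_vars=['country_a', 'country_b', 'year'],
            var_name='variable',
            value_name='value')
    .set_index(['country_a', 'country_b', 'year', 'variable'])['value']
    .unstack(level='year')
    .reset_index(level=['country_a', 'country_b'])
)

# All rows that belong to var_c, with the original float values
print(df_exact.loc['var_c'])
```

Selecting the 'value' column explicitly plays the same role here as squeeze() in the code above: it turns the one-column frame into a Series before unstacking.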
Now export each variable to its own CSV file:
for idx in df_new.index.unique():
    filename = f'{idx}.csv'
    with open(filename, 'wt') as handle:
        #df_new.loc[idx].to_csv(handle)  # <- un-comment this line in your code
        print(filename)
        print(df_new.loc[idx])
        print()
var_a.csv
year country_a country_b 2018 2019 2020
variable
var_a aa xx 1 0 0
var_a bb yy 0 0 0
var_a cc zz 0 0 1
var_b.csv
year country_a country_b 2018 2019 2020
variable
var_b aa xx 2 0 0
var_b bb yy 0 1 0
var_b cc zz 0 0 2
var_c.csv
year country_a country_b 2018 2019 2020
variable
var_c aa xx 0 0 0
var_c bb yy 0 1 0
var_c cc zz 0 0 2
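If you would rather stay close to the pivot_table call from the question, another option is to loop over the variable columns and pivot each one in turn. The following is only a sketch, not the approach used above: the names id_cols, value_cols, tables and out_dir (including the folder name 'pivoted_csvs') are assumptions made up for this example. Keep in mind that pivot_table aggregates with the mean by default, so duplicate (country_a, country_b, year) rows would be averaged.

```python
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'country_a': ['aa', 'bb', 'cc'],
    'country_b': ['xx', 'yy', 'zz'],
    'year': [2018, 2019, 2020],
    'var_a': [1, 0, 1],
    'var_b': [2, 1, 2],
    'var_c': [0, 1.6, 2.4],
})

# Hypothetical names: adjust to your own data and folder layout.
id_cols = ['country_a', 'country_b', 'year']
value_cols = [c for c in df.columns if c not in id_cols]
out_dir = Path('pivoted_csvs')
out_dir.mkdir(exist_ok=True)

tables = {}
for col in value_cols:
    # Same pivot as in the question, applied to one variable at a time
    table = pd.pivot_table(df, values=col,
                           index=['country_a', 'country_b'],
                           columns='year').reset_index()
    tables[col] = table
    # index=False leaves the 0, 1, 2, ... row numbers out of the file
    table.to_csv(out_dir / f'{col}.csv', index=False)

print(tables['var_a'])
```

With 100+ variables this writes one file per column of value_cols, named after the column.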