Pandas: split a CSV into multiple CSVs (or DataFrames) by a column
Question
I'm stuck on a problem and would appreciate any help or tips.
The problem: I have a CSV file in which one column can take several different values, for example:
Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
I've loaded the data into a DataFrame, and I need to split it into multiple DataFrames based on the value of the column "The_evil_column":
df1
Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
df2
Fruit;Color;The_evil_column
Orange;Green;something2
Apple;Red;something2
df3
Fruit;Color;The_evil_column
Apple;Red;something3
After reading some posts I'm even more confused, so I'd appreciate some pointers on how to do this.
Recommended answer
You can generate a dictionary of DataFrames, keyed by the values of the column:
d = {g:x for g,x in df.groupby('The_evil_column')}
In [95]: d.keys()
Out[95]: dict_keys(['something1', 'something2', 'something3'])
In [96]: d['something1']
Out[96]:
Fruit Color The_evil_column
0 Apple Red something1
1 Apple Green something1
2 Orange Orange something1
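For anyone who wants to reproduce this without the original file, here is a minimal self-contained sketch. It assumes the sample rows from the question are read from an in-memory string; the names `csv_text` and `frames` are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Sample data copied from the question (semicolon-separated).
csv_text = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so a dict comprehension gives one DataFrame per distinct value.
frames = {key: group for key, group in df.groupby("The_evil_column")}

print(list(frames))
print(frames["something2"])
```

Each sub-DataFrame keeps the row labels it had in `df`, which is why the rows for `something2` are still labelled 3 and 4.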
Or a list of DataFrames:
In [103]: l = [x for _,x in df.groupby('The_evil_column')]
In [104]: l[0]
Out[104]:
Fruit Color The_evil_column
0 Apple Red something1
1 Apple Green something1
2 Orange Orange something1
In [105]: l[1]
Out[105]:
Fruit Color The_evil_column
3 Orange Green something2
4 Apple Red something2
In [106]: l[2]
Out[106]:
Fruit Color The_evil_column
5 Apple Red something3
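One detail worth knowing when indexing into the list by position: `groupby` sorts the group keys by default, so the order of the list follows the sorted keys rather than the order in which values first appear in the file. The sketch below uses made-up data (not from the question) to show the difference, and also shows `reset_index(drop=True)` in case each piece should be renumbered from 0.

```python
import pandas as pd

# Hypothetical data chosen so that sorted order differs from file order.
df = pd.DataFrame({
    "Fruit": ["Pear", "Apple", "Pear"],
    "The_evil_column": ["zeta", "alpha", "zeta"],
})

# Default behaviour: group keys come out sorted.
sorted_keys = [key for key, _ in df.groupby("The_evil_column")]

# sort=False keeps the keys in order of first appearance.
appearance_keys = [key for key, _ in df.groupby("The_evil_column", sort=False)]

# Optionally renumber the rows of each piece starting from 0.
pieces = [
    group.reset_index(drop=True)
    for _, group in df.groupby("The_evil_column", sort=False)
]

print(sorted_keys)
print(appearance_keys)
print(pieces[0])
```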
UPDATE: to write each group straight to its own CSV file:
In [111]: g = pd.read_csv(filename, sep=';').groupby('The_evil_column')
In [112]: g.ngroups # number of unique values in the `The_evil_column` column
Out[112]: 3
In [113]: g.apply(lambda x: x.to_csv(r'c:\temp\{}.csv'.format(x.name)))
Out[113]:
Empty DataFrame
Columns: []
Index: []
This produces three files:
In [115]: glob.glob(r'c:\temp\something*.csv')
Out[115]:
['c:\\temp\\something1.csv',
'c:\\temp\\something2.csv',
'c:\\temp\\something3.csv']
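An explicit loop over the groups is an alternative way to write the files, and it makes the output options easier to see. The sketch below is written under a few assumptions that are not in the original answer: it writes to a temporary directory instead of `c:\temp`, keeps `;` as the separator, and passes `index=False` because `to_csv` writes the row index as an extra column by default. It also assumes every value in the column is usable as a file name.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question (semicolon-separated).
csv_text = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Hypothetical output location; replace with a real directory as needed.
out_dir = Path(tempfile.mkdtemp())

for key, group in df.groupby("The_evil_column"):
    # One file per distinct value, named after that value.
    group.to_csv(out_dir / f"{key}.csv", sep=";", index=False)

written = sorted(path.name for path in out_dir.glob("*.csv"))
print(written)
```

If the column can contain characters that are not valid in file names (slashes, colons and so on), the keys would need to be sanitized before being used in the path.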