Output multiple files based on column value (Python pandas)
Problem description
I have a sample pandas DataFrame:
import pandas as pd
df = {'ID': [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
      'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
      'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311]}
df = pd.DataFrame(df)
It looks like this:
df
Out[7]:
CloneID ID VGene
0 1 73 64D
1 1 68 64D
2 1 1 64D
3 1 94 61
4 1 42 61
5 2 22 61
6 2 28 311
7 3 70 311
8 3 47 311
9 3 46 311
10 4 17 311
11 4 19 311
12 4 56 311
13 4 33 311
I want to write a simple script that outputs each CloneID to a different output file, so in this case there would be 4 different files. The first file would be named 'CloneID1.txt' and would look like this:
CloneID ID VGene
1 73 64D
1 68 64D
1 1 64D
1 94 61
1 42 61
The second file would be named 'CloneID2.txt':
CloneID ID VGene
2 22 61
2 28 311
The third file would be named 'CloneID3.txt':
CloneID ID VGene
3 70 311
3 47 311
3 46 311
And the last file would be 'CloneID4.txt':
CloneID ID VGene
4 17 311
4 19 311
4 56 311
4 33 311
The code I found online was:
import pandas as pd
data = pd.read_excel('data.xlsx')
for group_name, data in data.groupby('CloneID'):
    with open('results.csv', 'a') as f:
        data.to_csv(f)
But it outputs everything to one file instead of multiple files.
Recommended answer
You can do something like the following:
In [19]:
gp = df.groupby('CloneID')
for g in gp.groups:
    print('CloneID' + str(g) + '.txt')
    print(gp.get_group(g).to_csv())
CloneID1.txt
,CloneID,ID,VGene
0,1,73,64D
1,1,68,64D
2,1,1,64D
3,1,94,61
4,1,42,61
CloneID2.txt
,CloneID,ID,VGene
5,2,22,61
6,2,28,311
CloneID3.txt
,CloneID,ID,VGene
7,3,70,311
8,3,47,311
9,3,46,311
CloneID4.txt
,CloneID,ID,VGene
10,4,17,311
11,4,19,311
12,4,56,311
13,4,33,311
So here we iterate over the group keys with for g in gp.groups:, use each key to build the result file path, and call to_csv on that group, so the following should work for you:
gp = df.groupby('CloneID')
for g in gp.groups:
    path = 'CloneID' + str(g) + '.txt'
    gp.get_group(g).to_csv(path)
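Note that to_csv with default arguments writes comma-separated values and includes the row index as a leading column, as the printed output above shows. If the files should look like the ones in the question (no index, CloneID first), a minimal sketch could pass a few extra arguments. The tab separator, the explicit column order, and the scratch output directory are assumptions here, not something the question specifies:

```python
import os
import tempfile

import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'ID': [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
    'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311],
})

# Assumption: write into a scratch directory; replace with any directory you like.
out_dir = tempfile.mkdtemp()

gp = df.groupby('CloneID')
for g in gp.groups:
    path = os.path.join(out_dir, 'CloneID' + str(g) + '.txt')
    # index=False drops the leading row-number column.
    # sep='\t' and the column order are assumptions about the wanted layout.
    gp.get_group(g).to_csv(path, sep='\t', index=False,
                           columns=['CloneID', 'ID', 'VGene'])
```

Each file then starts with a CloneID, ID, VGene header line followed by only the rows of that clone.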
Actually, the following would be even simpler:
gp = df.groupby('CloneID')
gp.apply(lambda x: x.to_csv('CloneID' + str(x.name) + '.txt'))
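The snippet from the question wrote everything to one file because the path 'results.csv' never changed between iterations. Since iterating over a GroupBy object yields (key, sub-DataFrame) pairs, the path can simply be built from the key inside the loop. This also avoids calling apply purely for its side effect, which may matter because recent pandas releases have been changing whether apply passes the grouping column to the function. The sketch below wraps that loop in a helper; the name write_groups and the scratch output directory are hypothetical, chosen for illustration only:

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_groups(frame, column, out_dir):
    """Write one CSV file per distinct value of `column`; return the paths written.

    Hypothetical helper for illustration, not part of pandas.
    """
    out_dir = Path(out_dir)
    written = []
    # Iterating a GroupBy yields (group key, sub-DataFrame) pairs,
    # sorted by key by default.
    for key, group in frame.groupby(column):
        path = out_dir / f'{column}{key}.txt'
        group.to_csv(path, index=False)  # one file per group, no index column
        written.append(path)
    return written


# Same sample data as in the question.
df = pd.DataFrame({
    'ID': [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
    'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311],
})

# Assumption: a temporary directory stands in for the real output location.
paths = write_groups(df, 'CloneID', tempfile.mkdtemp())
```

The returned list makes it easy to check which files were produced, for example by printing p.name for each path.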