How to split a dataframe by unique groups and save to a csv

Question

I have a pandas dataframe I would like to iterate over. A simplified example of my dataframe:

chr    start    end    Gene    Value   MoreData
chr1    123    123    HAPPY    41.1    3.4
chr1    125    129    HAPPY    45.9    4.5
chr1    140    145    HAPPY    39.3   4.1
chr1    342    355    SAD    34.2    9.0
chr1    360    361    SAD    44.3    8.1
chr1    390    399    SAD    29.0   7.2
chr1    400    411    SAD    35.6   6.5
chr1    462    470    LEG    20.0    2.7

I would like to iterate over each unique gene and create a new file named:

for Gene in df: ## this is where I need the most help

    OutFileName = Gene+".pdf"

For the above example I should get three iterations with 3 outfiles and 3 dataframes:

# HAPPY.pdf
chr1    123    123    HAPPY    41.1    3.4 
chr1    125    129    HAPPY    45.9    4.5 
chr1    140    145    HAPPY    39.3   4.1

# SAD.pdf
chr1    342    355    SAD    34.2    9.0 
chr1    360    361    SAD    44.3    8.1
chr1    390    399    SAD    29.0   7.2 
chr1    400    411    SAD    35.6   6.5

# LEG.pdf
chr1    462    470    LEG    20.0    2.7

The resulting data frame contents split up by chunks will be sent to another function that will perform the analysis and return the contents to be written to file.

Recommended answer

You can get the unique values by calling unique on the 'Gene' column, iterate over them, build each filename, and write the matching rows out with to_csv:

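genes = df['Gene'].unique()  # unique gene names, in order of first appearance
for gene in genes:
    outfilename = gene + '.pdf'  # extension taken from the question; the contents are CSV text
    print(outfilename)
    # keep only the rows for this gene and write them out
    df[df['Gene'] == gene].to_csv(outfilename)

Output:

HAPPY.pdf
SAD.pdf
LEG.pdf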
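
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same boolean-mask approach. It rebuilds the sample data from the question. The helper name `split_to_csv`, the `.csv` extension, the temporary output directory, and `index=False` (which leaves the row index out of the file) are assumptions made for illustration, not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "chr": ["chr1"] * 8,
    "start": [123, 125, 140, 342, 360, 390, 400, 462],
    "end": [123, 129, 145, 355, 361, 399, 411, 470],
    "Gene": ["HAPPY"] * 3 + ["SAD"] * 4 + ["LEG"],
    "Value": [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
    "MoreData": [3.4, 4.5, 4.1, 9.0, 8.1, 7.2, 6.5, 2.7],
})


def split_to_csv(frame, column, out_dir):
    """Hypothetical helper: write one CSV per unique value of `column`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for key in frame[column].unique():
        path = out_dir / f"{key}.csv"
        # Boolean mask selects the rows belonging to this key.
        frame[frame[column] == key].to_csv(path, index=False)
        paths.append(path)
    return paths


# Write into a throwaway temporary directory (an assumption for the demo).
written = split_to_csv(df, "Gene", tempfile.mkdtemp())
print([p.name for p in written])
```

Each file can be read back with `pd.read_csv(path)` to check that it holds only the rows for one gene.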

A more idiomatic pandas approach is to group by 'Gene' and then iterate over the groups:

gp = df.groupby('Gene')
# gp.groups is a dict-like attribute (not a method) mapping each 'Gene' value to the index labels of its rows
for gene, idx in gp.groups.items():
    print(df.loc[idx])

Output:

    chr  start  end   Gene  Value  MoreData
0  chr1    123  123  HAPPY   41.1       3.4
1  chr1    125  129  HAPPY   45.9       4.5
2  chr1    140  145  HAPPY   39.3       4.1
    chr  start  end Gene  Value  MoreData
3  chr1    342  355  SAD   34.2       9.0
4  chr1    360  361  SAD   44.3       8.1
5  chr1    390  399  SAD   29.0       7.2
6  chr1    400  411  SAD   35.6       6.5
    chr  start  end Gene  Value  MoreData
7  chr1    462  470  LEG     20       2.7
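
Iterating over the groupby object directly yields `(key, sub-dataframe)` pairs, which fits the question's plan of handing each chunk to another function and writing out what it returns. The sketch below is one possible way to wire that up. The `analyze` function is a hypothetical stand-in for the real analysis, and the `.txt` extension and temporary directory are likewise assumptions. `sort=False` keeps the groups in order of first appearance.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question (columns trimmed to what the demo needs).
df = pd.DataFrame({
    "chr": ["chr1"] * 8,
    "start": [123, 125, 140, 342, 360, 390, 400, 462],
    "end": [123, 129, 145, 355, 361, 399, 411, 470],
    "Gene": ["HAPPY"] * 3 + ["SAD"] * 4 + ["LEG"],
})


def analyze(group):
    """Hypothetical stand-in for the real analysis; returns text to write."""
    return f"{len(group)} rows spanning {group['start'].min()}-{group['end'].max()}\n"


out_dir = Path(tempfile.mkdtemp())  # throwaway directory for the demo
reports = {}
for gene, group in df.groupby("Gene", sort=False):
    # `group` is a DataFrame holding only the rows for this gene.
    reports[gene] = analyze(group)
    (out_dir / f"{gene}.txt").write_text(reports[gene])

print(reports)
```

Compared with looping over `gp.groups`, this avoids the extra `df.loc[...]` lookup because each sub-dataframe is handed to the loop body directly.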
