Python: Split CSV file according to first character of the first column


Question



I have a series of large CSV files, such as "basename.csv":

B1,3,5,6
B2,2,1,5
B3,1,9,0
C1,4,7,9
C2,1,9,3
C3,8,5,2

I would like to split them into different files like:

basename_B.csv

B1,3,5,6
B2,2,1,5
B3,1,9,0

basename_C.csv

C1,4,7,9
C2,1,9,3
C3,8,5,2

I have already done similar things in the past with for loops and ifs, but I was wondering if there is a more efficient way of doing this with pandas or something similar.
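For comparison, here is a minimal sketch of the plain for-loop-and-if approach the question alludes to, using only the standard library `csv` module. It streams the input row by row, so memory use does not grow with file size, and it keeps one open output file per distinct first character. The function name `split_by_first_char` is hypothetical, and the sketch assumes the first character of column 0 is safe to use in a filename.

```python
import csv
import os


def split_by_first_char(path):
    """Hypothetical helper: copy each row of `path` into
    <base>_<first character of column 0>.csv, streaming row by row.
    Returns the sorted list of output paths."""
    base = os.path.splitext(path)[0]
    files = {}    # first character -> open output file
    writers = {}  # first character -> csv.writer on that file
    try:
        with open(path, newline='') as src:
            for row in csv.reader(src):
                if not row or not row[0]:
                    continue  # skip blank lines and rows with an empty first field
                key = row[0][0]
                if key not in writers:
                    # first row with this character: open its output file
                    files[key] = open('{}_{}.csv'.format(base, key), 'w', newline='')
                    writers[key] = csv.writer(files[key])
                writers[key].writerow(row)
    finally:
        for f in files.values():
            f.close()
    return sorted(f.name for f in files.values())
```

Calling `split_by_first_char('basename.csv')` on the sample data would write `basename_B.csv` and `basename_C.csv` next to the input.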

SOLUTION

Adapting the solution from @chthonicdaemon and @jezrael, I came up with this:

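import glob
import os

import pandas as pd

def split_csv():
    # glob.glob returns a complete list first, so files written below are not re-read in this run
    for dfile in glob.glob('*.csv'):
        df = pd.read_csv(dfile, header=None)
        # group rows by the first character of column 0
        for letter, group in df.groupby(df[0].str[0]):
            out = '{}_{}.csv'.format(os.path.splitext(dfile)[0], letter)
            group.to_csv(out, index=False, header=False)

split_csv()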
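To see what the grouping key produces before writing any files, the same `groupby` call can be run on an in-memory copy of the sample data. In this sketch `io.StringIO` stands in for `basename.csv` purely for illustration, and `to_csv` is called without a path so it returns the CSV text instead of writing a file.

```python
import io

import pandas as pd

# Sample rows from the question, held in memory instead of basename.csv
data = "B1,3,5,6\nB2,2,1,5\nB3,1,9,0\nC1,4,7,9\nC2,1,9,3\nC3,8,5,2\n"
df = pd.read_csv(io.StringIO(data), header=None)

# .str[0] takes the first character of each string in column 0
keys = df[0].str[0]

# to_csv with no path returns the CSV text each output file would contain
outputs = {letter: group.to_csv(index=False, header=False)
           for letter, group in df.groupby(keys)}

for letter, text in outputs.items():
    print(letter, text.splitlines())
```

Each printed line pairs a letter with the rows that would end up in `basename_<letter>.csv`.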

Answer

Here's a simple application of groupby:

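import pandas

df = pandas.read_csv('basename.csv', header=None)

def firstletter(index):
    # groupby calls this once per row label; return the first character
    # of that row's first column (df.ix is gone in pandas 1.0+, so use df.loc)
    firstentry = df.loc[index, 0]
    return firstentry[0]

for letter, group in df.groupby(firstletter):
    # index=False and header=False keep the output rows in the input's format
    group.to_csv('basename_{}.csv'.format(letter), index=False, header=False)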
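Passing a function to `groupby` works because pandas calls it on each value of the index (the row labels), not on the rows themselves. A small sketch with a made-up three-row DataFrame shows which labels land in which group; it uses `df.loc` for the label lookup, since `df.ix` no longer exists in current pandas.

```python
import pandas as pd

# Tiny made-up frame; its default row labels are 0, 1, 2
df = pd.DataFrame({0: ['B1', 'B2', 'C1'], 1: [3, 2, 4]})


def firstletter(index):
    # groupby passes a row label here; look up column 0 for that row
    return df.loc[index, 0][0]


# Map each group key to the row labels it collected
members = {letter: group.index.tolist()
           for letter, group in df.groupby(firstletter)}
print(members)  # {'B': [0, 1], 'C': [2]}
```

Because the function only sees labels, it has to reach back into `df` itself, which is why grouping directly on the column values is usually simpler.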

Or, incorporating @jezrael's approach of grouping by the explicit contents of the first column:

for letter, group in df.groupby(df[0].str[0]):
    group.to_csv('basename_{}.csv'.format(letter), index=False, header=False)
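Since the question mentions large files, one possible variation (not part of either answer) reads the input in chunks with `read_csv(..., chunksize=...)` and appends each chunk's groups to the matching output file. This is a sketch under assumptions: the helper name `split_csv_chunked` and the default chunk size are hypothetical, and `dtype={0: str}` is added so `.str[0]` still applies if a chunk happens to hold only numeric-looking values in column 0.

```python
import os

import pandas as pd


def split_csv_chunked(path, chunksize=100_000):
    """Hypothetical helper: split `path` by the first character of column 0,
    reading at most `chunksize` rows at a time. Returns the output paths."""
    base = os.path.splitext(path)[0]
    written = set()
    # With chunksize, read_csv yields DataFrames of up to `chunksize` rows
    with pd.read_csv(path, header=None, dtype={0: str},
                     chunksize=chunksize) as reader:
        for chunk in reader:
            for letter, group in chunk.groupby(chunk[0].str[0]):
                out = '{}_{}.csv'.format(base, letter)
                # overwrite on this run's first write to a file, append afterwards
                group.to_csv(out, mode='a' if out in written else 'w',
                             index=False, header=False)
                written.add(out)
    return sorted(written)
```

Overwriting on the first write means rerunning the function does not duplicate rows in existing output files; the trade-off is one extra file open per group per chunk.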
