Python: Split CSV file according to first character of the first column
Problem description
I have a series of large CSV files "basename.csv" like:
B1,3,5,6
B2,2,1,5
B3,1,9,0
C1,4,7,9
C2,1,9,3
C3,8,5,2
I would like to split them into different files like:
basename_B.csv
B1,3,5,6
B2,2,1,5
B3,1,9,0
basename_C.csv
C1,4,7,9
C2,1,9,3
C3,8,5,2
I have already done similar things in the past with for loops and ifs, but I was wondering if there is a more efficient way of doing this with Pandas or whatever.
SOLUTION
Adapting the solution from @chthonicdaemon and @jezrael, I came up with this:
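    import glob
    import os

    import pandas as pd

    def split_csv():
        # Process every CSV file in the current directory
        for dfile in glob.glob('*.csv'):
            df = pd.read_csv(dfile, header=None)
            # Group rows by the first character of the first column
            for letter, group in df.groupby(df[0].str[0]):
                group.to_csv((os.path.splitext(dfile)[0]) + '_{}.csv'.format(letter), index=False, header=False)

    split_csv()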
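One thing to watch with the version above: the output files also match '*.csv', so running it a second time in the same directory splits basename_B.csv again into basename_B_B.csv. Here is a rough sketch of one way around that, writing the pieces into a separate directory. The function name split_csv_into and both of its parameters are my own, not from the answers, and dtype=str is an assumption on my part to keep the values exactly as they appear in the input.

```python
import glob
import os

import pandas as pd


def split_csv_into(src_dir, out_dir):
    """Split every CSV in src_dir by the first character of column 0.

    Hypothetical helper: the pieces go to out_dir, which should differ from
    src_dir so that a later run does not pick them up as inputs.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for dfile in sorted(glob.glob(os.path.join(src_dir, '*.csv'))):
        base = os.path.splitext(os.path.basename(dfile))[0]
        # dtype=str (an assumption) stops pandas from reinterpreting values
        df = pd.read_csv(dfile, header=None, dtype=str)
        for letter, group in df.groupby(df[0].str[0]):
            out = os.path.join(out_dir, '{}_{}.csv'.format(base, letter))
            group.to_csv(out, index=False, header=False)
            written.append(out)
    return written
```

Calling split_csv_into('.', 'split') would read the CSV files in the current directory and write basename_B.csv, basename_C.csv and so on under split/. The glob pattern is not recursive, so files inside split/ are not read back in on the next run.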
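Solution
Here's a simple application of groupby:

    import pandas

    df = pandas.read_csv('basename.csv', header=None)

    def firstletter(index):
        # groupby calls this once per index label; .loc looks the row up by label
        firstentry = df.loc[index, 0]
        return firstentry[0]

    for letter, group in df.groupby(firstletter):
        # index=False, header=False keep the output in the same layout as the input
        group.to_csv('basename_{}.csv'.format(letter), index=False, header=False)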
Or, incorporating @jezrael's use of grouping by the explicit contents of the columns:
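    for letter, group in df.groupby(df[0].str[0]):
        # index=False, header=False keep the output in the same layout as the input
        group.to_csv('basename_{}.csv'.format(letter), index=False, header=False)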
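Since the question mentions large files, here is a sketch of the same groupby idea applied chunk by chunk, so that the whole file never has to be in memory at once. It relies on the chunksize parameter of pandas.read_csv and the mode parameter of DataFrame.to_csv. The function name split_csv_chunked, the default chunk size and dtype=str are my own assumptions and are not part of the answers above.

```python
import os

import pandas as pd


def split_csv_chunked(path, chunksize=100000):
    """Split path by the first character of column 0, one chunk at a time.

    Hypothetical helper; the default chunksize is an arbitrary guess.
    """
    base = os.path.splitext(path)[0]
    seen = set()  # output files already created during this run
    # With chunksize set, read_csv yields DataFrames of at most that many rows
    reader = pd.read_csv(path, header=None, dtype=str, chunksize=chunksize)
    for chunk in reader:
        for letter, group in chunk.groupby(chunk[0].str[0]):
            out = '{}_{}.csv'.format(base, letter)
            # Overwrite on the first write of this run, append afterwards
            mode = 'a' if out in seen else 'w'
            group.to_csv(out, mode=mode, index=False, header=False)
            seen.add(out)
    return sorted(seen)
```

Rows keep their original order within each output file, because the chunks are read in order and each chunk's groups are appended as they come. Whether this is actually faster than the plain version depends on the data; the main gain is bounded memory use.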