Load multiple CSV files into a DataFrame: column names issue

Question

I have multiple CSV files with the same format (14 rows, 4 columns). I am trying to load all of them into a single DataFrame and use each file's name to rename the values of the first column (1-14):

    1   500 0   0
    2   350 0   1
    3   500 1   0
    .............
    13  600 0   0
    14  800 0   0

I tried the following code, but I am not getting what I expect:

    import os
    import pandas as pd

    filenames = os.listdir('Threshold/')
    Y = pd.DataFrame()  # empty df
    # file names are in the following format: "subx_ICA_thre.csv"
    # need to get x (subject number, used later for renaming column values)
    Sub_list = []
    for filename in filenames:
        s = int(''.join(filter(str.isdigit, filename)))
        Sub_list.append(int(s))
    S_Sub_list = sorted(Sub_list)

    for x in S_Sub_list:  # get the file according to the subject number
        temp = pd.read_csv('sub' + str(x) + '_ICA_thre.csv')
        df = pd.concat([Y, temp])  # concat the obtained frame with the empty frame
        df.columns = ['id', 'data', 'isEB', 'isEM']
        # replace the column values using the subject id
        for sub in range(1, 15):
            df['id'].replace(sub, 'sub' + str(x) + '_ICA_' + str(sub), inplace=True)
        print(df)

Output:

                id  data  isEB  isEM
   0    sub1_ICA_2   200     0     0
   1    sub1_ICA_3   275     0     0
   2    sub1_ICA_4   500     1     0
   ................................
   11  sub1_ICA_13   275     0     0
   12  sub1_ICA_14   300     0     0
                id  data  isEB  isEM
   0    sub2_ICA_2   275     0     0
   1    sub2_ICA_3   500     0     0
   2    sub2_ICA_4   400     0     0
   .................................
   11  sub2_ICA_13   300     0     0
   12  sub2_ICA_14   450     0     0      

First, the code seems to produce a separate DataFrame per file rather than a single one. Second, the first row of each file is missing (sub1_ICA_1 is gone, perhaps because it was replaced by the column names). I couldn't find the problem in my loop.

Recommended answer

I think you need to create a list of DataFrames first, then concat them with the keys parameter so that a range supplies new values for the outer level of a MultiIndex, then modify the id column, and finally remove the MultiIndex with reset_index.

The names parameter was also added to read_csv to set custom column names. Without it, read_csv treats the first line of each file as the header, which is why the first row (sub1_ICA_1) went missing.
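To illustrate the effect of names on a header-less file, here is a minimal sketch. The CSV text is a made-up, comma-separated stand-in for one of the files (three rows instead of 14), read from memory with io.StringIO rather than from disk; the variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file content with no header line (made-up values).
csv_text = "1,500,0,0\n2,350,0,1\n3,500,1,0\n"
cols = ['id', 'data', 'isEB', 'isEM']

# Default behaviour: the first line is consumed as the header,
# so one data row is lost.
default_df = pd.read_csv(io.StringIO(csv_text))

# With names=, every line is treated as data.
named_df = pd.read_csv(io.StringIO(csv_text), names=cols)

print(len(default_df))  # 2
print(len(named_df))    # 3
```

Passing header=None would also keep every row, but the columns would then be numbered 0 to 3 and would still need renaming.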

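    # collect one DataFrame per file in a list
    Y = []
    for x in S_Sub_list:
        n = ['id', 'data', 'isEB', 'isEM']
        temp = pd.read_csv('sub' + str(x) + '_ICA_thre.csv', names=n)
        Y.append(temp)

    # list comprehension alternative
    # n = ['id', 'data', 'isEB', 'isEM']
    # Y = [pd.read_csv('sub' + str(x) + '_ICA_thre.csv', names=n) for x in S_Sub_list]

    # keys label each file's rows in the outer MultiIndex level; range(1, N + 1)
    # assumes the subject numbers run 1..N (otherwise pass keys=S_Sub_list)
    df = pd.concat(Y, keys=range(1, len(S_Sub_list) + 1))

    # build the new id from the outer index level and the original id
    df['id'] = 'sub' + df.index.get_level_values(0).astype(str) + '_ICA_' + df['id'].astype(str)
    df = df.reset_index(drop=True)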
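
For anyone who wants to try the approach without the original files, the following is a self-contained sketch of the same steps. It makes several assumptions: the two "files" are made-up, comma-separated strings held in a hypothetical dict keyed by subject number, each has two rows instead of 14, and keys is given the sorted subject numbers directly (a small variation on the range used above, so that non-consecutive subject numbers such as 1 and 3 are still labelled correctly).

```python
import io
import pandas as pd

cols = ['id', 'data', 'isEB', 'isEM']

# Hypothetical in-memory stand-ins for sub1_ICA_thre.csv and sub3_ICA_thre.csv
# (made-up values, two rows each).
files = {
    1: "1,500,0,0\n2,350,0,1\n",
    3: "1,200,0,0\n2,275,1,0\n",
}

subjects = sorted(files)
frames = [pd.read_csv(io.StringIO(files[s]), names=cols) for s in subjects]

# keys= puts the subject number into the outer level of a MultiIndex.
df = pd.concat(frames, keys=subjects)

# Build the new id from the outer index level and the original id.
df['id'] = ('sub' + df.index.get_level_values(0).astype(str)
            + '_ICA_' + df['id'].astype(str))

# Drop the MultiIndex in favour of a plain 0..n-1 index.
df = df.reset_index(drop=True)

print(df)
```

The result is a single DataFrame whose id column reads sub1_ICA_1, sub1_ICA_2, sub3_ICA_1, sub3_ICA_2, with no rows lost to the header.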
