Python/Pandas:仅当每个Excel文件包含某些值时,才循环附加Excel文件 [英] Python/Pandas: Loop to append Excel files ONLY if each Excel file contains certain values

查看:107
本文介绍了Python/Pandas:仅当每个Excel文件包含某些值时,才循环附加Excel文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下代码:

dfs = []
    for f in files_xlsx:
            city_name = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=1)
            country_code = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=2)
            data = pd.read_excel(f, "1. City", parse_cols="B:J", header=None, skiprows=8)
            data['City name'] = city_name.iat[0,0]
            data['City code'] = country_code.iat[0,0]
            dfs.append(data)

    df = pd.concat(dfs, ignore_index=True)

仅当每个Excel文件在以下位置(列和行组合)D8, E8, F8, G8, H8, I8, J8中包含值91, 92, 93, 94, 95, 96, 97时,我才想运行循环.仅当所有文件都满足此条件时,循环才应运行.

I would like to run the loop if and only if each Excel file contains the values 91, 92, 93, 94, 95, 96, 97 in the following location (column and row combination) D8, E8, F8, G8, H8, I8, J8. The loop should only run if this condition is met across all files.

我所有的Excel文件在理论上都具有相同的格式.实际上,它们通常不这样做,因此我想在附加它们之前先进行检查.如果代码可以告诉我哪个文件满足上述条件,那就太好了.谢谢.

All my Excel files have the same format in theory. In practice, they often don't so I want to run a check before appending them. It would be great if the code could tell me which file does not meet the above condition. Thank you.

修改:

In[1]: data

Out[1]:

0   1   2   3   4   5   6   7   8   City name   City code
0   x   x   x   x   x   x   x   x       x           x
1   x   x   x   x   x   x   x   x       x           x
2   x   x   x   x   x   x   x   x       x           x

推荐答案

请考虑构建一个类似于Excel数据但其值与规范匹配的帮助器数据框(第8行和具有这些特定值的D-J列).然后在循环中迭代与此辅助数据框架合并,如果返回匹配,则有条件地附加到您的数据框架列表中.

Consider building a helper data frame that resembles Excel data but with values matched to specification (row 8 and columns D-J with those specific values). Then in loop iteratively merge against this helper data frame and if it returns a match, conditionally append to your list of data frames.

注意::将列调整为实际的列名,在DataFrame()merge()调用中将list('ABCDEFGHIJ')替换为名称列表['col1','col2','col3',...].

NOTE: Adjust columns to actual column names, replacing list('ABCDEFGHIJ') with list of names, ['col1','col2','col3',...], in DataFrame() and merge() calls.

check_df = pd.concat([pd.DataFrame([[0]*9 for _ in range(7)], 
                                    columns=['Heading code','Heading name','91','92','93','94','95','96','97']),
                      pd.DataFrame([[0,0,91,92,93,94,95,96,97]], 
                                   columns=['Heading code','Heading name','91','92','93','94','95','96','97'])],
                      ignore_index=True).reset_index()

print(check_df)    
#    index  Heading code  Heading name  91  92  93  94  95  96  97
# 0      0             0             0   0   0   0   0   0   0   0
# 1      1             0             0   0   0   0   0   0   0   0
# 2      2             0             0   0   0   0   0   0   0   0
# 3      3             0             0   0   0   0   0   0   0   0
# 4      4             0             0   0   0   0   0   0   0   0
# 5      5             0             0   0   0   0   0   0   0   0
# 6      6             0             0   0   0   0   0   0   0   0
# 7      7             0             0  91  92  93  94  95  96  97

dfs = []
for f in files_xlsx:
   city_name = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=1)
   country_code = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=2)
   data = pd.read_excel(f, "1. City", parse_cols="B:J", header=None, skiprows=8)\
            .assign(city_name=city_name.iat[0,0], city_code=country_code.iat[0,0])
   data.columns = ['Heading code','Heading name','91','92','93','94','95','96','97','City name','City code']

   # INNER JOIN MERGE ON INDEX AND COLS, D-J
   tmp = data.reset_index().merge(check_df, on=['index','91','92','93','94','95','96','97'])    

   # CONDITIONALLY APPEND
   if len(tmp) > 0:
        dfs.append(data)

df = pd.concat(dfs, ignore_index=True)

以下通过随机数据进行演示

Below demonstrates with random data

np.random.seed(82118)
# LIST OF FIVE DATAFRAMES (TO RESEMBLE EXCEL DFs)
rand_dfs = [pd.DataFrame([np.random.randint(1, 100, 9) for _ in range(10)],
                    columns=['Heading code','Heading name','91','92','93','94','95','96','97'])
            for _ in range(5)]

# UPDATE TWO DATAFRAMES EACH WITH 10 COLS TO INCLUDE MATCHING 8TH ROW
rand_dfs[2].loc[7] = [0, 0, 91, 92, 93, 94, 95, 96, 97]    
rand_dfs[4].loc[7] = [0, 0, 91, 92, 93, 94, 95, 96, 97]

final_dfs = []
for d in rand_dfs:
    tmp = d.reset_index().merge(check_df, on=['index','91','92','93','94','95','96','97'])

    if len(tmp) > 0: 
        final_dfs.append(d)

final_df = pd.concat(final_dfs, ignore_index=True)

输出(请参见具有匹配条件的第8行和第17行)

print(final_df)

#     Heading code  Heading name  91  92  93  94  95  96  97
# 0             53            98  67   8  86  33  65  56  62
# 1             61             9  40  14  18   9  53  30  24
# 2             89            88  80  91  91  49   8  39  84
# 3             15            99  49  92  63  96  11  95  29
# 4             13            62  82  12  34  92  54  29  47
# 5             44            18  67  61  52  71  52  25  12
# 6             56            25  52  10  82  12  59  63  15
# 7              0             0  91  92  93  94  95  96  97
# 8             51            50  27  38  34  11  57  92   3
# 9             49            99  46  87  46   5  63  24   8
# 10            31            62   8  23  19  66  60  10  66
# 11            51            98  30  44  45  39  32  74  82
# 12            88            19  54  28  38  71   3  31  34
# 13            58            13  89  17  96  35  12  52  85
# 14            93            67  13  13  28  43  24   7   4
# 15            34            26  73  20  44  37  18  17  22
# 16            59             1  99   9  11   6   4  99  95
# 17             0             0  91  92  93  94  95  96  97
# 18            88             6  23  20  35  26  37  56  51
# 19            21            67  19  63  77  98  41   9  22

这篇关于Python/Pandas:仅当每个Excel文件包含某些值时,才循环附加Excel文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆