Python/Pandas:仅当每个Excel文件包含某些值时,才循环附加Excel文件 [英] Python/Pandas: Loop to append Excel files ONLY if each Excel file contains certain values
问题描述
我有以下代码:
dfs = []
for f in files_xlsx:
city_name = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=1)
country_code = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=2)
data = pd.read_excel(f, "1. City", parse_cols="B:J", header=None, skiprows=8)
data['City name'] = city_name.iat[0,0]
data['City code'] = country_code.iat[0,0]
dfs.append(data)
df = pd.concat(dfs, ignore_index=True)
仅当每个Excel文件在以下位置(列和行组合)D8, E8, F8, G8, H8, I8, J8
中包含值91, 92, 93, 94, 95, 96, 97
时,我才想运行循环.仅当所有文件都满足此条件时,循环才应运行.
I would like to run the loop if and only if each Excel file contains the values 91, 92, 93, 94, 95, 96, 97
in the following location (column and row combination) D8, E8, F8, G8, H8, I8, J8
. The loop should only run if this condition is met across all files.
我所有的Excel文件在理论上都具有相同的格式.实际上,它们通常不这样做,因此我想在附加它们之前先进行检查.如果代码可以告诉我哪个文件不满足上述条件,那就太好了.谢谢.
All my Excel files have the same format in theory. In practice, they often don't so I want to run a check before appending them. It would be great if the code could tell me which file does not meet the above condition. Thank you.
修改:
In[1]: data
Out[1]:
0 1 2 3 4 5 6 7 8 City name City code
0 x x x x x x x x x x
1 x x x x x x x x x x
2 x x x x x x x x x x
推荐答案
请考虑构建一个类似于Excel数据但其值与规范匹配的帮助器数据框(第8行和具有这些特定值的D-J列).然后在循环中迭代与此辅助数据框架合并,如果返回匹配,则有条件地附加到您的数据框架列表中.
Consider building a helper data frame that resembles Excel data but with values matched to specification (row 8 and columns D-J with those specific values). Then in loop iteratively merge against this helper data frame and if it returns a match, conditionally append to your list of data frames.
注意::将列调整为实际的列名,在DataFrame()
和merge()
调用中将list('ABCDEFGHIJ')
替换为名称列表['col1','col2','col3',...]
.
NOTE: Adjust columns to actual column names, replacing list('ABCDEFGHIJ')
with list of names, ['col1','col2','col3',...]
, in DataFrame()
and merge()
calls.
check_df = pd.concat([pd.DataFrame([[0]*9 for _ in range(7)],
columns=['Heading code','Heading name','91','92','93','94','95','96','97']),
pd.DataFrame([[0,0,91,92,93,94,95,96,97]],
columns=['Heading code','Heading name','91','92','93','94','95','96','97'])],
ignore_index=True).reset_index()
print(check_df)
# index Heading code Heading name 91 92 93 94 95 96 97
# 0 0 0 0 0 0 0 0 0 0 0
# 1 1 0 0 0 0 0 0 0 0 0
# 2 2 0 0 0 0 0 0 0 0 0
# 3 3 0 0 0 0 0 0 0 0 0
# 4 4 0 0 0 0 0 0 0 0 0
# 5 5 0 0 0 0 0 0 0 0 0
# 6 6 0 0 0 0 0 0 0 0 0
# 7 7 0 0 91 92 93 94 95 96 97
dfs = []
for f in files_xlsx:
city_name = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=1)
country_code = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=2)
data = pd.read_excel(f, "1. City", parse_cols="B:J", header=None, skiprows=8)\
.assign(city_name=city_name.iat[0,0], city_code=country_code.iat[0,0])
data.columns = ['Heading code','Heading name','91','92','93','94','95','96','97','City name','City code']
# INNER JOIN MERGE ON INDEX AND COLS, D-J
tmp = data.reset_index().merge(check_df, on=['index','91','92','93','94','95','96','97'])
# CONDITIONALLY APPEND
if len(tmp) > 0:
dfs.append(data)
df = pd.concat(dfs, ignore_index=True)
以下通过随机数据进行演示
Below demonstrates with random data
np.random.seed(82118)
# LIST OF FIVE DATAFRAMES (TO RESEMBLE EXCEL DFs)
rand_dfs = [pd.DataFrame([np.random.randint(1, 100, 9) for _ in range(10)],
columns=['Heading code','Heading name','91','92','93','94','95','96','97'])
for _ in range(5)]
# UPDATE TWO DATAFRAMES EACH WITH 10 COLS TO INCLUDE MATCHING 8TH ROW
rand_dfs[2].loc[7] = [0, 0, 91, 92, 93, 94, 95, 96, 97]
rand_dfs[4].loc[7] = [0, 0, 91, 92, 93, 94, 95, 96, 97]
final_dfs = []
for d in rand_dfs:
tmp = d.reset_index().merge(check_df, on=['index','91','92','93','94','95','96','97'])
if len(tmp) > 0:
final_dfs.append(d)
final_df = pd.concat(final_dfs, ignore_index=True)
输出(请参见具有匹配条件的第8行和第17行)
print(final_df)
# Heading code Heading name 91 92 93 94 95 96 97
# 0 53 98 67 8 86 33 65 56 62
# 1 61 9 40 14 18 9 53 30 24
# 2 89 88 80 91 91 49 8 39 84
# 3 15 99 49 92 63 96 11 95 29
# 4 13 62 82 12 34 92 54 29 47
# 5 44 18 67 61 52 71 52 25 12
# 6 56 25 52 10 82 12 59 63 15
# 7 0 0 91 92 93 94 95 96 97
# 8 51 50 27 38 34 11 57 92 3
# 9 49 99 46 87 46 5 63 24 8
# 10 31 62 8 23 19 66 60 10 66
# 11 51 98 30 44 45 39 32 74 82
# 12 88 19 54 28 38 71 3 31 34
# 13 58 13 89 17 96 35 12 52 85
# 14 93 67 13 13 28 43 24 7 4
# 15 34 26 73 20 44 37 18 17 22
# 16 59 1 99 9 11 6 4 99 95
# 17 0 0 91 92 93 94 95 96 97
# 18 88 6 23 20 35 26 37 56 51
# 19 21 67 19 63 77 98 41 9 22
这篇关于Python/Pandas:仅当每个Excel文件包含某些值时,才循环附加Excel文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!