Iterate through a list of DataFrames to drop particular rows in Pandas
Problem description
In my earlier question I asked how to drop particular rows in Pandas.
With help, I was able to drop the rows from before 1980. The 'Season' column (which holds the years) is in this format:
2018-19
2017-18
This list would go till 1960
In the earlier question (linked), @jezrael gave a solution that helped me drop rows before 1980.
I have a list (called list) that has 30 DataFrames. I want to iterate through the 30 DataFrames and drop all rows before 1980 for every df. For instance, one of the items in list is BOS. If BOS['Season'] has:
2018-19
2017-18
1959-1960
I should get:
2018-19
2017-18
This is what I tried, but I either got errors or nothing happened:
for df in list:
    df = df[df['Season'].str.split('-').str[0].astype(int) > 1980]
What's wrong with my code? I am new to Python. I thought that by assigning the filtered result to df, the change would be applied to every df in the list.
Thanks!
UPDATE:
I have a list named league. This list has 30 DataFrames. I looked at jezrael's and IMCoin's solutions, and both of them worked. But here is my requirement.
After dropping the rows before 1980 for every DataFrame, I want to be able to use that DataFrame directly, and not through the list. Here is what I mean:
#Before Appending to the list
BOS = pd.read_csv(dir+"Boston_Sheet")
# I have 30 different cities, each having a CSV file and making each city have
# their own DataFrame. So Boston as `BOS`, Chicago as `CHI` and like that 30 cities.
All of those 30 city DataFrames have already been appended to the list league.
After filtering each city DataFrame by the condition above, I want to be able to call BOS or CHI and get the filtered data. This is just so that it will be easier for me to develop other functions down the line.
Recommended answer
You need to create a new list of filtered DataFrames, or reassign the old one.
Note: don't use list as a variable name, because it shadows the Python builtin of the same name.
L = [df[df['Season'].str.split('-').str[0].astype(int) > 1980] for df in L]
Loop version:
output = []
for df in L:
    df = df[df['Season'].str.split('-').str[0].astype(int) > 1980]
    output.append(df)
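The loop in the question appeared to do nothing because `df = ...` inside the loop only rebinds the loop variable to a new object. The list still holds the original, unfiltered DataFrames. A minimal sketch of the difference follows. The sample frames and the helper name `start_year` are assumptions made up for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for two of the city DataFrames.
frames = [
    pd.DataFrame({'Season': ['2018-19', '2017-18', '1959-60']}),
    pd.DataFrame({'Season': ['1985-86', '1975-76']}),
]

def start_year(df):
    # First part of 'YYYY-YY' as an integer, e.g. '2018-19' -> 2018.
    return df['Season'].str.split('-').str[0].astype(int)

# Rebinding the loop variable: the list itself is left untouched.
for df in frames:
    df = df[start_year(df) > 1980]
lengths_after_rebinding = [len(df) for df in frames]

# Building a new list keeps the filtered results.
filtered = [df[start_year(df) > 1980] for df in frames]
lengths_after_filtering = [len(df) for df in filtered]

print(lengths_after_rebinding)   # [3, 2]
print(lengths_after_filtering)   # [2, 1]
```

Note that the condition `> 1980` also drops a season starting in 1980. If that season should be kept, `>= 1980` would be the condition to use.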
If you need to extract only the first 4-digit integer:
L = [df, df]
L = [df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
     for df in L]
print (L)
[ Season
0 2018-19
1 2017-18, Season
0 2018-19
1 2017-18]
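The cast to float rather than int matters here. `str.extract` returns NaN for rows that contain no run of four digits, and NaN cannot be converted to an integer. A comparison with NaN is False, so such rows are dropped by the filter. A small sketch, using a made-up Series as the assumption:

```python
import pandas as pd

# Hypothetical column mixing valid seasons with a stray text row.
s = pd.Series(['2018-19', '1959-60', 'This'])

# First run of four digits; rows with no match become NaN.
years = s.str.extract(r'(\d{4})', expand=False).astype(float)

mask = years > 1980          # NaN > 1980 evaluates to False
kept = s[mask].tolist()
print(kept)                  # ['2018-19']
```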
If the data have the same structure, I suggest creating one big DataFrame with a new column to distinguish the cities:
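import glob
import os
import pandas as pd
# Read every CSV and tag its rows with the file name (without extension) as City
files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp).assign(City=os.path.basename(fp).split('.')[0]) for fp in files]
df = pd.concat(dfs, ignore_index=True)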
print (df)
Season City
0 2018-19 Boston_Sheet
1 This Boston_Sheet
2 list would go Boston_Sheet
3 till 1960 Boston_Sheet
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
6 This Chicago_Sheet
df1 = df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
print (df1)
Season City
0 2018-19 Boston_Sheet
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
df2 = df1[df1['City'] == 'Boston_Sheet']
print (df2)
Season City
0 2018-19 Boston_Sheet
df3 = df1[df1['City'] == 'Chicago_Sheet']
print (df3)
Season City
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
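With 30 cities, writing one filter line per city gets repetitive. One possible alternative, not part of the original answer, is to split the combined frame with `groupby`. The sketch below assumes a small in-memory frame shaped like the filtered df1, and `by_city` is a name chosen for illustration.

```python
import pandas as pd

# Hypothetical combined frame, shaped like the filtered df1 above.
df1 = pd.DataFrame({
    'Season': ['2018-19', '2018-19', '2017-18'],
    'City': ['Boston_Sheet', 'Chicago_Sheet', 'Chicago_Sheet'],
})

# One sub-DataFrame per distinct value in the City column.
by_city = {city: group for city, group in df1.groupby('City')}

print(sorted(by_city))                  # ['Boston_Sheet', 'Chicago_Sheet']
print(len(by_city['Chicago_Sheet']))    # 2
```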
If you need each DataFrame separately, that is possible with a dictionary of DataFrames:
import glob
import os
import pandas as pd
# Map each file name (without extension) to its DataFrame
files = glob.glob('files/*.csv')
dfs_dict = {os.path.basename(fp).split('.')[0] : pd.read_csv(fp) for fp in files}
print (dfs_dict)
print (dfs_dict['Boston_Sheet'])
Season
0 2018-19
1 This
2 list would go
3 till 1960
print (dfs_dict['Chicago_Sheet'])
Season
0 2018-19
1 2017-18
2 This
Then process it in a dictionary comprehension:
dfs_dict = {k:v[v['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
            for k, v in dfs_dict.items()}
print (dfs_dict)
{'Boston_Sheet': Season
0 2018-19, 'Chicago_Sheet': Season
0 2018-19
1 2017-18}
print (dfs_dict['Boston_Sheet'])
Season
0 2018-19
print (dfs_dict['Chicago_Sheet'])
Season
0 2018-19
1 2017-18
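If the names BOS and CHI must keep working directly, as the update to the question asks, one option is to modify each DataFrame in place instead of creating a filtered copy. This is a sketch of an alternative, not the approach from the answer above, and the sample data is assumed. Because `drop(..., inplace=True)` changes the existing object, every name bound to it sees the result.

```python
import pandas as pd

# Hypothetical stand-ins for two of the 30 city DataFrames.
BOS = pd.DataFrame({'Season': ['2018-19', '2017-18', '1959-60']})
CHI = pd.DataFrame({'Season': ['1985-86', '1975-76']})
league = [BOS, CHI]

for df in league:
    years = df['Season'].str.extract(r'(\d{4})', expand=False).astype(float)
    # Drop the rows that fail the condition on the existing object,
    # so BOS, CHI and the list entries all reflect the change.
    df.drop(df[~(years > 1980)].index, inplace=True)

print(BOS['Season'].tolist())   # ['2018-19', '2017-18']
print(CHI['Season'].tolist())   # ['1985-86']
```

With the dictionary approach, the same effect can be had by rebinding the names after filtering, for example BOS = dfs_dict['Boston_Sheet'].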